Mary Inaba | Researchain

Publication

Featured research published by Mary Inaba.

conference on high performance computing (supercomputing) | 2007

GRAPE-DR: 2-Pflops massively-parallel computer with 512-core, 512-Gflops processor chips for scientific computing

Junichiro Makino; Kei Hiraki; Mary Inaba

We describe the GRAPE-DR (Greatly Reduced Array of Processor Elements with Data Reduction) system, which will consist of 4096 processor chips, each with 512 cores operating at a clock frequency of 500 MHz. The peak speed of a processor chip is 512 Gflops (single precision) or 256 Gflops (double precision). The GRAPE-DR chip works as an attached processor to standard PCs. Currently, a PCI-X board with a single GRAPE-DR chip is in operation. We are developing a 4-chip board with a PCI-Express interface, which will have a peak performance of 1 Tflops. The final system will be a cluster of 512 PCs, each with two GRAPE-DR boards. We plan to complete the final system by early 2009. The application area of GRAPE-DR covers particle-based simulations such as astrophysical many-body simulations and molecular-dynamics simulations, quantum chemistry calculations, various applications that require dense matrix operations, and many other compute-intensive applications.
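The peak figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not a description of the chip's microarchitecture: the per-core rates of 2 single-precision and 1 double-precision flops per cycle are assumptions chosen because they reproduce the stated 512 and 256 Gflops per chip.

```python
# Figures taken from the abstract.
CORES_PER_CHIP = 512
CLOCK_HZ = 500e6
CHIPS_PER_BOARD = 4
BOARDS_PER_PC = 2
PCS = 512

# Assumption (not stated in the abstract): flops retired per core per cycle.
SP_FLOPS_PER_CYCLE = 2  # single precision
DP_FLOPS_PER_CYCLE = 1  # double precision


def chip_peak_gflops(flops_per_cycle):
    """Peak Gflops of one chip for a given per-core flops/cycle rate."""
    return CORES_PER_CHIP * CLOCK_HZ * flops_per_cycle / 1e9


total_chips = PCS * BOARDS_PER_PC * CHIPS_PER_BOARD
sp_chip_gflops = chip_peak_gflops(SP_FLOPS_PER_CYCLE)
dp_chip_gflops = chip_peak_gflops(DP_FLOPS_PER_CYCLE)
board_dp_tflops = dp_chip_gflops * CHIPS_PER_BOARD / 1e3
system_sp_pflops = sp_chip_gflops * total_chips / 1e6

print(total_chips, sp_chip_gflops, dp_chip_gflops)
print(board_dp_tflops, system_sp_pflops)
```

Under these assumptions, 512 PCs with two 4-chip boards each give the 4096 chips mentioned in the abstract, the 1 Tflops board figure corresponds to double precision, and the 2-Pflops figure in the title corresponds to single precision across the full system.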

ieee international conference on high performance computing data and analytics | 2008

Performance optimization of TCP/IP over 10 gigabit ethernet by precise instrumentation

Takeshi Yoshino; Yutaka Sugawara; Katsushi Inagami; Junji Tamatsukuri; Mary Inaba; Kei Hiraki

End-to-end communications on 10 Gigabit Ethernet (10 GbE) WANs have become popular. However, there are difficulties that need to be solved before Long Fat-pipe Networks (LFNs) can be utilized with TCP. We observed that the following caused performance degradation: short-term bursty data transfer, mismatch between TCP and hardware support, and excess CPU load. In this research, we have established systematic methodologies to optimize TCP on LFNs. In order to pinpoint the causes of performance degradation, we analyzed real networks precisely by using our hardware-based wire-rate analyzer with 100-ns time resolution. We took the following actions on the basis of the observations: (1) utilizing hardware-based pacing to avoid unnecessary packet losses due to collisions at bottlenecks, (2) modifying TCP to adapt to the packet coalescing mechanism, (3) modifying programs to reduce memory copies. We have achieved a constant throughput of 9.08 Gbps on a 500 ms RTT network for 5 h. Our approach has overcome the difficulties on single-end 10 GbE LFNs.
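To give a sense of scale for the numbers in this abstract, the following sketch computes the bandwidth-delay product of the path, the data volume implied by the reported run, and an idealized pacing interval. The 1500-byte frame size is a hypothetical value chosen for illustration, and the pacing formula ignores Ethernet preamble and inter-frame gap; neither comes from the paper.

```python
# Figures taken from the abstract.
LINK_RATE_BPS = 10e9      # 10 GbE line rate
RTT_S = 0.5               # 500 ms round-trip time
ACHIEVED_BPS = 9.08e9     # reported sustained throughput
DURATION_S = 5 * 3600     # 5 hours


def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return rate_bps * rtt_s / 8


def pacing_interval_s(frame_bytes, target_bps):
    """Ideal gap between frame starts when sending evenly at target_bps."""
    return frame_bytes * 8 / target_bps


bdp = bdp_bytes(LINK_RATE_BPS, RTT_S)
total_bytes = ACHIEVED_BPS * DURATION_S / 8
utilization = ACHIEVED_BPS / LINK_RATE_BPS

# Hypothetical frame size, for illustration only.
gap = pacing_interval_s(1500, ACHIEVED_BPS)

print(f"BDP: {bdp / 1e6:.0f} MB")
print(f"Data moved in 5 h: {total_bytes / 1e12:.2f} TB")
print(f"Link utilization: {utilization:.1%}")
print(f"Pacing interval: {gap * 1e6:.3f} us")
```

A window of roughly 625 MB must be kept in flight to fill such a path, which is why a single burst-induced loss is so costly on an LFN.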

symposium on computational geometry | 1996

Experimental results of randomized clustering algorithm

Mary Inaba; Hiroshi Imai; Naoki Katoh

This paper describes computational results for a k-clustering algorithm using the random sampling technique [2] to show its practical usefulness. Through computational experiments, we first show that small samples are actually enough for the 2-clustering problem. Then, we apply this algorithm to the k-clustering problem in a recursive manner and use the output as the initial solution of the existing local improvement technique, called k-means. We compare the result with the commonly used variance-based algorithm [1, 4].

1 Randomized 2-clustering algorithm

Clustering is the grouping of similar objects. A k-clustering of a set is a partition of its elements into k clusters that is chosen to minimize some dissimilarity cost in each cluster. It is fundamental and used in various fields of computer science such as pattern recognition, learning theory, image processing, and computer graphics. The variance-based clustering problem for a set S of n points $x_i$ is to find a k-clustering of S into $S_j$ ($j = 1, \ldots, k$) minimizing the clustering cost $\sum_{j=1}^{k} V(S_j)$, where $V(S_j) = \sum_{x_i \in S_j} \|x_i - \bar{x}(S_j)\|^2$, $\|\cdot\|$ is the $L_2$ norm, and $\bar{x}(S_j)$ is the centroid of the points in $S_j$, i.e., $\frac{1}{|S_j|} \sum_{x_i \in S_j} x_i$. For this problem, an optimal k-clustering is induced by the Voronoi diagram generated by the k centroids of the clusters. For this problem, we have proposed a randomized 2-clustering algorithm in [2]. We implement this algorithm for the planar case and add a local improvement step for the inner loop.
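As a rough illustration of the variance-based clustering cost described in this abstract, the sketch below computes the sum of squared distances to each cluster's centroid and performs one k-means local improvement step (assign each point to its nearest center, then recompute centroids). It does not implement the randomized sampling algorithm of [2], whose details are not given here; the function names are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean (L2) distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def variance_cost(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sq_dist(x, c) for x in cluster)


def clustering_cost(clusters):
    """Total cost: per-cluster costs summed over all non-empty clusters."""
    return sum(variance_cost(s) for s in clusters if s)


def kmeans_step(points, centers):
    """One local improvement step: Voronoi assignment, then new centroids."""
    clusters = [[] for _ in centers]
    for x in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
        clusters[j].append(x)
    # An empty cluster keeps its previous center.
    new_centers = [centroid(s) if s else c for s, c in zip(clusters, centers)]
    return clusters, new_centers


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
clusters, centers = kmeans_step(points, [(0, 0), (10, 0)])
print(centers, clustering_cost(clusters))
```

The assignment step is exactly the partition induced by the Voronoi diagram of the current centers, which is the property the abstract relies on.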

international conference on networking and computing | 2010

Compressing Floating-Point Number Stream for Numerical Applications

Hisanobu Tomari; Mary Inaba; Kei Hiraki

Clusters of commodity computers and general-purpose computers with accelerators such as GPGPUs are now common platforms for solving computationally intensive tasks like scientific simulations. Both technologies provide users with high performance at relatively low cost. However, the low bandwidth of the interconnect compared to the computing performance hinders efficient operation of both clusters and accelerators for the many algorithms that require heavy data transmission. For clusters, the network is one of the major performance bottlenecks; for accelerators, the bottleneck is the peripheral bus that transfers data from the host to the memory on the accelerator card. In this paper, we propose a method of accelerating floating-point intensive algorithms by compressing the floating-point number stream. With an efficient software encoder and hardware decoder, the method eliminates redundancy in the exponent part of the array of numbers on the stream and compacts the entire array to 82.8% of its original size at the theoretical limit. The compression ratio is better than that of Gzip or Bzip2 for floating-point numbers. The reduction in communication time directly leads to a reduction in total application running time for programs whose processing time is largely dominated by communication performance. We implemented a high-speed decoder using an FPGA that operates at over 6 GB/s. We estimated the application performance using FFT on a cluster and matrix multiplication on the GRAPE-DR accelerator, and our approach is useful in both configurations.
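The idea of redundancy in the exponent part can be made concrete with a small sketch. It is not the encoder from the paper. It assumes IEEE 754 double-precision values (the abstract does not state the format), extracts the 11-bit exponent field, and measures how much information the exponents of an array actually carry. The function names are illustrative.

```python
import struct
from collections import Counter
from math import log2


def exponent_field(x):
    """Biased 11-bit exponent of an IEEE 754 double (assumed format)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def exponent_entropy_bits(values):
    """Shannon entropy, in bits per value, of the exponent fields."""
    counts = Counter(exponent_field(v) for v in values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


# Values of similar magnitude share an exponent, so the field is redundant.
samples = [1.0, 1.25, 1.5, 1.75]
print(exponent_field(1.0), exponent_entropy_bits(samples))

# Size left if all 11 exponent bits of a 64-bit value could be dropped.
print((64 - 11) / 64)
```

Dropping all 11 exponent bits of a 64-bit value would leave 53/64, about 82.8%, which is consistent with the theoretical limit quoted in the abstract, although the abstract does not spell out that derivation.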

symposium on computational geometry | 1998

Voronoi diagrams by divergences with additive weights

Kunihiko Sadakane; Hiroshi Imai; Kensuke Onishi; Mary Inaba; Fumihiko Takeuchi; Keiko Imai

We introduce the Voronoi diagram by the divergence determined by a convex function with additive weights. This class of Voronoi diagrams includes the Euclidean case and, further, the Voronoi diagram for normal distributions in a statistically meaningful setting. With the additive weights, the Voronoi diagram for circles is also included. These Voronoi diagrams can be handled in a unified way via appropriate potential functions and their tangent hypersurfaces.

1 Divergence Voronoi diagram

In this section, we define the Voronoi diagram by the divergence determined by a given convex function. The following preliminary results in sections 1.1 and 1.2 may be found in [9] and [1, 2], respectively.

1.1 Conjugacy of convex functions

Let S be an open convex set in $\mathbb{R}^d$.
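One common way to obtain a divergence from a convex function is the Bregman form $D(x, y) = f(x) - f(y) - \nabla f(y) \cdot (x - y)$. The sketch below assumes that form, since the paper's own definition is not reproduced in this excerpt, and also assumes that the query is the first argument and that the weight is simply added to the divergence. With $f(x) = \frac{1}{2}\|x\|^2$ it reduces to half the squared Euclidean distance, matching the Euclidean case mentioned in the abstract.

```python
def bregman(f, grad_f, x, y):
    """Assumed Bregman-type divergence of x from y for convex f."""
    return f(x) - f(y) - sum(g * (a - b) for g, a, b in zip(grad_f(y), x, y))


def half_sq_norm(x):
    """f(x) = 0.5 * ||x||^2, which yields half the squared distance."""
    return 0.5 * sum(a * a for a in x)


def grad_half_sq_norm(x):
    """Gradient of half_sq_norm is x itself."""
    return tuple(x)


def nearest_site(query, sites, weights, f, grad_f):
    """Index of the site minimizing divergence plus its additive weight."""
    return min(
        range(len(sites)),
        key=lambda i: bregman(f, grad_f, query, sites[i]) + weights[i],
    )


sites = [(0.0, 0.0), (3.0, 0.0)]
q = (1.0, 0.0)
print(bregman(half_sq_norm, grad_half_sq_norm, (3.0, 4.0), (0.0, 0.0)))
print(nearest_site(q, sites, [0.0, 0.0], half_sq_norm, grad_half_sq_norm))
print(nearest_site(q, sites, [5.0, 0.0], half_sq_norm, grad_half_sq_norm))
```

The last two calls show how an additive weight can move a query point from one Voronoi region to another without changing the sites themselves.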

International Journal of Computer and Electrical Engineering | 2016

A Simple Acceleration Method for the Louvain Algorithm

Naoto Ozaki; Hiroshi Tezuka; Mary Inaba


symposium on computational geometry | 1996

A package for triangulations

Tsuyoshi Ono; Yoshiaki Kyoda; Tomonari Masada; Kazuyoshi Hayase; Tetsuo Shibuya; Motoki Nakade; Mary Inaba; Hiroshi Imai; Keiko Imai; David Avis


reconfigurable computing and fpgas | 2009

Hardware Accelerator for Full-Text Search (HAFTS) with Succinct Data Structure

Naoki Tanida; Mary Inaba; Kei Hiraki; Takeshi Yoshino


reconfigurable computing and fpgas | 2009

Triple Line-Based Playout for Go - An Accelerator for Monte Carlo Go

Kenichi Koizumi; Mary Inaba; Kei Hiraki; Yasuo Ishii; Takefumi Miyoshi; Kazuki Yoshizoe


international conference on future generation communication and networking | 2007

Flow Balancing Hardware for Parallel TCP Streams on Long Fat Pipe Network

Yutaka Sugawara; Mary Inaba; Kei Hiraki

