Teruo Tanimoto | Researchain

Publication

Featured research published by Teruo Tanimoto.

International Conference on Parallel Architectures and Compilation Techniques | 2012

Scalability-based manycore partitioning

Hiroshi Sasaki; Teruo Tanimoto; Koji Inoue; Hiroshi Nakamura

Multicore processors have been popular for years, and the industry is gradually shifting towards the era of manycore processors. Single-thread performance of microprocessors is not growing at its historical rate, but the many active processes in a computer system and the continuing development of multi-threaded applications benefit from growing core counts to sustain system throughput. This trend leads to a situation where a number of parallel applications are executed simultaneously on a single system. Since multi-threaded applications try to maximize their throughput by utilizing the whole system, each of them usually creates at least as many threads as there are underlying logical cores. This introduces a much greater number of threads to be co-scheduled in the entire system. However, each program has different characteristics (or scalability) and contends with the others for shared resources, namely the CPU cores and the memory hierarchy. Therefore, OS thread scheduling will clearly play a major role in achieving high system performance under such conditions. We develop a sophisticated scheduler that (1) dynamically predicts the scalability of programs via the use of hardware performance monitoring units, (2) decides the optimal number of cores to be allocated for each program, and (3) allocates the cores to programs while maximizing the system utilization to achieve fair and maximum performance. The evaluation results on a 48-core AMD Opteron system show improvements over the Linux scheduler for a variety of multiprogramming workloads.
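The abstract does not describe the predictor or the allocation policy in detail, so the following is only a rough sketch of what steps (2) and (3) could look like. It assumes each program's scalability is summarized by a hypothetical parallel fraction fed into Amdahl's law, and it hands out cores greedily by marginal speedup gain. The names `amdahl_speedup` and `partition_cores` are illustrative and do not come from the paper, and fairness between programs is not modeled here.

```python
import heapq


def amdahl_speedup(parallel_fraction, cores):
    """Predicted speedup on `cores` cores under Amdahl's law (an assumed model)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def partition_cores(parallel_fractions, total_cores):
    """Give every program one core, then assign each remaining core to the
    program whose predicted speedup grows the most from one extra core."""
    if total_cores < len(parallel_fractions):
        raise ValueError("need at least one core per program")

    alloc = {name: 1 for name in parallel_fractions}

    # heapq is a min-heap, so gains are stored negated to pop the largest first.
    heap = []
    for name, p in parallel_fractions.items():
        gain = amdahl_speedup(p, 2) - amdahl_speedup(p, 1)
        heapq.heappush(heap, (-gain, name))

    for _ in range(total_cores - len(alloc)):
        _, name = heapq.heappop(heap)
        alloc[name] += 1
        p, n = parallel_fractions[name], alloc[name]
        next_gain = amdahl_speedup(p, n + 1) - amdahl_speedup(p, n)
        heapq.heappush(heap, (-next_gain, name))

    return alloc


if __name__ == "__main__":
    # Hypothetical workloads: one highly scalable, one poorly scalable.
    print(partition_cores({"scalable": 0.99, "serial_heavy": 0.5}, 48))
```

Because the Amdahl curve has diminishing returns, the greedy choice maximizes the sum of predicted speedups under this model. A real scheduler would replace the fixed fractions with predictions refreshed at run time.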

International Conference on Big Data | 2014

FlexDAS: A flexible direct attached storage for I/O intensive applications

Takatsugu Ono; Yotaro Konishi; Teruo Tanimoto; Noboru Iwamatsu; Takashi Miyoshi; Jun Tanaka

Big data analysis and data storage applications require a huge volume of storage and high I/O performance. Applications can achieve high levels of performance and cost efficiency by exploiting the high I/O performance of direct attached storage (DAS) such as internal HDDs. With the size of stored data ever increasing, it will be difficult to replace servers since internal HDDs contain huge amounts of data. In response to this issue, we propose FlexDAS, which improves the flexibility of direct attached storage by using a disk area network (DAN) without degrading the I/O performance. We developed a prototype FlexDAS switch and quantitatively evaluated the architecture. Results show that the FlexDAS switch can disconnect and connect the HDD to the server in just 1.16 seconds. The I/O performance of the disks connected via the FlexDAS switch was almost the same as that of the conventional DAS architecture.

IEEE Computer Architecture Letters | 2017

Heavy Tails in Program Structure

Hiroshi Sasaki; Fang Hsiang Su; Teruo Tanimoto; Simha Sethumadhavan

Designing and optimizing computer systems requires a deep understanding of the underlying system behavior. Historically, many important observations that led to the development of essential hardware and software optimizations were driven by empirical observations about program behavior. In this paper, we report an interesting property of program structures by viewing dynamic program execution as a changing network. By analyzing the communication network created as a result of dynamic program execution, we find that communication patterns follow heavy-tailed distributions. In other words, a few instructions have orders of magnitude more consumers than most instructions in a program. Surprisingly, these heavy-tailed distributions follow the iconic power law previously seen in man-made and natural networks. We provide empirical measurements based on the SPEC CPU2006 benchmarks to validate our findings and perform semantic analysis of the source code to reveal the causes of such behavior.
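One common way to check a power-law claim like the one above is to estimate the tail exponent by maximum likelihood. The sketch below uses the standard continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)), which is only an approximation for integer consumer counts. The abstract does not say how the authors fitted their data, so this is an assumed method, and `power_law_alpha` and the `x_min` cutoff are illustrative names.

```python
import math


def power_law_alpha(samples, x_min):
    """Maximum-likelihood estimate of the exponent alpha for samples >= x_min,
    assuming a continuous power law p(x) ~ x**(-alpha)."""
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail samples equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Hypothetical consumer counts: most values are small, a few are very large.
    counts = [1, 1, 1, 2, 2, 3, 5, 8, 40, 900]
    print(power_law_alpha(counts, x_min=1))
```

The estimate depends on the choice of `x_min`, so in practice the cutoff is varied and the fit is compared against alternative heavy-tailed distributions before concluding that a power law holds.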

International Symposium on Computing and Networking | 2017

CPCI Stack: Metric for Accurate Bottleneck Analysis on OoO Microprocessors

Teruo Tanimoto; Takatsugu Ono; Koji Inoue

Correctly understanding microarchitectural bottlenecks is important for optimizing the performance and energy of OoO (Out-of-Order) processors. Although the CPI (Cycles Per Instruction) stack has been used for this purpose, it stacks architectural events heuristically by counting how many times the events occur, and the order of stacking affects the result, which may be misleading. This is because the CPI stack does not consider the execution path of dynamic instructions. Critical path analysis (CPA) is a well-known method to identify the critical execution path of dynamic instruction execution on OoO processors. The critical path consists of the sequence of events that determines the execution time of a program on a certain processor. We develop a novel representation, the CPCI stack (Cycles Per Critical Instruction stack), which is a CPI stack based on CPA. The main challenge in constructing the CPCI stack is how to analyze a large number of paths, because CPA often results in numerous critical paths. In this paper, we show that there are more than 10^10 (ten to the tenth power) critical paths in the execution of only one thousand instructions in 35 of the 48 benchmarks from SPEC CPU2006. Then, we propose a statistical method to analyze all the critical paths and show a case study using the benchmarks.
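To see why the number of critical paths can explode, it helps to count longest paths in a small dependence graph. The sketch below is not the paper's graph model or its statistical method. It is a generic dynamic program over a weighted DAG, with hypothetical helper names, applied to a toy chain of "diamonds" in which two equal-latency branches double the number of critical paths at each stage.

```python
from collections import defaultdict, deque


def count_critical_paths(num_nodes, edges):
    """Return (critical path length, number of distinct critical paths).

    `edges` is a list of (src, dst, latency) tuples describing a DAG with
    non-negative latencies; paths start at nodes with no predecessors.
    """
    succ = defaultdict(list)
    indeg = [0] * num_nodes
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1

    longest = [-1] * num_nodes  # -1 marks "not reached yet"
    count = [0] * num_nodes
    queue = deque()
    for node in range(num_nodes):
        if indeg[node] == 0:
            longest[node], count[node] = 0, 1
            queue.append(node)

    # Kahn's algorithm: a node is processed only after all its predecessors.
    while queue:
        u = queue.popleft()
        for v, w in succ[u]:
            cand = longest[u] + w
            if cand > longest[v]:
                longest[v], count[v] = cand, count[u]
            elif cand == longest[v]:
                count[v] += count[u]
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    best = max(longest)
    total = sum(c for length, c in zip(longest, count) if length == best)
    return best, total


def diamond_chain(k):
    """Toy graph: k diamonds in series, each with two equal-latency branches."""
    edges = []
    for i in range(k):
        a, b1, b2, c = 3 * i, 3 * i + 1, 3 * i + 2, 3 * i + 3
        edges += [(a, b1, 1), (a, b2, 1), (b1, c, 1), (b2, c, 1)]
    return 3 * k + 1, edges


if __name__ == "__main__":
    print(count_critical_paths(*diamond_chain(34)))
```

With 34 diamonds, only 103 nodes already yield 2^34 critical paths, which is above 10^10. Enumerating the paths one by one is therefore impractical, while counting them with this dynamic program stays linear in the number of edges.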

IEEE International Symposium on Workload Characterization | 2017

Why do programs have heavy tails

Hiroshi Sasaki; Fang Hsiang Su; Teruo Tanimoto; Simha Sethumadhavan

Designing and optimizing computer systems requires a deep understanding of the underlying system. Historically, many important observations that led to the development of essential hardware and software optimizations were driven by empirical studies of program behavior. In this paper, we report an interesting property of dynamic program execution by viewing it as a changing (or social) network. In a program social network, two instructions are friends if there is a producer-consumer relationship between them. One prominent result is that the outdegree of instructions follows heavy-tailed or power-law distributions, i.e., a few instructions produce values for many instructions while most instructions do so for very few instructions. In other words, the number of instruction dependencies is highly skewed. In this paper, we investigate this curious phenomenon. By analyzing a large set of workloads under different compilers, compilation options, ISAs, and inputs, we find that the dependence skew is widespread, suggesting that it is fundamental. We also observe that the skew is fractal across time and space. Finally, we describe conditions under which skew emerges within programs and provide evidence that suggests that the heavy-tailed distributions are a unique program property.
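The producer-consumer network described above can be illustrated with a small sketch that derives outdegrees from an instruction trace. The trace format is an assumption made for this example: each entry is a (destination register, source registers) pair in dynamic execution order, only register dependencies are tracked, and memory dependencies are ignored. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def outdegrees(trace):
    """For each dynamic instruction, count the later instructions that
    consume the value it produced.

    `trace` is a list of (dest_reg_or_None, [src_regs]) tuples.
    """
    last_writer = {}  # register name -> index of the most recent producer
    out = [0] * len(trace)
    for i, (dest, srcs) in enumerate(trace):
        # A consumer reading the same register twice still counts once.
        for reg in set(srcs):
            producer = last_writer.get(reg)
            if producer is not None:
                out[producer] += 1
        if dest is not None:
            last_writer[dest] = i
    return out


def outdegree_histogram(trace):
    """Map each outdegree value to the number of instructions that have it."""
    return Counter(outdegrees(trace))


if __name__ == "__main__":
    toy_trace = [
        ("r1", []),            # produces r1
        ("r2", ["r1"]),
        ("r3", ["r1"]),
        ("r4", ["r1", "r2"]),
        ("r1", ["r3"]),        # overwrites r1
        ("r5", ["r1"]),        # consumes the new r1
    ]
    print(outdegrees(toy_trace))
    print(outdegree_histogram(toy_trace))
```

On a real trace, plotting this histogram on log-log axes is the usual first check for a heavy tail: a few producers with very large outdegree stand far to the right of the bulk of instructions.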

Journal of Information Processing | 2017

Dependence graph model for accurate critical path analysis on out-of-order processors

Teruo Tanimoto; Takatsugu Ono; Koji Inoue

The dependence graph model of out-of-order (OoO) instruction execution is a powerful representation used for critical path analysis. However, most, if not all, of the previous models are either out of date and lack enough detail to model modern OoO processors, or are too specific and complicated, which limits their generality and applicability. In this paper, we propose an enhanced dependence graph model which remains simple but greatly improves the accuracy over prior models. The evaluation results using the gem5 simulator with configurations similar to Intel’s Haswell and Silvermont architectures show that the proposed enhanced model achieves CPI errors of 2.1% and 4.4%, which are 90.3% and 77.1% improvements over the state-of-the-art model.

International Conference on Supercomputing | 2014

Hardware-assisted scalable flow control of shared receive queue

Teruo Tanimoto; Takatsugu Ono; Kohta Nakashima; Takashi Miyoshi

The total number of processor cores in supercomputers is increasing while memory size per core is decreasing due to the adoption of processors with multiple cores. Shared Receive Queue is a technique that effectively reduces the memory usage of buffers, but the absence of flow control results in excess buffer pools. We propose a hardware-assisted flow control that reduces flow control latency by 95.1%, thus enabling scalable supercomputers with multi-core processors.

Archive | 2015