Takeo Hosomi
NEC
Publication
Featured research published by Takeo Hosomi.
International Conference on Cloud Computing | 2013
Erdinc Korpeoglu; Cetin Sahin; Divyakant Agrawal; Amr El Abbadi; Takeo Hosomi; Yoshiki Seo
Technology trends are transforming not only the hardware of end-user devices but also the types of software applications deployed on them. As cloud computing has matured over the past few years, users increasingly rely on networked applications deployed in the cloud. In particular, new applications will emerge in which user interaction is based on real-time continuous media streams rather than traditional request-response interfaces. Furthermore, many of these applications will involve multi-user, streaming-media-based interaction rather than a single user interacting with an application. In this paper, we propose a geographic-location-aware, hybrid, scalable cloud-assisted peer-to-peer (P2P) architecture to support such applications, targeting low administration cost, reduced bandwidth consumption, low latency, low initial investment cost, and optimized resource usage. The main objective is to develop an efficient media delivery system that leverages locality. We propose a novel three-layer architecture that uses the cloud at its core for application management, a two-tier edge cloud to support geographically dispersed user groups, and, at the lowest level, dynamic peer-to-peer overlays for locally clustered user groups. The proposed architecture manages multiple streaming sessions simultaneously, and each streaming session is an independent entity. Our experiments on PlanetLab show that dynamically constructing and maintaining stream delivery at both the user-level P2P overlay and the edge cloud is indeed feasible and effective.
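The abstract does not say how users are mapped to edge clouds. As a hypothetical sketch of locality-aware grouping, the code below assigns each user to the geographically nearest edge cloud by great-circle distance, so that users sharing an edge form one locally clustered group for a P2P overlay. The function names, coordinates, and nearest-edge policy are illustrative assumptions, not the paper's algorithm.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def cluster_by_edge(users, edges):
    """Assign every user to its geographically nearest edge cloud (hypothetical policy).

    users: {user_id: (lat, lon)}, edges: {edge_id: (lat, lon)}.
    Returns {edge_id: [user_id, ...]}; each list is one locally clustered
    group that would form its own P2P overlay under that edge.
    """
    clusters = defaultdict(list)
    for uid, loc in users.items():
        nearest = min(edges, key=lambda e: haversine_km(loc, edges[e]))
        clusters[nearest].append(uid)
    return dict(clusters)

# Illustrative coordinates: two edge sites, users in Kyoto, San Francisco, Osaka.
edges = {"tokyo": (35.68, 139.69), "california": (34.42, -119.70)}
users = {"u1": (35.01, 135.77), "u2": (37.77, -122.42), "u3": (34.69, 135.50)}
print(cluster_by_edge(users, edges))
```

Because each streaming session is independent in the described architecture, a sketch like this would run per session, yielding separate overlays per edge.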
High Performance Computer Architecture | 2000
Takeo Hosomi; Yasushi Kanoh; Masaaki Nakamura; Tetsuya Hirose
The parallel computer Cenju-4 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor designed to scale up to 1024 nodes. For scalability, Cenju-4 adopts a bit-pattern directory, which represents the set of sharing nodes more precisely than other imprecise schemes such as the coarse vector scheme. Cenju-4 uses the network's multicast and gathering functions to deliver invalidation request messages and to collect the replies. This keeps store access latency scalable even when a block is shared by all nodes. Cenju-4 also prevents starvation and deadlock by queuing certain types of messages in main memory. This fully solves the starvation problem under a centralized directory scheme, and the deadlock problem with only one physical or virtual network. The buffers required for queuing messages at each node are only one 32-Kbyte buffer and two 64-Kbyte buffers on a 1024-node system. In this paper, we present the design of the distributed shared memory (DSM) architecture and some performance results.
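The abstract contrasts Cenju-4's bit-pattern directory with the coarse vector scheme but does not give the bit-pattern encoding. As a hypothetical illustration of why directory precision matters, the sketch below compares a precise full-map bit vector (a stand-in for a precise representation, not Cenju-4's actual format) with a coarse vector, counting how many nodes a write must invalidate. The function names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def directory_bits(num_nodes, group_size):
    """Bits per directory entry: one bit per group of `group_size` nodes."""
    return ceil_div(num_nodes, group_size)

def invalidation_targets(sharers, num_nodes, group_size):
    """Nodes that receive an invalidation when a block is written.

    group_size == 1 models a precise full-map bit vector; group_size > 1
    models a coarse vector, where one sharer forces invalidations to its
    whole group of nodes.
    """
    groups = {s // group_size for s in sharers}
    return sorted(
        n
        for g in groups
        for n in range(g * group_size, min((g + 1) * group_size, num_nodes))
    )

sharers = {3, 17}
precise = invalidation_targets(sharers, 1024, 1)  # only the real sharers
coarse = invalidation_targets(sharers, 1024, 8)   # nodes 0-7 and 16-23
print(directory_bits(1024, 1), precise)
print(directory_bits(1024, 8), len(coarse))
```

Here the coarse vector needs 8x fewer directory bits (128 versus 1024) but sends 16 invalidations instead of 2; each extra invalidation is a message that the multicast and gather network must deliver and acknowledge.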
Symposium on VLSI Circuits | 2015
Takashi Takenaka; Hiroaki Inoue; Takeo Hosomi; Yuichi Nakamura
This paper introduces an example of a real-time “big data” processing system accelerated by field-programmable gate arrays (FPGAs), which will open up a new design field for digital circuit engineers. Contrary to the perception that software on commodity servers dominates such large-scale processing, there are many opportunities to use hardware for acceleration. One of the most promising applications is complex event processing (CEP), which calls for hardware acceleration because it must process massive amounts of data in real time. We propose a design flow that compiles a software-oriented event language into highly parallelized and pipelined CEP circuits, enabling our system to achieve a throughput of 20 Gbps. A sophisticated mechanism for integrating archives of previously arrived data with streams of current events also makes the FPGA-accelerated processing system applicable to a wide range of realistic “big data” applications.
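The abstract does not show the paper's event language. As a hypothetical software reference for the kind of query such a design flow compiles into hardware, here is a sliding-window "A followed by B within a time window" detector; the pattern, event format, and function name are assumptions. A hardware implementation would evaluate many such queries in parallel pipeline stages rather than in one sequential loop.

```python
from collections import deque

def followed_by(events, first, second, window):
    """Detect the CEP pattern "`first` then `second` within `window`".

    events: iterable of (timestamp, event_type) tuples in timestamp order.
    Yields (t_first, t_second) for every `first` event that occurred at most
    `window` time units before a `second` event. Matched `first` events are
    not consumed, so one of them may pair with several later `second` events.
    """
    pending = deque()  # timestamps of live `first` events, oldest at the left
    for ts, etype in events:
        # Expire `first` events that have fallen out of the time window.
        while pending and ts - pending[0] > window:
            pending.popleft()
        if etype == first:
            pending.append(ts)
        elif etype == second:
            for t0 in pending:
                yield (t0, ts)

stream = [(0, "A"), (1, "B"), (5, "A"), (9, "B"), (20, "B")]
print(list(followed_by(stream, "A", "B", window=5)))  # [(0, 1), (5, 9)]
```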
IEEE International Conference on High Performance Computing Data and Analytics | 1999
Yasushi Kanoh; Masaaki Nakamura; Tetsuya Hirose; Takeo Hosomi; Hirokazu Takayama; Toshiyuki Nakata
Cenju-4 is a parallel computer designed and manufactured by NEC Corp. It supports two memory architectures: distributed memory with user-level message-passing communication, and distributed shared memory with a cache-coherent non-uniform memory access (cc-NUMA) feature. A Cenju-4 system consists of 8 to 1024 nodes connected by a multistage network with multicast, synchronization, and gather functions. Each node has a MIPS R10000 processor and up to 512 Mbytes of main memory. This paper describes the architecture of Cenju-4, especially its multistage network and network interface, and presents performance results for message-passing communication.
2013 IEEE COOL Chips XVI | 2013
Kazuhisa Ishizaka; Takamichi Miyamoto; S. Akimoto; A. Iketani; Takeo Hosomi; Junji Sakai
Super-resolution image processing (SR) is a heavy task for today's mid-range Xeon servers. To accelerate SR, we use a server system with a manycore coprocessor, the Intel Xeon Phi. The function offload model is the usual execution model for such systems. However, under that model it is difficult for SR to keep both the host processors and the coprocessors highly utilized. We propose a virtual pipeline model that can fully utilize both. Experimental results show that our SR implementation improves performance by 3.3 times and performance per watt by 1.5 times, achieving 30 frames per second when converting from SD to HD.
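The abstract does not describe how the virtual pipeline model is built. As a generic, hypothetical sketch of why pipelining beats plain function offload: with offload the host waits while the coprocessor runs, so each frame costs t_host + t_coproc, whereas in a two-stage pipeline the host prepares frame i+1 while the coprocessor works on frame i, so the steady-state cost per frame is max(t_host, t_coproc). The stage functions, timings, and names below are illustrative assumptions; in CPython the threads only show the structure, since real overlap would come from the coprocessor running outside the interpreter.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the frame stream

def run_pipeline(frames, host_stage, coproc_stage):
    """Two-stage pipeline: the host stage feeds a bounded queue that a
    separate 'coprocessor' worker drains, so the two stages can overlap.
    Output order matches input order because there is a single FIFO worker."""
    handoff = queue.Queue(maxsize=2)  # bounded buffer between the stages
    results = []

    def coproc_worker():
        while True:
            item = handoff.get()
            if item is _DONE:
                break
            results.append(coproc_stage(item))

    worker = threading.Thread(target=coproc_worker)
    worker.start()
    for f in frames:
        handoff.put(host_stage(f))  # host prepares the next frame meanwhile
    handoff.put(_DONE)
    worker.join()
    return results

def makespan(n_frames, t_host, t_coproc, pipelined):
    """Idealized total time for n_frames under each execution model."""
    if pipelined:
        # Fill the pipeline once, then one frame completes per slowest stage.
        return t_host + t_coproc + (n_frames - 1) * max(t_host, t_coproc)
    return n_frames * (t_host + t_coproc)  # offload: stages never overlap

print(run_pipeline([1, 2, 3], lambda x: x * 2, lambda x: x + 1))
print(makespan(100, 4, 6, pipelined=False), makespan(100, 4, 6, pipelined=True))
```

With the hypothetical stage times of 4 and 6 units over 100 frames, the model gives 1000 units for offload versus 604 for the pipeline; the gains reported in the abstract come from the authors' measured system, not from this model.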
Innovative Architecture for Future Generation High-Performance Processors and Systems | 1998
Yasushi Kanoh; Tetsuya Hirose; Masaaki Nakamura; Takeo Hosomi; Kosuke Tatsukawa; Hiroyuki Araki; Tomoyoshi Sugawara; Toshiyuki Nakata
This paper describes the architecture and evaluation results of the parallel computer Cenju-4. Cenju-4 supports two memory architectures: distributed memory with user-level message-passing communication, and distributed shared memory with a cache-coherent non-uniform memory access (cc-NUMA) feature. A Cenju-4 system consists of 8 to 1024 nodes connected by a multistage network with multicast, synchronization, and gather functions. Cenju-4 adopts an operating system based on the Mach microkernel, which provides several services for parallel processing. We attained a communication latency of 5.5 μsec and a communication throughput of 168 Mbytes/sec in message-passing communication.
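As a back-of-the-envelope aid, the two reported message-passing figures can be combined in the standard linear (Hockney) cost model T(n) = latency + n / bandwidth. This is a textbook approximation rather than the paper's own analysis; it assumes the latency figure is 5.5 microseconds and that Mbytes means 10^6 bytes. Its half-performance size n_1/2 = latency × bandwidth, about 924 bytes, marks where a single message reaches half of the peak 168 Mbytes/s.

```python
LATENCY_S = 5.5e-6     # reported latency, assumed to be 5.5 microseconds
BANDWIDTH_BPS = 168e6  # reported throughput, assumed 168 * 10**6 bytes/s

def transfer_time(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Hockney model: fixed startup latency plus size / asymptotic bandwidth."""
    return latency + n_bytes / bandwidth

def effective_bandwidth(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Bytes per second actually achieved for a single message of n_bytes."""
    return n_bytes / transfer_time(n_bytes, latency, bandwidth)

def half_performance_size(latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """n_1/2: message size at which half of the peak bandwidth is reached."""
    return latency * bandwidth

print(round(half_performance_size()))  # 924
for size in (64, 924, 65536):
    print(size, round(effective_bandwidth(size) / 1e6, 1), "Mbytes/s")
```

Under this model, messages much smaller than about 1 Kbyte are latency-bound, which is why low user-level latency matters as much as peak throughput.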
Archive | 2005
Takeo Hosomi
Archive | 2002
Takeo Hosomi
Archive | 1999
Takeo Hosomi
Archive | 2005
Takeo Hosomi; Yoshiaki Watanabe