Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dan Mikami is active.

Publication


Featured research published by Dan Mikami.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

Takaaki Hori; Shoko Araki; Takuya Yoshioka; Masakiyo Fujimoto; Shinji Watanabe; Takanobu Oba; Atsunori Ogawa; Kazuhiro Otsuka; Dan Mikami; Keisuke Kinoshita; Tomohiro Nakatani; Atsushi Nakamura; Junji Yamato

This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face pose of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, the overlapping speech signals are enhanced and separated into individual speaker channels. The utterances are then sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe the techniques we use to achieve low-latency monitoring of meetings and present experimental results for real-time meeting transcription.
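
To make the online "who is speaking what" flow concrete, the following is a minimal, runnable sketch of the pipeline the abstract outlines. Every component here is a toy placeholder (for instance, treating each microphone channel as one speaker and thresholding energy), not the authors' actual enhancement, separation, speech recognition, or activity detection modules.

```python
# Toy sketch of the online meeting-analysis pipeline described in the abstract.
# All functions are illustrative placeholders, not the authors' implementation.

import numpy as np

def separate_speakers(mic_array: np.ndarray) -> np.ndarray:
    """Stand-in for speech enhancement + source separation:
    treats each microphone channel as one speaker's channel."""
    return mic_array  # shape: (num_speakers, num_samples)

def transcribe_chunk(audio: np.ndarray) -> str:
    """Stand-in for the low-latency speech recognizer."""
    return "<speech>" if np.abs(audio).mean() > 0.01 else ""

def detect_activity(audio: np.ndarray) -> str:
    """Stand-in for participant-activity detection."""
    return "speaking" if np.abs(audio).mean() > 0.01 else "silent"

def analyze_meeting_chunk(mic_array: np.ndarray) -> list[dict]:
    """One online step: per-speaker transcript and activity for one audio chunk."""
    results = []
    for spk, channel in enumerate(separate_speakers(mic_array)):
        results.append({
            "speaker": spk,
            "transcript": transcribe_chunk(channel),
            "activity": detect_activity(channel),
        })
    return results

# Example: 4 "speakers", one second of 16 kHz audio per chunk.
chunk = np.random.randn(4, 16000) * 0.02
print(analyze_meeting_chunk(chunk))
```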


computer vision and pattern recognition | 2009

Memory-based Particle Filter for face pose tracking robust under complex dynamics

Dan Mikami; Kazuhiro Otsuka; Junji Yamato

A novel particle filter, the memory-based particle filter (M-PF), is proposed that can visually track moving objects with complex dynamics. We aim to achieve robustness against abrupt object movements and quick recovery from tracking failures caused by factors such as occlusion. To that end, we eliminate the Markov assumption from the conventional particle filtering framework and predict the prior distribution of the target state from the long-term dynamics. More concretely, M-PF stores the history of estimated target states and generates the prior distribution by randomly sampling from this history, which constitutes a novel PF formulation. Our method can handle nonlinear, time-variant, and non-Markov dynamics, which is not possible within existing PF frameworks. Accurate prior prediction based on a proper dynamics model is especially effective for recovering lost tracks, because it can propose plausible target states even though the state may have changed drastically since the track was lost. In this paper, we target the face pose of seated humans. Quantitative evaluations with magnetic sensors confirm improved accuracy in face pose estimation and successful recovery from tracking loss. The proposed M-PF suggests a new paradigm for modeling systems with complex dynamics and thus opens up a variety of visual tracking applications.
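
The following is a simplified sketch of the memory-based prior idea stated in the abstract: instead of propagating particles with a Markov transition model, past state estimates are stored and prior particles are drawn by resampling that history (plus noise), so states visited long before a tracking failure remain reachable. The class name, the noise model, and the history-sampling details are illustrative assumptions, not the paper's exact formulation.

```python
# Simplified memory-based prior prediction: resample stored past estimates.
import numpy as np

rng = np.random.default_rng(0)

class MemoryBasedPrior:
    def __init__(self, noise_std=0.05):
        self.history = []          # past point estimates of the target state
        self.noise_std = noise_std

    def update(self, state_estimate: np.ndarray) -> None:
        """Store the latest estimated target state."""
        self.history.append(np.asarray(state_estimate, dtype=float))

    def sample_prior(self, num_particles: int) -> np.ndarray:
        """Draw prior particles by resampling stored past states plus noise,
        so that states seen long ago (e.g. before an occlusion) stay reachable."""
        idx = rng.integers(0, len(self.history), size=num_particles)
        past = np.stack([self.history[i] for i in idx])
        return past + rng.normal(0.0, self.noise_std, size=past.shape)

# Example with a 3-D face-pose state (yaw, pitch, roll):
prior = MemoryBasedPrior()
for t in range(100):
    prior.update(np.array([np.sin(t / 10), 0.1, 0.0]))  # fake pose trajectory
particles = prior.sample_prior(500)
print(particles.shape)  # (500, 3)
```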


spoken language technology workshop | 2010

Real-time meeting recognition and understanding using distant microphones and omni-directional camera

Takaaki Hori; Shoko Araki; Takuya Yoshioka; Masakiyo Fujimoto; Shinji Watanabe; Takanobu Oba; Atsunori Ogawa; Kazuhiro Otsuka; Dan Mikami; Keisuke Kinoshita; Tomohiro Nakatani; Atsushi Nakamura; Junji Yamato

This paper presents our newly developed real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to automatically recognize “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and the face pose of each speaker using a distant microphone array and an omni-directional camera at the center of the meeting table. Through a series of advanced audio processing operations, the overlapping speech signals are enhanced and separated into individual speaker channels. The utterances are then sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the situation of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve low-latency monitoring of meetings, and we show experimental results for real-time meeting transcription.


european conference on computer vision | 2010

Memory-based particle filter for tracking objects with large variation in pose and appearance

Dan Mikami; Kazuhiro Otsuka; Junji Yamato

A novel memory-based particle filter is proposed to achieve robust visual tracking of a target's pose even under large variations in the target's position and rotation, i.e., large appearance changes. The memory-based particle filter (M-PF) is a recent extension of the particle filter that incorporates a memory-based mechanism to predict the prior distribution from the stored history of target state sequences; it offers robust target tracking against complex motion. This paper extends the M-PF to a unified probabilistic framework for joint estimation of the target's pose and appearance, based on memory-based joint prior prediction using stored past pose and appearance sequences. We call it the Memory-based Particle Filter with Appearance Prediction (M-PFAP). The memory-based approach makes it possible to generate the joint prior distribution of pose and appearance without explicitly modeling the complex relationship between them. M-PFAP can robustly handle the large appearance changes caused by large pose variation, in addition to abrupt changes in moving direction, and it allows robust tracking under self- and mutual occlusion. Experiments confirm that M-PFAP successfully tracks human faces from frontal view to profile view and greatly eases the limitations of M-PF.
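
As an illustration of the joint prior idea, the sketch below stores past (pose, appearance) pairs and, for each prior pose sampled from memory, predicts the appearance by looking up the stored appearance with the most similar pose. This is only a hedged toy version of the memory-based joint prediction the abstract describes; the class, the nearest-pose lookup, and the data are assumptions, not M-PFAP's actual algorithm.

```python
# Toy joint pose/appearance prior from stored history (M-PFAP-style idea).
import numpy as np

rng = np.random.default_rng(0)

class PoseAppearanceMemory:
    def __init__(self):
        self.poses = []        # e.g. head yaw/pitch/roll
        self.appearances = []  # e.g. flattened face templates

    def store(self, pose, appearance):
        self.poses.append(np.asarray(pose, float))
        self.appearances.append(np.asarray(appearance, float))

    def sample_joint_prior(self, num_particles, noise_std=0.05):
        """Sample prior (pose, appearance) pairs jointly from memory."""
        poses = np.stack(self.poses)
        idx = rng.integers(0, len(self.poses), size=num_particles)
        sampled_poses = poses[idx] + rng.normal(0, noise_std, (num_particles, poses.shape[1]))
        # Predict each appearance as the stored appearance of the nearest stored pose.
        dists = np.linalg.norm(poses[None, :, :] - sampled_poses[:, None, :], axis=2)
        nearest = dists.argmin(axis=1)
        sampled_apps = np.stack([self.appearances[i] for i in nearest])
        return sampled_poses, sampled_apps

memory = PoseAppearanceMemory()
for t in range(50):
    yaw = t * 2.0  # degrees, frontal to profile
    memory.store([yaw, 0.0, 0.0], np.full(64, yaw / 100.0))  # fake template
poses, apps = memory.sample_joint_prior(200)
print(poses.shape, apps.shape)  # (200, 3) (200, 64)
```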


international conference on image processing | 2016

Eye gaze analysis and learning-to-rank to obtain the most preferred result in image inpainting

Mariko Isogawa; Dan Mikami; Kosuke Takahashi; Akira Kojima

This paper proposes a method that blindly predicts the preference order between inpainted images, aiming to select the best one from a set of candidate results. Image inpainting, which removes unwanted regions and restores them, has attracted recent attention. However, the inpainting result is known to vary greatly with the inpainting method and its parameters. Thus, in a typical use case, users must manually select the method and parameters that yield the best result, which takes a great deal of time; an automatic way to estimate the best result is therefore highly desirable. Although some methods, such as estimating a perceptual preference score from image features, have been proposed in recent years, none of them has proven to be a very promising approach. Our method focuses on two points: (1) what is essentially needed is a preference order relation rather than an absolute score, and (2) image features for order estimation can be designed effectively by using actually measured human visual attention. Comparison with other image quality assessment methods shows that our method estimates the preference order with high accuracy.
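
A common way to realize the "order rather than score" idea is a pairwise learning-to-rank model: learn a scoring function so that for two inpainted results A and B, score(A) > score(B) whenever A is preferred. The sketch below uses that generic pairwise transform with synthetic placeholder features; the actual gaze-derived features and ranking model in the paper may differ.

```python
# Generic pairwise learning-to-rank sketch with synthetic placeholder features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "gaze-derived" features for pairs of inpainted results.
# y = 1 means the first result of the pair is preferred.
n_pairs, n_features = 200, 8
feat_a = rng.normal(size=(n_pairs, n_features))
feat_b = rng.normal(size=(n_pairs, n_features))
true_w = rng.normal(size=n_features)
y = (feat_a @ true_w > feat_b @ true_w).astype(int)

# Pairwise transform: classify the sign of the feature difference.
model = LogisticRegression().fit(feat_a - feat_b, y)

def preference_score(features: np.ndarray) -> float:
    """Higher score = predicted to be more preferred by viewers."""
    return float(features @ model.coef_.ravel())

# Rank a set of candidate inpainting results by predicted preference.
candidates = rng.normal(size=(5, n_features))
order = np.argsort([-preference_score(c) for c in candidates])
print("predicted preference order:", order)
```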


acm multimedia | 2011

A system for reconstructing multiparty conversation field based on augmented head motion by dynamic projection

Kazuhiro Otsuka; Kamil Sebastian Mucha; Shiro Kumano; Dan Mikami; Masafumi Matsuda; Junji Yamato

A novel system is presented for reconstructing, in the real world, multiparty face-to-face conversation scenes; it uses dynamic projection to augment human head motion. The system aims to display and play back pre-recorded conversations to viewers as if the remote people were talking in front of them. It consists of multiple projectors and transparent screens. Each screen separately displays the life-size face of one meeting participant, and the screens are spatially arranged to recreate the actual scene. The main feature of the system is dynamic projection: each screen's pose is dynamically controlled to emulate the head motion of the corresponding participant, especially rotation around the vertical axis, which typically accompanies shifts in visual attention, i.e., turning the gaze from one person to another. This recreation of head motion by physical screen motion, in addition to image motion, aims to express more clearly the interactions involving visual attention among the participants. The minimal design of frameless projector screens with augmented head motion is expected to create the feeling that the remote participants are actually present in the same room. This demo presents our initial system and discusses its potential impact on future visual communications.


systems, man and cybernetics | 2011

Early facial expression recognition with high-frame rate 3D sensing

Lumei Su; Shiro Kumano; Kazuhiro Otsuka; Dan Mikami; Junji Yamato; Yoichi Sato

This work investigates a new and challenging problem: how to recognize facial expressions accurately as early as possible, whereas most existing work focuses on improving the recognition rate of facial expression recognition. The features of facial expressions in their early stage are unfortunately very sensitive to noise because of their low intensity. We therefore propose a novel wavelet spectral subtraction method to spatio-temporally refine the subtle facial expression features. Moreover, to achieve early facial expression recognition, we introduce an early AdaBoost algorithm for the facial expression recognition problem. Experiments using our database, built with high-frame-rate 3D sensing, show that the proposed method performs promisingly on early facial expression recognition.
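
To give a rough feel for spectral subtraction in the wavelet domain, the sketch below subtracts a per-level noise magnitude (estimated from a noise-only reference, e.g. a neutral-face period) from the wavelet coefficients of a weak feature signal. This is a 1-D, heavily simplified assumption-laden illustration; the paper's method operates spatio-temporally on facial feature signals and its noise estimation is not shown here.

```python
# Simplified 1-D wavelet-domain spectral subtraction (illustrative only).
import numpy as np
import pywt

rng = np.random.default_rng(0)

def wavelet_spectral_subtraction(signal, noise_reference, wavelet="db4", level=3):
    """Subtract a per-level noise magnitude, estimated from a reference segment,
    from the signal's wavelet coefficients, clamping magnitudes at zero."""
    sig_coeffs = pywt.wavedec(signal, wavelet, level=level)
    ref_coeffs = pywt.wavedec(noise_reference, wavelet, level=level)
    cleaned = []
    for c, r in zip(sig_coeffs, ref_coeffs):
        noise_mag = np.mean(np.abs(r))               # crude per-level noise estimate
        cleaned.append(np.sign(c) * np.maximum(np.abs(c) - noise_mag, 0.0))
    return pywt.waverec(cleaned, wavelet)

# Toy example: a weak "early expression" ramp buried in sensor noise.
t = np.linspace(0, 1, 256)
weak_feature = 0.2 * t
noisy = weak_feature + rng.normal(0, 0.05, t.size)
neutral = rng.normal(0, 0.05, t.size)                # noise-only reference
refined = wavelet_spectral_subtraction(noisy, neutral)
print(refined.shape)
```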


Multimedia Tools and Applications | 2017

Image and video completion via feature reduction and compensation

Mariko Isogawa; Dan Mikami; Kosuke Takahashi; Akira Kojima

This paper proposes a novel framework for image and video completion that removes unwanted regions and restores them. Most existing works fail to carry out the completion when no similar regions exist in the undamaged parts of the input. To overcome this, our approach creates similar regions by projecting the original space onto a lower-dimensional feature space. The approach comprises three stages. First, input images/videos are converted to a lower-dimensional feature space. Second, the damaged region is restored in the converted feature space. Finally, inverse conversion is performed from the lower-dimensional space back to the original space. This yields two advantages: (1) it increases the chance of applying patches that are dissimilar in the original color space, and (2) it enables the use of many existing restoration methods, each with its own advantages, because only the feature space used to retrieve similar patches is changed. The framework's effectiveness was verified in experiments with various feature spaces, restoration methods for the second stage, and inverse conversion methods.
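
The three-stage structure can be illustrated with a deliberately simple stand-in: spatial downsampling plays the role of the feature-space conversion, OpenCV's built-in inpainting is the "existing restoration method" applied in the reduced space, and upsampling plus pasting into the hole is the inverse conversion. The paper's actual feature conversion and compensation are more sophisticated; this sketch only mirrors the overall flow.

```python
# Simplified stand-in for the three-stage completion framework.
import cv2
import numpy as np

def complete_via_reduction(image: np.ndarray, mask: np.ndarray, scale: float = 0.25):
    """image: 8-bit BGR image; mask: 8-bit, nonzero where the region is damaged."""
    h, w = image.shape[:2]
    small_size = (max(1, int(w * scale)), max(1, int(h * scale)))

    # Stage 1: convert to a lower-dimensional space (here: plain downsampling).
    small_img = cv2.resize(image, small_size, interpolation=cv2.INTER_AREA)
    small_mask = cv2.resize(mask, small_size, interpolation=cv2.INTER_NEAREST)

    # Stage 2: restore the damaged region in the reduced space with an
    # existing restoration method (here: OpenCV's Telea inpainting).
    small_filled = cv2.inpaint(small_img, small_mask, 3, cv2.INPAINT_TELEA)

    # Stage 3: inverse conversion back to the original space, then paste the
    # restored content into the damaged region only.
    filled_up = cv2.resize(small_filled, (w, h), interpolation=cv2.INTER_LINEAR)
    result = image.copy()
    result[mask > 0] = filled_up[mask > 0]
    return result

# Toy usage: remove a square region from a synthetic image.
img = np.full((200, 200, 3), 180, np.uint8)
cv2.circle(img, (100, 100), 60, (40, 90, 200), -1)
damage = np.zeros((200, 200), np.uint8)
damage[80:120, 80:120] = 255
restored = complete_via_reduction(img, damage)
print(restored.shape)
```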


Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

Low-latency meeting recognition and understanding using distant microphones

Shoko Araki; Takaaki Hori; Takuya Yoshioka; Masakiyo Fujimoto; Shinji Watanabe; Takanobu Oba; Atsunori Ogawa; Kazuhiro Otsuka; Dan Mikami; Marc Delcroix; Keisuke Kinoshita; Tomohiro Nakatani; Atsushi Nakamura; Junji Yamato

In this demonstration, we present our real-time meeting analyzer for group meetings. Using the audio and visual information captured by a microphone array and an omni-directional camera at the center of a table, our system automatically recognizes “who speaks what to whom and when” in an online manner. We will show demo videos and our meeting browser to demonstrate how the system works in a meeting situation. The technical details will also be discussed at the demo session.


systems, man and cybernetics | 2000

Self-growing learning vector quantization with additional learning and rule extraction abilities

Dan Mikami; Masafumi Hagiwara

We propose a self-growing learning vector quantization (SGLVQ). The proposed SGLVQ is built on the self-organizing map (SOM) and learning vector quantization (LVQ). Learning in SGLVQ consists of three steps: a SOM step, an LVQ step, and a rule extraction step. In the LVQ step, neurons are added incrementally and the size of the network is adjusted automatically. This incremental addition of neurons enables additional learning and contributes to high recognition ability. In the rule extraction step, rules are extracted from the trained network. Computer experiments show improved recognition rates, the ability to perform additional learning, and successful extraction of rules.
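
The sketch below illustrates the "self-growing" idea in isolation: standard LVQ1 prototype updates, plus adding a new prototype when a sample is misclassified and no prototype of its class is nearby, which is what makes additional learning of new classes possible. The growth criterion, learning rate, and the SOM and rule extraction steps are simplified assumptions, not the paper's exact procedure.

```python
# Minimal growing-LVQ sketch: LVQ1 updates plus prototype growth.
import numpy as np

class GrowingLVQ:
    def __init__(self, lr=0.05, grow_threshold=1.0):
        self.protos = []        # prototype vectors ("neurons")
        self.labels = []        # their class labels
        self.lr = lr
        self.grow_threshold = grow_threshold

    def fit_sample(self, x, y):
        x = np.asarray(x, float)
        if not self.protos:
            self.protos.append(x.copy())
            self.labels.append(y)
            return
        d = [np.linalg.norm(x - p) for p in self.protos]
        winner = int(np.argmin(d))
        if self.labels[winner] == y:
            self.protos[winner] += self.lr * (x - self.protos[winner])   # attract
        else:
            self.protos[winner] -= self.lr * (x - self.protos[winner])   # repel
            # Grow: add a neuron if the sample's class has no nearby prototype.
            same = [di for di, lb in zip(d, self.labels) if lb == y]
            if not same or min(same) > self.grow_threshold:
                self.protos.append(x.copy())
                self.labels.append(y)

    def predict(self, x):
        d = [np.linalg.norm(np.asarray(x, float) - p) for p in self.protos]
        return self.labels[int(np.argmin(d))]

# Two Gaussian classes; a third class arrives later (additional learning).
rng = np.random.default_rng(0)
clf = GrowingLVQ()
for _ in range(200):
    clf.fit_sample(rng.normal([0, 0], 0.3), 0)
    clf.fit_sample(rng.normal([3, 3], 0.3), 1)
for _ in range(100):
    clf.fit_sample(rng.normal([0, 3], 0.3), 2)        # new class added later
print(len(clf.protos), clf.predict([0, 3]))
```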

Collaboration


Dive into Dan Mikami's collaborations.

Top Co-Authors

Akira Kojima
Nippon Telegraph and Telephone

Kazuhiro Otsuka
Nippon Telegraph and Telephone

Shoko Araki
Nippon Telegraph and Telephone

Atsushi Nakamura
Nippon Telegraph and Telephone