Hiroshi Shimodaira | Researchain

Publication

Featured research published by Hiroshi Shimodaira.

international conference on document analysis and recognition | 2001

Substroke approach to HMM-based on-line Kanji handwriting recognition

Mitsuru Nakai; Naoto Akira; Hiroshi Shimodaira; Shigeki Sagayama

A new method is proposed for on-line handwriting recognition of Kanji characters. The method employs substroke HMMs as the minimum units that constitute Japanese Kanji characters and utilizes the direction of pen motion. The main motivation is to fully utilize continuous speech recognition algorithms by relating sentence speech to Kanji characters, phonemes to substrokes, and grammar to Kanji structure. The proposed system consists of input feature analysis, substroke HMMs, a character structure dictionary and a decoder. The present approach has the following advantages over conventional methods that employ whole-character HMMs: 1) a much smaller memory requirement for the dictionary and models; 2) fast recognition through efficient substroke network search; 3) the capability to recognize characters not included in the training data if they are defined as a sequence of substrokes in the dictionary; 4) the capability to recognize characters written in various stroke orders through multiple definitions per character in the dictionary; 5) ease of adapting the HMMs to the user with a few sample characters.

international conference on acoustics, speech, and signal processing | 2001

Multiple-regression hidden Markov model

Katsuhisa Fujinaga; Mitsuru Nakai; Hiroshi Shimodaira; Shigeki Sagayama

Proposes a class of hidden Markov model (HMM) called multiple-regression HMM (MR-HMM) that utilizes auxiliary features such as fundamental frequency (F0) and speaking styles that affect spectral parameters to better model the acoustic features of phonemes. Though such auxiliary features are considered to be factors that degrade the performance of speech recognizers, the proposed MR-HMM adapts its model parameters, i.e. the mean vectors of the output probability distributions, depending on this auxiliary information to improve recognition accuracy. A formulation for parameter reestimation of the MR-HMM based on the EM algorithm is given in the paper. Experiments on speaker-dependent isolated word recognition demonstrated that MR-HMMs using F0-based auxiliary features reduced the error rates by more than 20% compared with conventional HMMs.
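
As a rough illustration of the idea in the abstract above, the following sketch makes the mean vector of a Gaussian output distribution depend on an auxiliary feature vector through a linear regression. The linear form, the function names (adapted_mean, log_gaussian_diag) and all numeric values are assumptions made for illustration; the abstract only states that the mean vectors are adapted according to the auxiliary information.

```python
import math

def adapted_mean(base_mean, regression, aux):
    """Return base_mean + regression @ aux (assumed linear form).

    base_mean:  list of D floats
    regression: D x K matrix given as a list of rows
    aux:        list of K auxiliary feature values, e.g. [log F0]
    """
    return [
        m + sum(b * x for b, x in zip(row, aux))
        for m, row in zip(base_mean, regression)
    ]

def log_gaussian_diag(obs, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (o - m) ** 2 / v)
        for o, m, v in zip(obs, mean, var)
    )

# Hypothetical 2-dimensional spectral feature and one auxiliary feature.
base_mean = [1.0, -0.5]
regression = [[0.2], [0.1]]
var = [1.0, 1.0]
aux = [2.0]
obs = [1.4, -0.3]

mean = adapted_mean(base_mean, regression, aux)
score_adapted = log_gaussian_diag(obs, mean, var)    # mean shifted by aux
score_fixed = log_gaussian_diag(obs, base_mean, var) # conventional fixed mean
```

In this toy setting the observation lies at the shifted mean, so the adapted model assigns it a higher log likelihood than the fixed-mean model.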

Life-like characters | 2004

Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

Shinichi Kawamoto; Hiroshi Shimodaira; Tsuneo Nitta; Takuya Nishimoto; Satoshi Nakamura; Katsunobu Itou; Shigeo Morishima; Tatsuo Yotsukura; Atsuhiko Kai; Akinobu Lee; Yoichi Yamashita; Takao Kobayashi; Keiichi Tokuda; Keikichi Hirose; Nobuaki Minematsu; Atsushi Yamada; Yasuharu Den; Takehito Utsuro; Shigeki Sagayama

Galatea is a software toolkit for developing human-like spoken dialog agents. To make it easy to integrate modules with different characteristics, including a speech recognizer, speech synthesizer, facial animation synthesizer, and dialog controller, each module is modeled as a virtual machine with a simple common interface, and the modules are connected to each other through a broker (communication manager). Galatea employs model-based speech and facial animation synthesizers whose model parameters can easily be adapted to an existing person if his or her training data is given. The software toolkit, which runs on both UNIX/Linux and Windows operating systems, will be publicly available in the middle of 2003 [7, 6].

international conference on document analysis and recognition | 2003

On-line overlaid-handwriting recognition based on substroke HMMs

Hiroshi Shimodaira; Takashi Sudo; Mitsuru Nakai; Shigeki Sagayama

This paper proposes a novel handwriting recognition interface for wearable computing where users write characters continuously without pauses on a small single writing box. Since characters are written on the same writing area, they are overlaid with each other. Therefore the task is regarded as a special case of the continuous character recognition problem. In contrast to the conventional continuous character recognition problem, location information of strokes does not help very much in the proposed framework. To tackle the problem, substroke-based hidden Markov models (HMMs) and a stochastic bigram language model are employed. Preliminary experiments were carried out on a dataset of 578 handwriting sequences with a character bigram consisting of 1,016 Japanese educational Kanji and 71 Hiragana characters. The proposed method demonstrated promising performance, with 69.2% of handwriting sequences being correctly recognized when different stroke orders were permitted, and the rate improved to 88.0% when characters were written with a fixed stroke order.

conference on tools with artificial intelligence | 2000

A visualization tool for interactive learning of large decision trees

Trong Dung Nguyen; Tu Bao Ho; Hiroshi Shimodaira

Decision tree induction is certainly among the most applicable learning techniques due to its power and simplicity. However, learning decision trees from large datasets, particularly in data mining, is quite different from learning from small or moderately sized datasets. When learning from large datasets, decision tree induction programs often produce very large trees. How to efficiently visualize trees in the learning process, particularly large trees, remains an open question and calls for efficient tools. The paper presents a visualization tool for interactive learning of large decision trees that includes a new visualization technique called T2.5D (Trees 2.5 Dimensions). After a brief discussion of requirements for tree visualizers and related work, the paper focuses on techniques developed for two issues: (1) how to efficiently visualize large decision trees; and (2) how to visualize decision trees in the learning process.

international conference on pattern recognition | 2002

Pen pressure features for writer-independent on-line handwriting recognition based on substroke HMM

Mitsuru Nakai; Takashi Sudo; Hiroshi Shimodaira; Shigeki Sagayama

Discusses the use of pen pressure as a feature in writer-independent on-line handwriting recognition. We propose two kinds of features related to pen pressure: one is the pressure itself, representing pen ups and downs in a continuous manner; the other is the time derivative of the pressure, representing the temporal pattern of the pen pressure. Combining either of them with the existing feature (velocity vector) yields a 3-dimensional feature for character recognition. Some techniques for interpolating the pen pressure during the pen-up interval are also proposed as pre-processing. In an experimental evaluation using 1,016 elementary Kanji characters, compared with the baseline using the velocity vector only, the additional use of pen pressure improved the performance from 97.5% to 98.1% for careful writing and from 91.1% to 93.1% for cursive writing.
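
The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes linear interpolation for the pen-up interval (the abstract does not say which interpolation techniques were used), first differences as the velocity and the time derivative, and hypothetical names (interpolate_pressure, features) and sample values.

```python
def interpolate_pressure(pressure):
    """Fill None entries (pen-up samples) by linear interpolation between
    the nearest known neighbours; at the edges, copy the nearest value."""
    known = [i for i, p in enumerate(pressure) if p is not None]
    if not known:
        raise ValueError("at least one pressure sample is required")
    filled = list(pressure)
    for i, p in enumerate(pressure):
        if p is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = pressure[right]
        elif right is None:
            filled[i] = pressure[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = pressure[left] + t * (pressure[right] - pressure[left])
    return filled

def features(points, pressure, use_derivative=False):
    """Build (dx, dy, p) or (dx, dy, dp) for each step between samples."""
    p = interpolate_pressure(pressure)
    out = []
    for i in range(1, len(points)):
        dx = points[i][0] - points[i - 1][0]
        dy = points[i][1] - points[i - 1][1]
        third = p[i] - p[i - 1] if use_derivative else p[i]
        out.append((dx, dy, third))
    return out

# Hypothetical pen trajectory; None marks samples taken while the pen is up.
points = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
pressure = [0.8, 0.4, None, None, 1.0]

filled = interpolate_pressure(pressure)
feats_p = features(points, pressure)                        # velocity + pressure
feats_dp = features(points, pressure, use_derivative=True)  # velocity + derivative
```

Either variant produces one 3-dimensional vector per step, which is the shape of feature the abstract describes feeding to the recognizer.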

spoken language technology workshop | 2006

Reinforcement learning of dialogue strategies with hierarchical abstract machines

Heriberto Cuayáhuitl; Steve Renals; Oliver Lemon; Hiroshi Shimodaira

In this paper we propose partially specified dialogue strategies for dialogue strategy optimization, where part of the strategy is specified deterministically and the rest is optimized with reinforcement learning (RL). To do this we apply RL with hierarchical abstract machines (HAMs). We also propose to build simulated users using HAMs, incorporating a combination of hierarchical deterministic and probabilistic behaviour. We performed experiments using a single-goal flight booking dialogue system, and compared two dialogue strategies (deterministic and optimized) using three types of simulated user (novice, experienced and expert). Our results show that HAMs are promising for both dialogue optimization and simulation, and provide evidence that partially specified dialogue strategies can indeed outperform deterministic ones (on average 4.7 fewer system turns) with faster learning than the traditional RL framework.

international conference on acoustics, speech, and signal processing | 2014

Speech driven talking head from estimated articulatory features

Atef Ben-Youssef; Hiroshi Shimodaira; David A. Braude

In this paper, we present a talking head in which the lip and head motion are controlled using articulatory movements estimated from speech. A phone-size HMM-based inversion mapping is employed and trained in a semi-supervised fashion. The advantage of using articulatory features is that they can drive the lip motion and they have a close link with head movements. Speech inversion normally requires training data recorded with an electromagnetic articulograph (EMA), which restricts the naturalness of head movements. The present study considers a more realistic recording condition where the training data for the target speaker are recorded with a conventional motion capture system rather than EMA. Different temporal clustering techniques are investigated for HMM-based mapping, with a GMM-based frame-wise mapping as a baseline system. Objective and subjective experiments show that the synthesised motions are more natural using an HMM system than a GMM one, and that estimated EMA features outperform prosodic features.

international conference on spoken language processing | 1996

Using prosodic information to constrain language models for spoken dialogue

Paul Taylor; Hiroshi Shimodaira; Stephen Isard; Simon King; Jacqueline C. Kowtko

We present work intended to improve speech recognition performance for computer dialogue by taking into account the way that dialogue context and intonational tune interact to limit the possibilities for what an utterance might be. We report on the extra constraint achieved in a bigram language model, expressed in terms of entropy, by using separate submodels for different sorts of dialogue acts, and trying to predict which submodel to apply by analysis of the intonation of the sentence being recognised.
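
To make the entropy comparison concrete, the sketch below trains a general bigram model and a dialogue-act-specific submodel on toy utterances and compares their cross-entropy on a test utterance. Everything here is an assumption for illustration: the toy sentences, the add-one smoothing, the two dialogue-act classes and the function names are not taken from the paper, and the intonation-based prediction of which submodel to apply is not modelled.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect context and bigram counts with sentence boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def cross_entropy(model, sentences, vocab_size):
    """Average negative log2 probability per predicted token (add-one smoothed)."""
    contexts, bigrams, _ = model
    total, count = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
            total -= math.log2(prob)
            count += 1
    return total / count

# Toy utterances grouped by a hypothetical dialogue-act label.
queries = [["is", "it", "left"], ["is", "it", "right"]]
replies = [["yes", "it", "is"], ["no", "it", "is"]]

general = train_bigram(queries + replies)  # one model for all utterances
query_model = train_bigram(queries)        # submodel for one dialogue act
vocab_size = len(general[2])               # shared vocabulary for a fair comparison

test = [["is", "it", "left"]]
h_general = cross_entropy(general, test, vocab_size)
h_query = cross_entropy(query_model, test, vocab_size)
```

On this toy data the act-specific submodel gives a lower cross-entropy for the query utterance than the general model, which is the kind of extra constraint the abstract reports measuring.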

international conference on acoustics, speech, and signal processing | 1992

Accent phrase segmentation using pitch pattern clustering

Hiroshi Shimodaira; M. Kumura

A novel algorithm for breaking continuous speech into accent phrases based on an optimal phrase segmentation criterion is proposed. The optimal segmentation is carried out by using a one-stage DP search algorithm between a pitch pattern from input speech and pitch-pattern templates of accent phrases. The LBG clustering algorithm is used to build the pitch-pattern templates from a large amount of training data. Experiments showed that more than 88% of the accent-phrase boundaries were correctly detected. A novel feature value for representing pitch patterns instead of fundamental frequency is also presented.
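
The template-building step mentioned above can be sketched with a basic LBG (Linde-Buzo-Gray) codebook design: start from the global centroid, split each codeword into a perturbed pair, and refine with Lloyd iterations. This is a generic sketch under stated assumptions, not the paper's procedure: the fixed-length pitch contours, the perturbation size and the iteration count are hypothetical, and the one-stage DP segmentation itself is not shown.

```python
def nearest(vec, codebook):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(data, size, eps=0.01, iterations=20):
    """Grow a codebook by repeated splitting; size should be a power of two."""
    codebook = [centroid(data)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = [
            [c * (1 + s * eps) + s * eps for c in code]
            for code in codebook
            for s in (+1, -1)
        ]
        # Refine with Lloyd iterations: assign vectors, then re-estimate centroids.
        for _ in range(iterations):
            cells = [[] for _ in codebook]
            for vec in data:
                cells[nearest(vec, codebook)].append(vec)
            codebook = [
                centroid(cell) if cell else code  # keep codeword if its cell is empty
                for cell, code in zip(cells, codebook)
            ]
    return codebook

# Hypothetical fixed-length pitch contours: two low-pitched and two high-pitched.
data = [
    [1.0, 1.2, 1.1],
    [1.1, 1.3, 1.2],
    [3.0, 3.2, 3.1],
    [3.1, 3.3, 3.2],
]
codebook = lbg(data, 2)
```

With this toy data the two resulting codewords are the averages of the low-pitched and the high-pitched contours, which would play the role of pitch-pattern templates.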
