Publication


Featured research published by Shinji Sako.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2006

Effect of Learning on Listening to Ultra-Fast Synthesized Speech

Takuya Nishimoto; Shinji Sako; Shigeki Sagayama; Kazue Ohshima; Koichi Oda; Takayuki Watanabe

A text-to-speech synthesizer that produces easily understandable voices at very fast speaking rates is expected to help persons with visual disabilities acquire information effectively with screen-reading software. We investigated the intelligibility of Japanese text-to-speech systems at fast speaking rates, using four-digit random numbers as the vocabulary of the recall test. We also studied a fast and intelligible text-to-speech engine, using an HMM-based synthesizer trained on a corpus recorded at a fast speaking rate. The results show that the statistical models trained on the fast-speaking-rate corpus were effective. The learning effect was significant in the early stage of the trials, and the effect was sustained for several weeks.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Improving the performance of HMM-based very low bit rate speech coding

Takahiro Hoshiya; Shinji Sako; Heiga Zen; Keiichi Tokuda; Takashi Masuko; Takao Kobayashi; Tadashi Kitamura

In this paper, we define an F0 quantization scheme for a very low bit rate speech coder based on HMMs (hidden Markov models). In the coding system, the encoder carries out phoneme recognition and transmits phoneme indices, state durations and F0 information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indices, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMMs. Finally, we obtain synthetic speech by using the MLSA (mel log spectrum approximation) filter according to the mel-cepstral coefficients and F0 information. In addition to the F0 quantization, we investigate encoding methods for other parameters to reduce the bit rate while keeping the subjective speech quality. A subjective listening test shows that the performance of the proposed coder at about 100~150 bit/s is superior to a VQ-based vocoder at 600 bit/s (mel-cepstrum: 6 bit/frame × 50 frame/s, F0: 6 bit/frame × 50 frame/s).
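The 600 bit/s figure for the VQ-based baseline follows directly from the per-stream numbers given in parentheses. A minimal sketch of that arithmetic, where the function and variable names are illustrative and not taken from the paper:

```python
def stream_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate of one parameter stream in bit/s."""
    return bits_per_frame * frames_per_second

# VQ-based vocoder baseline from the abstract: 6 bit/frame at 50 frame/s per stream.
mel_cepstrum_rate = stream_bit_rate(6, 50)
f0_rate = stream_bit_rate(6, 50)
baseline_rate = mel_cepstrum_rate + f0_rate

# Ratio of the baseline to the proposed coder's reported range (about 100 to 150 bit/s).
ratio_low = baseline_rate / 150
ratio_high = baseline_rate / 100
print(baseline_rate, ratio_low, ratio_high)
```

Under these numbers the baseline spends roughly four to six times as many bits per second as the proposed coder.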


Active Media Technology | 2014

Using Kinect for Facial Expression Recognition under Varying Poses and Illumination

Filip Malawski; Bogdan Kwolek; Shinji Sako

Emotion analysis and recognition on smartphones with front cameras is a relatively new concept. In this paper we present an algorithm that uses a low-resolution 3D sensor for facial expression recognition. The 3D head pose as well as the 3D locations of the fiducial points are determined using the Face Tracking SDK. Tens of features are automatically selected from a pool determined by all possible line segments between such facial landmarks. We compared the correct classification rates obtained using features selected by AdaBoost, Lasso and histogram-based algorithms. We also compared the classification accuracies obtained on 3D maps and on RGB images. Our results justify the feasibility of low-accuracy 3D sensing devices for facial emotion recognition.
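The feature pool built from all possible line segments between landmarks can be sketched as follows. This is only an illustration: it assumes each segment contributes its Euclidean length as a feature, which the abstract does not state, and the landmark coordinates are made up rather than produced by the Face Tracking SDK.

```python
from itertools import combinations
from math import dist

def segment_features(landmarks):
    """Return the length of every line segment between two distinct landmarks."""
    return [dist(p, q) for p, q in combinations(landmarks, 2)]

# Hypothetical 3D fiducial points (x, y, z); real values would come from the tracker.
landmarks = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (0.0, 0.0, 12.0),
    (3.0, 4.0, 12.0),
]
features = segment_features(landmarks)
print(len(features), features)
```

With n landmarks the pool holds n(n-1)/2 candidate features, which is why a selection step such as AdaBoost or Lasso is needed to reduce it to tens of features.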


International Conference on Document Analysis and Recognition | 2007

Online Handwritten Kanji Recognition Based on Inter-stroke Grammar

Ikumi Ota; Ryo Yamamoto; Shinji Sako; Shigeki Sagayama

This paper presents a new approach to online recognition of handwritten Kanji characters that focuses on their hierarchical structure. A stochastic context-free grammar (SCFG) is introduced to represent the Kanji character generating process, in combination with Hidden Markov Models (HMMs) representing Kanji substrokes, in order to improve the recognition accuracy of important and frequently used Kanji characters in which inter-stroke relative positions play important roles. Combining the stroke likelihood and the relative-position likelihood between character parts in the parsing process is expected to compensate for their ambiguities. By modeling relative positions and sharing the models across distinct Kanji categories, a small amount of training data can yield effective results, and Kanji can be recognized without training data simply by defining the SCFG rules that represent their structures. Experimental results on an online handwritten Kanji database from JAIST (Japan Advanced Institute of Science and Technology) showed significant improvements in the recognition rates of some important Kanji with relatively few strokes, and also showed little difference in recognition rates between the trained and the non-trained Kanji.


Active Media Technology | 2014

Ryry: A Real-Time Score-Following Automatic Accompaniment Playback System Capable of Real Performances with Errors, Repeats and Jumps

Shinji Sako; Ryuichi Yamamoto; Tadashi Kitamura

In this work, we propose an automatic accompaniment playback system called Ryry, which follows human performance and plays a corresponding accompaniment automatically, in an attempt to realize human-computer concerts. Recognizing and anticipating the score position in real-time, known as score following, by a computer is difficult. The proposed system is based on a robust on-line algorithm for real-time audio-to-score alignment. The algorithm is devised using a delayed-decision and anticipation framework by modeling real-time music performance that includes uncertainties such as tempo fluctuation and mistakes. We developed an automatic accompaniment system that is capable of generating polyphonic music signals.


International Conference on Human-Computer Interaction | 2016

Real-Time Japanese Sign Language Recognition Based on Three Phonological Elements of Sign

Shinji Sako; Mika Hatano; Tadashi Kitamura

Sign language is the visual language of deaf people. It is also a natural language, different in form from spoken language. To resolve the communication barrier between hearing and deaf people, several research efforts on automatic sign language recognition (ASLR) systems are now under way. However, existing ASLR research deals with only a small vocabulary. It is also limited in its environmental conditions and in the equipment used. In addition, compared with the research field of speech recognition, there is no large-scale sign database, for various reasons. One of the major reasons is that there is no official writing system for Japanese Sign Language (JSL). In this situation, we focused on using knowledge of JSL phonology and a dictionary in order to develop a real-time JSL sign recognition system. The dictionary consists of over 2,000 JSL signs, each defined by three types of phonological elements in JSL: hand shape, motion, and position. Thanks to the dictionary, JSL sign models are represented by combinations of these elements, and the system can also accommodate the addition of new signs. Our system employs the Kinect v2 sensor to obtain sign features such as hand shape, position, and motion. The depth sensor enables real-time processing and robustness against environmental changes. In general, recognition of hand shape is not easy in the field of ASLR because of the complexity of hand shapes. In our research, we apply a contour-based method to hand shape recognition. To recognize hand motion and position, we adopted statistical models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). To address the lack of a database, our method utilizes pseudo motion and hand shape data. We conducted experiments on recognizing 223 JSL signs, targeting professional sign language interpreters.


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Applying example-based error correction selectively

T. Yamaguchi; Shinji Sako; Hirofumi Yamamoto; Genichiro Kikui

This paper presents a supervised approach to combining detection and correction of speech recognition errors. For each word in a recognition result, our example-based correction algorithm generates a correction candidate by aligning the recognition result and an example sentence in the corpus. The distance between the aligned sentences is regarded as the reliability of the candidate. Then, an SVM (support vector machine) classifier judges whether the correction candidate should be chosen by referring to the reliability score of the candidate and multiple confidence measures that are obtained from the recognition result. Experiments carried out on a travel task corpus have shown that the proposed approach achieved a 20% reduction (from 10% to 8% absolute) in WER.
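The reported figures combine a relative and an absolute view of word error rate (WER): going from 10% to 8% is a drop of 2 points, which is 20% of the starting value. A minimal sketch of both quantities, using the standard word-level edit distance definition of WER; the example sentences and function names are illustrative and not from the paper, and a non-empty reference is assumed:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was removed."""
    return (before - after) / before

# Made-up travel-domain example: one deleted word and one substituted word.
wer = word_error_rate("i would like a room for two nights",
                      "i would like room for too nights")
print(wer)
print(relative_reduction(0.10, 0.08))
```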


International Conference on Machine Vision | 2017

Recognition of JSL finger spelling using convolutional neural networks

Hana Hosoe; Shinji Sako; Bogdan Kwolek

Recently, a few methods for recognition of hand postures on depth maps using convolutional neural networks were proposed. In this paper, we present a framework for recognition of static finger spelling in Japanese Sign Language. The recognition takes place on the basis of a single gray image. The finger-spelled signs are recognized using a convolutional neural network. A dataset consisting of 5000 samples has been recorded. A 3D articulated hand model has been designed to generate synthetic finger spellings and to extend the real hand gestures. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on images from a single RGB camera. The full dataset and Caffe model are available for download.


Conference of the International Speech Communication Association | 2015

Contour-based Hand Pose Recognition for Sign Language Recognition

Mika Hatano; Shinji Sako; Tadashi Kitamura

We are developing a real-time Japanese sign language recognition system that employs abstract hand motions based on three elements familiar to sign language: hand motion, position, and pose. This study considers the method of hand pose recognition using depth images obtained from the Kinect v2 sensor. We apply the contour-based method proposed by Keogh to hand pose recognition. This method recognizes a contour by means of discriminators generated from contours. We conducted experiments on recognizing 23 hand poses from 400 Japanese sign language words.

Index Terms: hand pose, contour, sign language recognition, real-time, Kinect


International Conference on Acoustics, Speech, and Signal Processing | 2013

Robust on-line algorithm for real-time audio-to-score alignment based on a delayed decision and anticipation framework

Ryuichi Yamamoto; Shinji Sako; Tadashi Kitamura

In this paper, we present a robust on-line algorithm for real-time audio-to-score alignment based on a delayed decision and anticipation framework. We employ Segmental Conditional Random Fields and a Linear Dynamical System to model musical performance. The combination of these models allows efficient iterative decoding of score position and tempo. Our approach combines two advantages: the delayed-decision Viterbi algorithm utilizes future information to determine past score positions with high reliability, thus improving alignment accuracy, and the future position can be anticipated using an adaptively estimated tempo. Experiments using classical music and jazz databases demonstrate the validity of our approach.

Collaboration


Top co-authors of Shinji Sako.

Tadashi Kitamura
Nagoya Institute of Technology

Keiichi Tokuda
Nagoya Institute of Technology

Takashi Masuko
Tokyo Institute of Technology

Heiga Zen
Nagoya Institute of Technology

Kenta Okumura
Nagoya Institute of Technology

Junichi Yamagishi
National Institute of Informatics