Yurie Iribe
Aichi Prefectural University
Publication
Featured research published by Yurie Iribe.
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015
Satoshi Tamura; Hiroshi Ninomiya; Norihide Kitaoka; Shin Osuga; Yurie Iribe; Kazuya Takeda; Satoru Hayamizu
This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection (VAD) in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features by deep learning technology. Using the proposed features, we achieved 73.66% lipreading accuracy in a speaker-independent open condition and about 90% AVSR accuracy on average in noisy environments. In addition, we extracted speech segments from visual features, resulting in 77.80% lipreading accuracy. We found that VAD is useful in both the audio and visual modalities for better lipreading and AVSR.
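A minimal sketch of the deep bottleneck feature (DBNF) idea mentioned above: a phoneme classifier with a narrow hidden layer is trained on per-frame audio or visual features, and the activations of that narrow layer are used as features for the AVSR back-end. The layer sizes, feature dimension, and phoneme count below are illustrative assumptions, not the paper's configuration, and PyTorch is used only for concreteness.

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    def __init__(self, in_dim=75, bottleneck_dim=30, n_phonemes=40):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, bottleneck_dim),   # narrow bottleneck layer
        )
        self.back = nn.Sequential(
            nn.ReLU(),
            nn.Linear(bottleneck_dim, 512), nn.ReLU(),
            nn.Linear(512, n_phonemes),       # phoneme classification head used for training
        )

    def forward(self, x):
        z = self.front(x)                     # bottleneck activations = DBNF
        return self.back(z), z

model = BottleneckDNN()
frames = torch.randn(100, 75)                 # 100 frames of visual (or audio) features
logits, dbnf = model(frames)                  # dbnf would be passed to the AVSR back-end
```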
International Conference on Acoustics, Speech, and Signal Processing | 2012
Yurie Iribe; Silasak Manosavan; Kouichi Katsurada; Ryoko Hayashi; Chunyue Zhu; Tsuneo Nitta
Computer-assisted pronunciation training (CAPT) has been introduced into language education in recent years. CAPT scores the learner's pronunciation quality and points out wrong phonemes by using speech recognition technology. However, although the learner can thus realize that his/her speech differs from the teacher's, the learner still cannot control the articulation organs to pronounce correctly, and cannot understand how to correct the wrong articulatory gestures precisely. We indicate these differences by visualizing a learner's wrong pronunciation movements and the correct pronunciation movements with CG animation. We propose a system for generating animated pronunciation by automatically estimating a learner's pronunciation movements from his/her speech. The proposed system maps speech to the coordinate values needed to generate the animations by using multilayer perceptron neural networks (MLP). We use MRI data to generate smooth animated pronunciations. Additionally, we verify through experimental evaluation whether the vocal tract area and articulatory features are suitable as characteristics of pronunciation movement.
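A minimal sketch of the mapping step described above: an MLP is trained to regress per-frame acoustic features onto articulator coordinate values that drive the CG animation. The acoustic feature dimension, the number of vocal-tract coordinate points, and the dummy data are assumptions for illustration only.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(39, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, 2 * 16),                  # x/y values for 16 vocal-tract points (assumed)
)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# speech_feats: (frames, 39) acoustic features; coords: (frames, 32) coordinate
# targets that would be derived from MRI tracings (dummy data here).
speech_feats = torch.randn(500, 39)
coords = torch.randn(500, 32)

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(mlp(speech_feats), coords)
    loss.backward()
    optimizer.step()

# At run time, the predicted coordinates drive the animated pronunciation.
```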
International Workshop on Machine Learning for Signal Processing | 2013
Narpendyah Wisjnu Ariwardhani; Yurie Iribe; Kouichi Katsurada; Tsuneo Nitta
In this paper, we propose voice conversion based on articulatory-movement (AM) to vocal tract parameter (VTP) mapping. An artificial neural network (ANN) is applied to map AM to VTP and to convert the source speaker's voice to the target speaker's voice. The proposed system is not only text-independent but can also be used with an arbitrary source speaker; our approach therefore requires no source-speaker data to build the voice conversion model, and source-speaker data is needed only during the testing phase. Preliminary cross-lingual voice conversion experiments are also conducted. The converted voice was evaluated with subjective and objective measures to compare the performance of our proposed ANN-based voice conversion (VC) with state-of-the-art Gaussian mixture model (GMM)-based VC. The experimental results show that the converted voice is intelligible and carries the speaker individuality of the target speaker.
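The abstract mentions objective evaluation of the converted voice without naming the measure; a common objective measure for voice conversion is mel-cepstral distortion (MCD) between converted and target speech. The sketch below assumes time-aligned mel-cepstral coefficient sequences and is offered as an example of such a measure, not necessarily the one used in the paper.

```python
import numpy as np

def mel_cepstral_distortion(mc_converted, mc_target):
    """MCD in dB between two aligned (frames, dims) mel-cepstral sequences,
    excluding the 0th (energy) coefficient."""
    diff = mc_converted[:, 1:] - mc_target[:, 1:]
    const = 10.0 / np.log(10.0) * np.sqrt(2.0)
    return np.mean(const * np.sqrt(np.sum(diff ** 2, axis=1)))

mc_conv = np.random.randn(200, 25)   # converted-speech mel-cepstra (dummy data)
mc_tgt = np.random.randn(200, 25)    # target-speaker mel-cepstra (dummy data)
print(f"MCD: {mel_cepstral_distortion(mc_conv, mc_tgt):.2f} dB")
```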
International Journal of Knowledge and Web Intelligence | 2010
Yurie Iribe; Hiroaki Nagaoka; Kouichi Katsurada; Tsuneo Nitta
One of today's hottest topics in the field of education is the effectiveness of Learning Management Systems, which have introduced text chat and bulletin boards into the classroom. However, these systems do not provide visual explanation, such as drawing diagrams or numbers on the presentation slides. We developed a classroom lecture system that encourages teacher-to-student and student-to-student communication by sharing slides with notes drawn using a digital pen. The teacher and students can confirm the explanation by sharing these slides. As a result, enhanced understanding is achieved through classroom questions and answers using both text and visual explanation.
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2010
Kazunori Nishino; Yurie Iribe; Shinji Mizuno; Kumiko Aoki; Yoshimi Fukumura
An investigation of learning preferences and e-learning course adaptability among students taking full online courses offered by a consortium of higher education institutions found a relationship between two learning-preference factors (the preference for asynchronous learning and the preference for the use of computers) and e-learning course adaptability. In addition, the learning preferences of a student may change after taking an e-learning course. Furthermore, a multiple regression analysis on a student's learning preferences at the beginning of an e-learning course can, to some extent, predict his/her adaptability to the course at the end.
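A minimal sketch of the multiple-regression idea: end-of-course adaptability is predicted from preference scores measured at the start of the course. It uses scikit-learn; the two predictor columns (asynchronous-learning preference and computer-use preference) and the synthetic data are illustrative assumptions, not the study's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
prefs = rng.uniform(1, 5, size=(80, 2))        # questionnaire scores on a 1-5 scale
adaptability = 0.6 * prefs[:, 0] + 0.3 * prefs[:, 1] + rng.normal(0, 0.3, 80)

model = LinearRegression().fit(prefs, adaptability)
print("coefficients:", model.coef_)
print("R^2:", model.score(prefs, adaptability))
```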
Intelligent Decision Technologies | 2010
Kazunori Nishino; Yurie Iribe; Shinji Mizuno; Kumiko Aoki; Yoshimi Fukumura
In this study, a questionnaire on learning preferences was administered to students enrolled in e-learning courses offered by a collaborative project among several higher education institutions, and factors in learning preferences were extracted. A strong correlation was found between two of these factors (the preference for asynchronous learning and the preference for the use of ICT) and adaptability to e-learning courses. Multiple regression analyses showed that, to some extent, a student's adaptability to an e-learning course can be predicted by measuring his/her preferences for asynchronous learning and the use of ICT. Furthermore, based on these analyses, the paper discusses an effective e-learning system that offers courses and learning objects suitable for particular students.
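A minimal sketch of the factor-extraction step, assuming scikit-learn's FactorAnalysis applied to Likert-scale questionnaire responses; the number of items, the two-factor structure, and the data are illustrative, not the study's actual questionnaire.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
responses = rng.integers(1, 6, size=(120, 20)).astype(float)  # 120 students x 20 Likert items

fa = FactorAnalysis(n_components=2, random_state=0)           # e.g. asynchronous / ICT factors
factor_scores = fa.fit_transform(responses)                   # per-student factor scores
print("loadings shape:", fa.components_.shape)                # (2 factors, 20 items)
# The factor scores could then feed a multiple regression like the one sketched above.
```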
Intelligent Vehicles Symposium | 2014
Yasuhiko Nakano; Satoshi Sano; Yuzuru Yamakage; Takao Kojima; Chika Kishi; Chisa Takahasi; Yurie Iribe; Haruki Kawanaka; Koji Oguri
Traffic accidents involving older drivers have been increasing all over the world. To assess elderly driving performance and predict the risk of traffic accidents, we analyzed data from the specific license renewal tests that are obligatory for Japanese drivers aged 70 or older, which include a driving simulator test and an on-road test. The analysis showed that aging affects several test results, such as the percentage of correct answers and the reaction times in multiple-judgment tasks. To classify a driver as a high accident risk, we performed an outlier analysis using a one-class SVM to investigate performance characteristics, and also performed a logistic regression analysis. Using parameters strongly related to cognitive decline, we found a viable way to classify impaired drivers. Driving is a complex task requiring the integration of cognition, judgment, and operation skills, and deterioration of these skills is likely to increase the risk of traffic accidents. Although our final objective is to support elderly drivers suffering such deterioration, we initially studied a measurement method to detect the area and extent of deterioration effectively.
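A minimal sketch of the outlier-analysis step: drivers whose test results deviate from the typical profile are flagged with a one-class SVM. The feature columns (correct-answer rate, mean reaction time), the scikit-learn implementation, and the synthetic data are illustrative assumptions, not the license-renewal dataset.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# columns: correct-answer rate in judgment tasks, mean reaction time in seconds
results = np.column_stack([rng.normal(0.85, 0.05, 300),
                           rng.normal(0.90, 0.15, 300)])

X = StandardScaler().fit_transform(results)
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)
labels = detector.predict(X)                 # -1 marks potential high-risk outliers
print("flagged drivers:", int((labels == -1).sum()))
```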
International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA) | 2014
Seng Kheang; Kouichi Katsurada; Yurie Iribe; Tsuneo Nitta
The quality of grapheme-to-phoneme (G2P) conversion plays an important role in developing high-quality speech synthesis systems. Because many problems with G2P conversion have been reported, we propose a novel two-stage model-based approach, implemented on an existing weighted finite-state transducer (WFST)-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model automatically converts words to phonemes, while the second-stage model uses the input graphemes and the output phonemes from the first stage to determine the best final phoneme sequence. Additionally, we design new grapheme generation rules that provide extra detail for the vowel graphemes appearing within a word. Compared with previous approaches, the evaluation results show that our approach slightly improves accuracy on the out-of-vocabulary dataset and consistently increases accuracy on the in-vocabulary dataset.
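A minimal sketch of the two-stage data flow: a first-stage model produces an initial phoneme sequence from the graphemes, and a second-stage model re-predicts each phoneme from the (grapheme, first-stage phoneme) pair. The actual system is built on a WFST-based G2P framework; the dictionary-lookup "models" and toy pronunciations below are stand-ins purely to show the pipeline shape.

```python
FIRST_STAGE = {"p": "P", "h": "F", "o": "OW", "n": "N", "e": "_"}   # naive per-letter model
SECOND_STAGE = {("h", "F"): "_", ("p", "P"): "F"}                   # context-dependent fix-ups

def g2p_two_stage(word):
    stage1 = [FIRST_STAGE.get(g, "?") for g in word]                # graphemes -> initial phonemes
    stage2 = [SECOND_STAGE.get((g, p), p) for g, p in zip(word, stage1)]
    return [p for p in stage2 if p != "_"]                          # drop null phonemes

print(g2p_two_stage("phone"))   # ['F', 'OW', 'N'] with these toy rules
```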
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2005
Kiichirou Sasaki; Yurie Iribe; Masato Goto; Mamoru Endo; Takami Yasuda; Shigeki Yoko
This research paper aims to develop a simple Web mail system that helps senior citizens make the best use of their empirical knowledge as a social asset, as one part of an informatization promotion project begun in 2004 through industry-government-academia cooperation. A simple Web mail system was built on a trial basis as a concrete example based on the needs of senior citizens, and the current state of development and remaining problems are described.
International Conference on Acoustics, Speech, and Signal Processing | 2013
Yurie Iribe; Silasak Manosavanh; Kouichi Katsurada; Ryoko Hayashi; Chunyue Zhu; Tsuneo Nitta
We describe computer-assisted pronunciation training (CAPT) through the visualization of articulatory gestures derived from the learner's speech. Typical CAPT systems cannot indicate how the learner should correct his/her articulation. The proposed system enables the learner to study how to correct their pronunciation by comparing the wrongly pronounced gesture with a correctly pronounced gesture. In this system, a multi-layer neural network (MLN) converts the learner's speech into vocal tract coordinates obtained using Magnetic Resonance Imaging data, and an animation is then generated from these coordinate values. Moreover, we improved the animations by introducing a per-phoneme anchor point into MLN training. In our experiments, the new system generated accurate CG animations even from English speech produced by Japanese speakers.
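A hedged sketch of one way a per-phoneme anchor point could constrain the predicted vocal-tract coordinates: at each phoneme's centre frame, the output is pulled toward a canonical coordinate vector for that phoneme. This is an illustrative interpretation of the anchor-point idea, not the paper's exact training procedure, and all dimensions and data are dummies.

```python
import numpy as np

def apply_anchor_points(coords, phoneme_segments, anchors, weight=0.5):
    """coords: (frames, dims) MLN output; phoneme_segments: list of
    (phoneme, start_frame, end_frame); anchors: phoneme -> (dims,) canonical vector."""
    out = coords.copy()
    for phoneme, start, end in phoneme_segments:
        centre = (start + end) // 2
        out[centre] = (1 - weight) * out[centre] + weight * anchors[phoneme]
    return out

coords = np.random.randn(60, 32)                 # dummy predicted coordinates
anchors = {"a": np.zeros(32), "s": np.ones(32)}  # dummy canonical vocal-tract shapes
smoothed = apply_anchor_points(coords, [("a", 0, 30), ("s", 30, 60)], anchors)
```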