
Publication


Featured research published by Po-Yi Shih.


Journal of Network and Computer Applications | 2011

Robust several-speaker speech recognition with highly dependable online speaker adaptation and identification

Po-Yi Shih; Po-Chuan Lin; Jhing-Fa Wang; Yuan-Ning Lin

Current adaptation mechanisms adapt a single acoustic model for one speaker in a speaker-independent speech recognition system. However, as more users share the same speech recognizer, single-model adaptation leads to negative adaptation whenever the speaker changes; such adaptation is undependable. Considering the situation of a smart home or an office with several staff members, this paper presents speaker-specific acoustic model adaptation based on a multi-model mechanism to solve the problem of undependable adaptation. First, the current speaker is identified with an SVM classifier; the corresponding acoustic parameters are then extracted and integrated with the speaker-independent acoustic model to yield a speaker-dependent acoustic model, which improves recognition accuracy for the current speaker. To supply dependable adaptation data and achieve online positive speaker adaptation, a confidence-score mechanism verifies each recognition result and determines whether it can serve as an adaptation datum. The experimental results indicate that the proposed system effectively increases the average speech recognition accuracy from 62% to 85%. Thus, the proposed system achieves robust several-speaker speech recognition with highly dependable online speaker adaptation and identification.
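The abstract does not include an implementation; the sketch below only illustrates the two ideas it names, SVM-based speaker identification and confidence-score gating of adaptation data. The feature dimensionality, the toy enrollment data, and the 0.8 threshold are assumptions, not the authors' configuration.

import numpy as np
from sklearn.svm import SVC

# Toy enrollment data: 13-dim feature vectors for two known speakers (assumed, not from the paper).
rng = np.random.default_rng(0)
X_enroll = rng.normal(size=(40, 13))
y_enroll = np.repeat([0, 1], 20)
speaker_clf = SVC().fit(X_enroll, y_enroll)

def select_adaptation_datum(utterance_feats, recognition_result, confidence, threshold=0.8):
    """Identify the speaker, then keep the result as adaptation data only if it is dependable."""
    speaker_id = int(speaker_clf.predict(utterance_feats.reshape(1, -1))[0])
    if confidence >= threshold:              # hypothetical confidence-score gate
        return speaker_id, recognition_result
    return speaker_id, None                  # rejected: would risk negative adaptation

# A high-confidence result is accepted as an adaptation datum for the identified speaker.
print(select_adaptation_datum(rng.normal(size=13), "turn on the light", 0.91))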


IEEE Region 10 Conference | 2011

Human-robot interaction based on cloud computing infrastructure for senior companion

Yan-You Chen; Jhing-Fa Wang; Po-Chuan Lin; Po-Yi Shih; Hsin-Chun Tsai; Da-Yu Kwan

This paper presents a human-robot interactive system for senior companionship based on a cloud computing infrastructure. The proposed senior companion robot system (SCRS) is designed around a cloud computing network. On the server side, two cloud services are proposed: 1) the web-based user remote management service (WURMS) for remote robot control, and 2) the robotic multimodal interactive computation services (RMICS), which provide the human-robot interfaces, including speech/sound recognition, speaker identification, face identification, sound source estimation, and text-to-speech (TTS). On the robot client side, a behavior model is designed to use the WURMS and RMICS services. In the experiments, two robots, "Robert" and "Davinci", are built to evaluate the capability of the SCRS. Using only low-cost, low-power CPUs (Intel Atom N450), both robots work wirelessly for real-time human-robot interaction. Finally, we design five senior companion scenarios, for which the average mean opinion score (MOS) is 4.16.
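The paper describes the client-server split rather than an API; the fragment below is only a hypothetical robot-side loop that offloads recognition to an RMICS-style cloud service and maps the result to a behavior. The endpoint URL, response fields, and intent names are invented for illustration.

import requests

RMICS_URL = "http://cloud.example.com/rmics/recognize"   # placeholder endpoint, not from the paper

def interact_once(audio_bytes: bytes) -> str:
    """Send one captured utterance to the cloud service and act on the returned intent."""
    resp = requests.post(RMICS_URL, files={"audio": audio_bytes}, timeout=5.0)
    resp.raise_for_status()
    result = resp.json()                     # assumed shape: {"speaker": ..., "text": ..., "intent": ...}
    if result.get("intent") == "call_family":
        return "dialing family contact"      # the behavior model maps intents to robot actions
    return "awaiting next command"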


Sensor Networks, Ubiquitous and Trustworthy Computing | 2008

Notice of Violation of IEEE Publication Principles: Acoustic and Phoneme Modeling Based on Confusion Matrix for Ubiquitous Mixed-Language Speech Recognition

Po-Yi Shih; Jhing-Fa Wang; Hsiao Ping Lee; Hung-Jen Kai; Hung-Tzu Kao; Yuan-Ning Lin

This work presents a novel approach to acoustic and phoneme modeling for recognizing ubiquitous mixed-language speech. The conventional approach to multilingual speech recognition is to use a multilingual phone set. Here, a confusion matrix capturing the acoustic confusability between every pair of phonetic units is built for phonetic unit clustering. This work focuses on speaker-independent voice command recognition: the IPA representation is adopted for phonetic unit modeling, and the EAT is applied to construct speaker-independent acoustic models. The experimental results show that the proposed method achieves 70-80% lexicon recognition accuracy.
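The abstract does not detail the clustering step; as a rough illustration only, the sketch below row-normalizes a toy phone confusion matrix, symmetrizes it, and merges phone pairs whose mutual confusion rate exceeds a threshold. The phone inventory, the counts, and the 0.05 threshold are invented for the example.

import numpy as np

phones = ["a", "e", "i", "o", "u"]
# confusion[i, j]: how often phone i was recognized as phone j (toy counts, not the paper's data)
confusion = np.array([
    [90, 5, 1, 3, 1],
    [6, 85, 4, 3, 2],
    [1, 5, 92, 1, 1],
    [4, 2, 1, 88, 5],
    [1, 1, 1, 6, 91],
], dtype=float)

rates = confusion / confusion.sum(axis=1, keepdims=True)   # row-normalize to confusion rates
similarity = (rates + rates.T) / 2                          # symmetrize over each phone pair

clusters = {p: {p} for p in phones}
for i in range(len(phones)):
    for j in range(i + 1, len(phones)):
        if similarity[i, j] > 0.05:                         # merge highly confusable units
            merged = clusters[phones[i]] | clusters[phones[j]]
            for p in merged:
                clusters[p] = merged

print({p: sorted(c) for p, c in clusters.items()})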


International Conference on Orange Technologies | 2014

A happiness-oriented home care system for elderly daily living

Yang-Yen Ou; Po-Yi Shih; Ta-Wen Kuan; Shao-Hsien Shih; Jhing-Fa Wang; Jaw-Shyang Wu

Modern home-care systems highlight functionalities such as bio-signal measurement, security surveillance, and health care; however, most of these functions work independently. In this paper, a warm-care framework for the elderly is proposed that not only covers the aforementioned services but also adds remote monitoring, web camera management, an emergency call for help, behavior recognition and feedback, and remotely controlled entertainment services, forming a comprehensive humanistic-care system. The proposed framework is motivated by the letters of "HAPPINESS", interpreted as "Health", "Ability", "Protection", "Personalization", "Interaction", "Nursing", "Entertainment", "Succor", and "Smile". Three main services are highlighted to achieve these goals. The Web-based Central Camera Management Service (WCCMS) is a real-time remote monitoring function through which a caregiver can attend to the elderly anytime and anywhere via web services; the Multimodal Human-Machine Interaction Service (MHMIS) provides audio-visual cognitive functions for interacting with the elderly; and the Web-based User Management Service (WUMS) gives the user a smart HMI including bio-signal measurement, a help button, remote control, and hospital appointment scheduling. To evaluate the usability of the proposed framework, the mean opinion score (MOS) is applied; the average MOS of 4.2 suggests that the proposed system meets expectations.


International Symposium on Chinese Spoken Language Processing | 2012

Enhanced lengthening cancellation using bidirectional pitch similarity alignment for spontaneous speech

Po-Yi Shih; Bo-Wei Chen; Jhing-Fa Wang; Jhing-Wei Wu

In this work, an enhanced lengthening cancellation method is proposed to detect and cancel the lengthened part of vowels. The proposed method consists of an autocorrelation function, cosine similarity-based lengthening detection, and bidirectional pitch contour alignment. The autocorrelation function is used to obtain the reference pitch contour, and the cosine similarity-based method measures the similarity between the reference pitch contour and the next adjacent one. Because the lengths of periodic segments vary, fixed-size frames may cause accumulative errors; therefore, bidirectional pitch contour alignment is adopted in this study. Experiments indicate that the proposed method achieves accuracy rates of 91.4% and 88.7% on a 60-keyword and a 50-sentence database, respectively. Moreover, the proposed approach runs about three times faster than the baseline. These results demonstrate the effectiveness of the proposed method.
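The abstract names the building blocks but not their parameters; the sketch below is only a minimal illustration of autocorrelation-based pitch estimation plus cosine similarity between adjacent pitch-contour windows. The frame length, window size, pitch range, and 0.95 threshold are assumptions, not the paper's settings, and the bidirectional alignment step is omitted.

import numpy as np

def frame_pitch_autocorr(frame, sr, fmin=75, fmax=400):
    """Estimate frame pitch (Hz) from the autocorrelation peak in a plausible lag range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac):
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag if ac[lag] > 0 else 0.0

def detect_lengthening(signal, sr, frame_len=400, win=5, threshold=0.95):
    """Flag regions where adjacent pitch-contour windows stay nearly identical."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len, frame_len)]
    contour = np.array([frame_pitch_autocorr(f, sr) for f in frames])
    flags = []
    for i in range(0, len(contour) - 2 * win, win):
        a, b = contour[i:i + win], contour[i + win:i + 2 * win]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        cos_sim = float(a @ b / denom) if denom > 0 else 0.0
        flags.append(cos_sim > threshold)    # near-identical contours suggest lengthening
    return flags

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 150 * t)           # toy sustained, vowel-like tone
print(detect_lengthening(tone, sr))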


International Symposium on Neural Networks | 2010

Kernel-based lip shape clustering with phoneme recognition for real-time voice driven talking face

Po-Yi Shih; Jhing-Fa Wang; Zong-You Chen

This work describes a real-time voice-driven method by which a speaker's lip shape is synchronized with the corresponding speech signal, targeting low-bandwidth mobile devices. Phoneme recognition is generally regarded as an important task in the operation of a real-time lip-sync system. In this work, a kernel-based lip shape clustering algorithm inspired by one-class support vector machines (SVMs) is used: speakers with similar lip shapes are grouped into a cluster, and a cluster-dependent vowel phoneme model is then constructed for each cluster. We use the sum of absolute differences (SAD) as the vowel lip-shape likelihood for clustering into categories, and then adjust the transparency of the source and destination lip-shape images with alpha blending to produce the lip-sync animation. We find that this method outperforms the conventional CHMM method in phoneme error rate (PER), 8.78% versus 32.25%.
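Purely as an illustration of the three ingredients named in the abstract, the sketch below fits a one-class SVM for a single lip-shape cluster, computes SAD between two mouth images, and alpha-blends them. The feature dimensions, image sizes, toy data, and 0.5 blend factor are assumptions.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# One-class model of a single lip-shape cluster (toy 8-dim lip landmark features).
cluster_feats = rng.normal(size=(50, 8))
lip_cluster_model = OneClassSVM(kernel="rbf", gamma="scale").fit(cluster_feats)
is_member = lip_cluster_model.predict(rng.normal(size=(1, 8)))   # +1 inside the cluster, -1 outside

def sad(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute differences between two equally sized lip images."""
    return float(np.abs(a.astype(float) - b.astype(float)).sum())

def alpha_blend(src: np.ndarray, dst: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend source and destination mouth frames for lip-sync animation."""
    return (alpha * src.astype(float) + (1.0 - alpha) * dst.astype(float)).astype(np.uint8)

mouth_a = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
mouth_b = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
print(is_member, sad(mouth_a, mouth_b), alpha_blend(mouth_a, mouth_b).shape)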


International Conference on Orange Technologies | 2014

A spoken dialogue system with situation and emotion detection based on anthropomorphic learning for warming healthcare

Bo-Hao Su; Ping-Wen Fu; Po-Chuan Lin; Po-Yi Shih; Yuh-Chung Lin; Jhing-Fa Wang; An-Chao Tsai

This work presents a spoken dialogue system with situation and emotion detection based on anthropomorphic learning for warming healthcare. To make the system's feedback warmer, situation and emotion detection is combined with the spoken dialogue system. Situation and emotion detection are based on lexical categories using Partial-Matching Spoken Sentence Retrieval (PMSSR). Moreover, an anthropomorphic learning mechanism is proposed to improve the performance of emotion and situation detection: based on out-of-vocabulary (OOV) detection, it updates the emotion and situation databases with new lexicon entries through interaction with the user and the internet. The experimental results show that the anthropomorphic learning mechanism increases the accuracy of situation and emotion detection by 30% and 20%, respectively.
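PMSSR is not specified in the abstract; the sketch below is only a minimal keyword-overlap stand-in for category detection with an OOV-driven lexicon update. The categories, keywords, and scoring rule are invented for the example.

# Illustrative-only sketch: score emotion categories by partial keyword matching, and
# grow the lexicon when the utterance contains out-of-vocabulary words.
emotion_lexicon = {
    "happy": {"great", "glad", "wonderful"},
    "sad": {"lonely", "tired", "miss"},
}

def detect_emotion(utterance: str):
    words = set(utterance.lower().split())
    scores = {label: len(words & kws) for label, kws in emotion_lexicon.items()}
    best = max(scores, key=scores.get)
    oov = words - set().union(*emotion_lexicon.values())
    return (best if scores[best] > 0 else None), oov

def learn_from_interaction(label: str, oov_words: set):
    """Anthropomorphic-learning-style update: attach confirmed OOV words to a category."""
    emotion_lexicon.setdefault(label, set()).update(oov_words)

label, oov = detect_emotion("I feel lonely and forgotten today")
if label:
    learn_from_interaction(label, oov)       # e.g. after the user confirms the guess
print(label, sorted(oov))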


International Conference on Orange Technologies | 2013

Customizable cloud-healthcare dialogue system based on LVCSR with prosodic-contextual post-processing

Bo-Wei Chen; Po-Yi Shih; K. Bharanitharan; Po-Chuan Lin; Jhing-Fa Wang; Chia-Ming Chen

This work presents a customizable cloud-healthcare dialogue system based on large vocabulary continuous speech recognition (LVCSR) with prosodic-contextual post-processing. The system consists of two parts. The first is the cloud dialogue management and strategy, which manages and provides services on demand. The second is a web-based reminder with a customizable interface, which offers settings for reminder events and for customizing the dialogue system. Moreover, to increase speech recognition accuracy, this work proposes a prosodic-contextual post-processing mechanism that finds the best sentence among the candidate recognition results using syllable segmentation, pitch analysis, and contextual analysis. In the experiments, five healthcare scenarios for the elderly are designed for evaluation. The analysis indicates that the average mean opinion score (MOS) reaches 4.23. Additionally, the word error rate (WER) of LVCSR with the proposed prosodic-contextual post-processing is improved by 9.21%. These results show that the proposed system is suitable for the elderly in daily living and demonstrate the feasibility of our idea.
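The post-processing mechanism is not specified beyond its three cues; the sketch below only shows the general shape of rescoring an n-best list with extra prosodic and contextual terms. The candidate list, scoring functions, and weights are illustrative assumptions rather than the paper's method.

def syllable_count_score(hypothesis: str, detected_syllables: int) -> float:
    """Crude prosodic cue: prefer hypotheses whose word count matches the detected syllable count."""
    return -abs(len(hypothesis.split()) - detected_syllables)

def context_score(hypothesis: str, expected_keywords: set) -> float:
    """Contextual cue: prefer hypotheses mentioning keywords expected in the dialogue state."""
    return float(len(set(hypothesis.lower().split()) & expected_keywords))

def rescore(nbest, detected_syllables, expected_keywords, w_asr=1.0, w_pros=0.5, w_ctx=0.5):
    best, best_score = None, float("-inf")
    for hyp, asr_score in nbest:
        score = (w_asr * asr_score
                 + w_pros * syllable_count_score(hyp, detected_syllables)
                 + w_ctx * context_score(hyp, expected_keywords))
        if score > best_score:
            best, best_score = hyp, score
    return best

nbest = [("take medicine at nine", -4.2), ("take medicine tonight", -4.0)]
print(rescore(nbest, detected_syllables=4, expected_keywords={"nine", "medicine"}))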


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

Instrumental activities of daily living (IADL) evaluation system based on EEG signal feature analysis

Yang-Yen Ou; Po-Yi Shih; Po-Chuan Lin; Jhing-Fa Wang; Bo-Wei Chen; Sheng-Chung Chan

This work proposes an IADL evaluation system that applies the LDA algorithm to EEG signals and explores the correlation between the subjective IADL assessment and objective EEG measurements. Five features are extracted from a single-channel EEG device: average amplitude, power ratio, spectral centroid, and spectral edge frequencies at 25% and 50%. These features serve as an indicator of a participant's IADL and are classified into IADL scales with the LDA algorithm. For system evaluation, thirty elderly participants (70-96 years old) are classified into three groups by IADL score: high (disability-free, 16-24 points), medium (mild disability, 8-15 points), and low (severe disability, 0-7 points). These IADL groups are distributed uniformly across the following IADL scenarios: 1) ability to use the telephone, 2) ability to handle finances, and 3) chatting with people (which is not part of the IADL scale). The experimental results show that the proposed EEG features and evaluation system achieve a 90% average accuracy rate, verified by leave-one-out cross-validation (LOOCV).
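The five features are standard spectral measures, so they can be sketched directly; the version below is only an illustration, and the sampling rate, the alpha-band definition used for the power ratio, and the toy training labels are assumptions rather than the paper's settings.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def eeg_features(signal: np.ndarray, sr: int = 256) -> np.ndarray:
    """Average amplitude, power ratio, spectral centroid, and 25%/50% spectral edge frequencies."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    total = spectrum.sum()
    avg_amplitude = np.abs(signal).mean()
    power_ratio = spectrum[(freqs >= 8) & (freqs <= 13)].sum() / total   # assumed: alpha band / total
    spectral_centroid = (freqs * spectrum).sum() / total
    cum = np.cumsum(spectrum) / total
    sef25 = freqs[np.searchsorted(cum, 0.25)]    # spectral edge frequency 25%
    sef50 = freqs[np.searchsorted(cum, 0.50)]    # spectral edge frequency 50%
    return np.array([avg_amplitude, power_ratio, spectral_centroid, sef25, sef50])

# Toy training set: one feature vector per recording, with IADL group labels 0/1/2.
rng = np.random.default_rng(0)
X = np.array([eeg_features(rng.normal(size=1024)) for _ in range(30)])
y = np.repeat([0, 1, 2], 10)                     # low / medium / high IADL (toy labels)
clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.predict([eeg_features(rng.normal(size=1024))]))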


Systems, Man and Cybernetics | 2011

Robust sound recognition applied to awareness for health/children/elderly care

Jhing-Fa Wang; Po-Yi Shih; Zhong-Hua Fu; Sheng-Chieh Lee

This paper presents a robust sound recognition system applied to awareness for health/children/elderly care. Specific sound-awareness services can be activated based on recognized sound classes to detect human activities for health care. To attain this goal, this study develops the following key technologies: 1) SNR-aware subspace signal enhancement, 2) pitch- and power-density-based sound/speech discrimination, 3) HMM-based speech recognition, and 4) sound recognition with ICA-transformed MFCC features and frame-based multiclass SVMs. Each classified sound event triggers a predefined response to the user as sound-awareness information. Simulations and an experiment illustrate the performance of the proposed robust sound recognition system in a real-world home environment (Aspire Home, NCKU). The overall average accuracy rate is approximately 90.97%.
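The sound-classification stage (item 4 above) can be sketched roughly as ICA-transformed MFCC frames classified by a multiclass SVM with a majority vote over frames. The sketch below assumes librosa for MFCC extraction and uses random stand-in signals and class labels; none of it reflects the paper's training data or configuration.

import numpy as np
import librosa
from sklearn.decomposition import FastICA
from sklearn.svm import SVC

def frame_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Per-frame 13-dim MFCC vectors (frames as rows)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

rng = np.random.default_rng(0)
sr = 16000
# Toy training audio for two sound classes (e.g. doorbell vs. cough), random stand-ins.
train_frames, train_labels = [], []
for label in (0, 1):
    for _ in range(3):
        feats = frame_features(rng.normal(size=sr), sr)
        train_frames.append(feats)
        train_labels.append(np.full(len(feats), label))

X = np.vstack(train_frames)
y = np.concatenate(train_labels)

ica = FastICA(n_components=13, random_state=0).fit(X)   # ICA transform of the MFCC space
clf = SVC().fit(ica.transform(X), y)                    # frame-based multiclass SVM

def classify_sound(signal: np.ndarray) -> int:
    frame_preds = clf.predict(ica.transform(frame_features(signal, sr)))
    return int(np.bincount(frame_preds).argmax())       # majority vote over frames

print(classify_sound(rng.normal(size=sr)))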

Collaboration


Dive into Po-Yi Shih's collaborations.

Top Co-Authors

Jhing-Fa Wang (National Cheng Kung University)
Yuan-Ning Lin (National Cheng Kung University)
Ta-Wen Kuan (National Cheng Kung University)
Yang-Yen Ou (National Cheng Kung University)
Shao-Hsien Shih (National Cheng Kung University)
Zhong-Hua Fu (Northwestern Polytechnical University)
Bo-Hao Su (National Cheng Kung University)
Chia-Ming Chen (National Cheng Kung University)