Payton Lin
Center for Information Technology
Publication
Featured research published by Payton Lin.
Journal of the Acoustical Society of America | 2011
Payton Lin; Christopher W. Turner; Bruce J. Gantz; Hamid R. Djalilian; Fan-Gang Zeng
Residual acoustic hearing can be preserved in the same ear following cochlear implantation with minimally traumatic surgical techniques and short-electrode arrays. The combined electric-acoustic stimulation significantly improves cochlear implant performance, particularly speech recognition in noise. The present study measures simultaneous masking by electric pulses on acoustic pure tones, or vice versa, to investigate electric-acoustic interactions and their underlying psychophysical mechanisms. Six subjects, with acoustic hearing preserved at low frequencies in their implanted ear, participated in the study. One subject had a fully inserted 24 mm Nucleus Freedom array and five subjects had Iowa/Nucleus hybrid implants that were only 10 mm in length. Electric masking data of the long-electrode subject showed that stimulation from the most apical electrodes produced threshold elevations over 10 dB for 500, 625, and 750 Hz probe tones, but no elevation for 125 and 250 Hz tones. In contrast, electric stimulation did not produce any electric masking in the short-electrode subjects. In the acoustic masking experiment, 125-750 Hz pure tones were used to acoustically mask electric stimulation. The acoustic masking results showed that, independent of pure tone frequency, both long- and short-electrode subjects showed threshold elevations at apical and basal electrodes. The present results can be interpreted in terms of underlying physiological mechanisms related to either place-dependent peripheral masking or place-independent central masking.
Journal of the Acoustical Society of America | 2013
Payton Lin; Thomas Lu; Fan-Gang Zeng
Across bilateral cochlear implants, contralateral threshold shift has been investigated as a function of electrode difference between the masking and probe electrodes. For contralateral electric masking, maximum threshold elevations occurred when the position of the masker and probe electrode was approximately place-matched across ears. The amount of masking diminished with increasing masker-probe electrode separation. Place-dependent masking occurred in both sequentially implanted ears, and was not affected by the masker intensity or the time delay from the masker onset. When compared to previous contralateral masking results in normal hearing, the similarities between place-dependent central masking patterns suggest comparable mechanisms of overlapping excitation in the central auditory nervous system.
Scientific Reports | 2017
Juan Huang; Benjamin Sheffield; Payton Lin; Fan-Gang Zeng
For cochlear implant users, combined electro-acoustic stimulation (EAS) significantly improves performance. However, many more users do not have any functional residual acoustic hearing at low frequencies. Because tactile sensation operates in the same low-frequency range (<500 Hz) as the acoustic hearing in EAS, we propose electro-tactile stimulation (ETS) to improve cochlear implant performance. In ten cochlear implant users, a tactile aid was applied to the index finger to convert the voice fundamental frequency into tactile vibrations. Speech recognition in noise was compared for cochlear implants alone and for the bimodal ETS condition. On average, ETS improved speech reception thresholds by 2.2 dB over cochlear implants alone. Nine of the ten subjects showed a positive ETS effect ranging from 0.3 to 7.0 dB, similar in size to the previously reported EAS benefit. The comparable results indicate that similar neural mechanisms underlie both the ETS and EAS effects, and suggest that the complementary auditory and tactile modes could also be used to enhance performance for normal-hearing listeners and automatic speech recognition for machines.
Entropy | 2016
Payton Lin; Szu-Wei Fu; Syu-Siang Wang; Ying-Hui Lai; Yu Tsao
Conventionally, the maximum likelihood (ML) criterion is applied to train a deep belief network (DBN). We present a maximum entropy (ME) learning algorithm for DBNs, designed specifically to handle limited training data. Maximizing only the entropy of the parameters in the DBN yields more effective generalization, less bias toward the training data distribution, and greater robustness to over-fitting than ML learning. Results on text classification and object recognition tasks demonstrate that the ME-trained DBN outperforms the ML-trained DBN when training data is limited.
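The contrast between ML training and entropy-regularized training can be illustrated on a toy model. The sketch below is not the paper's DBN algorithm: it adds an entropy bonus to the objective of a one-parameter logistic model, and the function names, regularizer placement, and hyperparameters are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lam=0.0, lr=0.5, steps=200):
    """Gradient ascent on: log-likelihood + lam * entropy of the predictions.

    lam = 0 recovers plain maximum-likelihood (ML) training; lam > 0 adds an
    entropy bonus that discourages over-confident parameters on scarce data.
    """
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            z = w * x + b
            p = sigmoid(z)
            # d/dz log-likelihood = (y - p); d/dz entropy(p) = -z * p * (1 - p)
            g = (y - p) - lam * z * p * (1 - p)
            gw += g * x
            gb += g
        w += lr * gw / len(data)
        b += lr * gb / len(data)
    return w, b
```

With the entropy term active, the learned weight stays smaller in magnitude than under pure ML, mirroring the reduced bias toward the training distribution described above.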
International Conference on Consumer Electronics | 2015
Payton Lin; Syu-Siang Wang; Yu Tsao
Traditionally, only five components are regarded as bearing on the special characteristics of tone recognition, while front-end processing and feature extraction have been considered essentially independent. Since a mismatch between training and testing in signal space leads to subsequent distortions in feature space and model space, determining whether front-end processing and feature extraction are independent or dependent is critical for robustness.
Speech Communication | 2015
Yu Tsao; Payton Lin; Ting-yao Hu; Xugang Lu
Highlights:
- ESSEM allows unsupervised model adaptation with only one testing utterance.
- The ATG mapping function enhances the ESSEM framework with sufficient adaptation statistics.
- Adaptation performance is determined by appropriate selection of ensemble models.
- Over-fitting issues are handled with optimization processes: MAP, MS, and CS.
- Imposing constraints on ATG allows flexible extension to LR, BC, LCB, LC, and BF.

The ensemble speaker and speaking environment modeling (ESSEM) framework was designed to provide online optimization for enhancing workable systems under real-world conditions. In the ESSEM framework, ensemble models are built in the offline phase to characterize specific environments based on local statistics prepared from those particular conditions. In the online phase, a mapping function is computed based on the incoming testing data to perform model adaptation. Previous studies utilized linear combination (LC) and linear combination with a correction bias (LCB) as simple mapping functions that apply only one weighting coefficient to each model. To better utilize the ensemble models, this study presents a generalized affine transform group (ATG) mapping function for the ESSEM framework. Although ATG characterizes unknown testing conditions more precisely by using a larger number of parameters, over-fitting occurs when the available adaptation data is especially limited. This study handles over-fitting with three optimization processes: the maximum a posteriori (MAP) criterion, model selection (MS), and cohort selection (CS). Experimental results showed that ATG, together with the three optimization processes, enabled the ESSEM framework to perform unsupervised model adaptation from only one utterance while providing consistent performance improvements.
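The simpler LC/LCB mapping functions mentioned above can be sketched as follows: the adapted model is a weighted sum of ensemble model means, with one weighting coefficient per model and an optional correction bias, fit to statistics from a single test utterance. This is a minimal illustration, not the paper's implementation; the function name and the gradient-based weight estimation are assumptions.

```python
def lc_adapt(ensemble, test_stat, lr=0.1, steps=500, use_bias=False):
    """Linear combination (LC) of ensemble models; use_bias=True gives LCB.

    Weights (and the optional correction bias) are fit by gradient descent to
    match first-order statistics computed from one test utterance.
    """
    K, d = len(ensemble), len(test_stat)
    w = [1.0 / K] * K          # one weighting coefficient per ensemble model
    b = [0.0] * d              # correction bias (LCB only)
    for _ in range(steps):
        # Current adapted mean under the LC/LCB mapping
        mu = [sum(w[k] * ensemble[k][i] for k in range(K)) + b[i]
              for i in range(d)]
        err = [mu[i] - test_stat[i] for i in range(d)]
        for k in range(K):
            w[k] -= lr * sum(err[i] * ensemble[k][i] for i in range(d))
        if use_bias:
            for i in range(d):
                b[i] -= lr * err[i]
    return w, b
```

The ATG mapping of the paper generalizes this by applying a full affine transform to each ensemble model rather than a single scalar weight, which is why it needs the MAP/MS/CS safeguards against over-fitting.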
International Symposium on Chinese Spoken Language Processing | 2014
Syu-Siang Wang; Payton Lin; Dau-Cheng Lyu; Yu Tsao; Hsin-Te Hwang; Borching Su
This study proposes a polynomial-based feature transferring (PFT) algorithm for acoustic feature conversion. The PFT process consists of estimation and conversion phases. The estimation phase computes a polynomial-based transfer function using only a small set of parallel source and target features. With the estimated transfer function, the conversion phase converts large sets of source features into target ones. The proposed PFT algorithm is evaluated on a robust automatic speech recognition (ASR) task using the Aurora-2 database. The source features were MFCCs with cepstral mean and variance normalization (CMVN), and the target features were advanced front-end (AFE) features. Compared to CMVN, AFE provides better robust speech recognition performance but requires a more complicated and computationally expensive feature extraction. With PFT, we intend to use a simple transfer function to obtain AFE-like acoustic features from the source CMVN features. Experimental results on Aurora-2 demonstrate that PFT generated AFE-like features that notably improve on the CMVN performance and approach the results achieved by AFE. Furthermore, the recognition accuracy of PFT was better than that of histogram equalization (HEQ) and polynomial-based histogram equalization (PHEQ). The results confirm the effectiveness of PFT with just a few sets of parallel features.
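The estimation/conversion split above can be sketched as ordinary least-squares polynomial fitting: fit coefficients on a small parallel set, then apply the resulting transfer function to arbitrarily many source features. This is a simplified per-dimension illustration under assumed names, not the paper's PFT implementation.

```python
def polyfit(xs, ys, degree):
    """Estimation phase: least-squares polynomial fit via the normal equations."""
    n = degree + 1
    # Normal equations A c = rhs, where A = V^T V for the Vandermonde matrix V
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back-substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (rhs[r] - s) / A[r][r]
    return coeffs            # coeffs[i] multiplies x**i

def pft_convert(coeffs, source):
    """Conversion phase: apply the estimated transfer function to source features."""
    return [sum(c * (x ** i) for i, c in enumerate(coeffs)) for x in source]
```

Only the small parallel set pays the cost of the expensive target (AFE) extraction; every subsequent source feature is mapped by the cheap polynomial.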
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014
Yun-Fan Chang; Payton Lin; Shao-Hua Cheng; Kai-Hsuan Chan; Yi-Chong Zeng; Chia-Wei Liao; Wen-Tsung Chang; Yu-Chiang Wang; Yu Tsao
Anchorperson segment detection enables efficient video content indexing for information retrieval. Anchorperson detection based on audio analysis has gained popularity due to its lower computational complexity and satisfactory performance. This paper presents a robust framework that uses a hybrid i-vector and deep neural network (DNN) system to perform anchorperson detection from the audio streams of video content. The proposed system first applies i-vector extraction to obtain speaker identity features from the audio data. With the extracted speaker identity features, a DNN classifier is then used to verify the claimed anchorperson identity. In addition, subspace feature normalization (SFN) is incorporated into the hybrid system for robust feature extraction, compensating for the audio mismatch caused by recording devices. An anchorperson verification experiment was conducted to evaluate the equal error rate (EER) of the proposed hybrid system. Experimental results demonstrate that the proposed system outperforms the state-of-the-art hybrid i-vector and support vector machine (SVM) system, and that integrating SFN further improves performance by effectively compensating for audio mismatch in anchorperson detection tasks.
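The equal error rate used to evaluate such verification systems is the operating point where the false-accept rate equals the false-reject rate. A minimal sketch of its computation from genuine and impostor scores (the threshold sweep and names are illustrative, not the paper's evaluation code):

```python
def equal_error_rate(genuine, impostor):
    """Sweep decision thresholds over the observed scores and return the
    operating point where false-accept rate (FAR) ~= false-reject rate (FRR)."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)   # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # genuines rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A lower EER means the verifier separates the claimed anchorperson from impostors more cleanly, which is the axis on which the i-vector/DNN hybrid is compared against the i-vector/SVM baseline.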
IEEE Transactions on Audio, Speech, and Language Processing | 2017
Syu-Siang Wang; Payton Lin; Yu Tsao; Jeih-weih Hung; Borching Su
Distributed speech recognition (DSR) splits the processing of data between a mobile device and a network server. In the front-end, features are extracted and compressed to transmit over a wireless channel to a back-end server, where the incoming stream is received and reconstructed for recognition tasks. In this paper, we propose a feature compression algorithm termed suppression by selecting wavelets (SSW) to achieve the two main goals of DSR: Minimizing memory and device requirements while also maintaining or even improving the recognition performance. The SSW approach first applies the discrete wavelet transform (DWT) to filter the incoming speech feature sequence into two temporal subsequences at the client terminal. Feature compression is achieved by keeping the low (modulation) frequency subsequence while discarding the high frequency counterpart. The low-frequency subsequence is then transmitted across the remote network for specific feature statistics normalization. Wavelets are favorable for resolving the temporal properties of the feature sequence, and the down-sampling process in DWT achieves data compression by reducing the amount of data at the terminal prior to transmission across the network. Once the compressed features have arrived at the server, the feature sequence can be enhanced by statistics normalization, reconstructed with inverse DWT, and compensated with a simple post filter to alleviate any over-smoothing effects from the compression stage. Results on a standard robustness task (Aurora-4) and on a Mandarin Chinese news corpus showed SSW outperforms conventional noise-robustness techniques while also providing nearly a 50% compression rate during the transmission stage of DSR systems.
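The split-and-discard step at the heart of SSW can be sketched with a one-level Haar transform: the feature sequence is split into low- and high-modulation-frequency subsequences, only the low band is transmitted (halving the data), and the server inverts the transform with the discarded band set to zero. Haar is an assumed wavelet choice for this illustration, and the function names are not from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(seq):
    """One-level Haar DWT: split an even-length sequence into low (approximation)
    and high (detail) temporal subsequences, each half the original length."""
    approx = [(seq[2 * i] + seq[2 * i + 1]) / SQRT2 for i in range(len(seq) // 2)]
    detail = [(seq[2 * i] - seq[2 * i + 1]) / SQRT2 for i in range(len(seq) // 2)]
    return approx, detail

def ssw_compress(seq):
    """Client side: keep only the low-frequency subband (~50% compression)."""
    approx, _ = haar_dwt(seq)
    return approx

def ssw_reconstruct(approx):
    """Server side: inverse Haar DWT with the discarded detail band set to zero."""
    out = []
    for a in approx:
        out.extend([a / SQRT2, a / SQRT2])
    return out
```

Slowly varying (low modulation frequency) feature trajectories survive this round trip almost unchanged, which is why discarding the high band costs little recognition accuracy; the paper's post filter then counteracts the residual over-smoothing.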
IEEE Global Conference on Signal and Information Processing | 2015
Payton Lin; Dau-Cheng Lyu; Yun-Fan Chang; Yu Tsao
Alternative features were derived from an extracted temporal envelope bank (TBANK). These simplified temporal representations were investigated in alignment procedures to generate frame-level training labels for deep neural networks (DNNs). TBANK features improved temporal alignments both for supervised training and for context-dependent tree building.
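A temporal envelope of the kind TBANK builds on can be sketched as full-wave rectification followed by low-pass smoothing. This is a generic envelope recipe under assumed names, not the exact TBANK extraction; the moving-average window is an illustrative stand-in for a proper low-pass filter.

```python
def temporal_envelope(signal, window=3):
    """Full-wave rectify, then smooth with a centered moving average (low-pass),
    yielding a slowly varying envelope of the input samples."""
    rect = [abs(s) for s in signal]
    half = window // 2
    env = []
    for i in range(len(rect)):
        seg = rect[max(0, i - half): i + half + 1]
        env.append(sum(seg) / len(seg))
    return env
```

In a bank, this extraction would be repeated per frequency channel, and the resulting slowly varying envelopes are what make the frame-level alignments smoother than raw spectral features.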
Collaboration
National Institute of Information and Communications Technology