Fu-Rong Jean
National Taipei University of Technology
Publication
Featured research published by Fu-Rong Jean.
IEEE Transactions on Systems, Man, and Cybernetics | 2013
Lee-Min Lee; Fu-Rong Jean
The frame rate of the observation sequence in distributed speech recognition applications may be reduced to suit a resource-limited front-end device. In order to use models trained using full-frame-rate data in the recognition of reduced-frame-rate (RFR) data, we propose a method for adapting the transition probabilities of hidden Markov models (HMMs) to match the frame rate of the observation. Experiments on the recognition of clean and noisy connected digits are conducted to evaluate the proposed method. Experimental results show that the proposed method can effectively compensate for the frame-rate mismatch between the training and the test data. Using our adapted model to recognize the RFR speech data, one can significantly reduce the computation time and achieve the same level of accuracy as a method that restores the frame rate using data interpolation.
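As a rough illustration of what adapting transition probabilities to a reduced frame rate can look like, the sketch below assumes the simplest case: the front end keeps every k-th frame, so one step between retained frames spans k steps of the full-frame-rate Markov chain and the k-step transition matrix is the matrix power A^k. This is one plausible reading of the abstract, not necessarily the authors' formulation; the function name `adapt_transitions` and the example matrix are hypothetical.

```python
import numpy as np

def adapt_transitions(A, decimation):
    """Return the k-step transition matrix A^k for a chain observed every k-th frame.

    Assumption (not from the source): frames are dropped at a fixed
    decimation factor, so the states visited in between are marginalized out.
    """
    A = np.asarray(A, dtype=float)
    return np.linalg.matrix_power(A, decimation)

# Hypothetical 3-state left-to-right HMM trained at the full frame rate.
A_full = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.0, 0.0, 1.0]])

# Transition matrix for observations at half the frame rate.
A_half = adapt_transitions(A_full, 2)
```

Note that the adapted matrix allows a direct jump from the first to the third state, which the full-frame-rate left-to-right model forbids in a single step.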
international conference on acoustics, speech, and signal processing | 2013
Yu-Cheng Su; Yu Tsao; Jung-En Wu; Fu-Rong Jean
This paper proposes a generalized maximum a posteriori spectral amplitude (GMAPA) algorithm for spectral restoration in speech enhancement. The proposed GMAPA algorithm dynamically adjusts the scale of prior information to calculate the gain function for spectral restoration. In higher signal-to-noise ratio (SNR) conditions, GMAPA adopts a smaller scale to prevent overcompensation that may result in speech distortions. On the other hand, in lower SNR conditions, GMAPA uses a larger scale to enable the gain function to more effectively remove noise components from noisy speech. We also develop a mapping function to optimally determine the prior information scale according to the SNR of speech utterances. Two standardized speech databases, Aurora-4 and Aurora-2, are used to conduct objective and recognition evaluations, respectively, to test the proposed GMAPA algorithm. For comparison, three conventional spectral restoration algorithms are also evaluated: the minimum mean-square error spectral estimator (MMSE), the maximum likelihood spectral amplitude estimator (MLSA), and the maximum a posteriori spectral amplitude estimator (MAPA). The experimental results first confirm that GMAPA provides better objective evaluation scores than MMSE, MLSA, and MAPA in lower SNR conditions, with comparable scores to MLSA in higher SNR conditions. Moreover, our recognition results indicate that GMAPA outperforms the three conventional algorithms consistently over different testing conditions.
biomedical and health informatics | 2014
Tan-Hsu Tan; Munkhjargal Gochoo; Ke-Hao Chen; Fu-Rong Jean; Yung-Fu Chen; Fu-Jin Shih; Chiung Fang Ho
An indoor activity monitoring system for the elderly is proposed in this paper, using a Fitbit Flex wristband (FFW) and an active RFID. Two methods are presented for identifying the activity place, and a best accuracy of 98.89% has been achieved. The activity level of the elderly is evaluated via dissimilarity measurement by employing an activity density map. The presented system has the advantages of avoiding invasion of one's privacy and monitoring daily activity unobtrusively. Experimental results show the potential of the proposed system for practical application.
systems, man and cybernetics | 2014
Lee-Min Lee; Fu-Rong Jean; Tan-Hsu Tan; Jen-Hsiang Chou
In a client-server distributed speech recognition (DSR) application, speech features are extracted and quantized at the client end and are sent to a remote back-end server for recognition. Although the bandwidth constraints are mostly eliminated, data packets may be lost over error-prone channels. In order to reduce the performance degradation caused by missing frames, a frequently used error concealment approach is to restore a full-frame-rate (FFR) observation sequence for recognition at the back-end. In this paper, an alternative approach is proposed to deal with observations with lost frames. This approach first extracts the most reliable reconstructed reduced-frame-rate (RFR) observation sequence from the received data at the back-end, and then decodes it with an adapted hidden Markov model (HMM) that compensates for the mismatch between the FFR-trained model and the RFR test data. Experimental results show that a DSR system using the proposed method can achieve the same level of accuracy as an FFR data reconstruction method and significantly reduce the computation time. From the viewpoint of the user capacity of a DSR system, we find that the proposed method is capable of serving many more client users without any extra cost of installing new equipment.
systems, man and cybernetics | 2013
Tan-Hsu Tan; Cheng-Chun Chang; Fu-Rong Jean; John Y. Chiang; Yi-Chiao Lu
In an orthogonal frequency division multiple access (OFDMA) system, the multiple access interference (MAI) is a critical factor that significantly degrades system performance. In this research, a genetic algorithm (GA) is employed for joint channel estimation and multi-user detection. To address the weakness of GA in exploitation, an approach called simulated annealing mutated-GA (SAM-GA) is considered, which embeds simulated annealing (SA) in the mutation operation of GA. Experimental results demonstrate that the proposed SAM-GA scheme achieves the best performance in terms of mean squared error (MSE) and bit error rate (BER) for joint channel estimation and multi-user detection in the OFDMA system as compared to other existing schemes.
systems, man and cybernetics | 2014
Lee-Min Lee; Fu-Rong Jean
In distributed speech recognition applications, variable frame rate (VFR) analysis is a technique that can reduce the channel bandwidth and computation resources. In this method, slowly changing frames that provide little information are abandoned. Rapidly changing frames, on the other hand, which are more related to speech perception, are preserved. In this paper, we propose an analysis-by-synthesis (AbS) frame dropping algorithm together with a novel VFR decoding method for hidden Markov modeling of speech. A recursive formula for the calculation of the forward probability function of the VFR observations is derived and used to form a time-varying hidden Markov model (tvHMM) with transition probabilities that depend on the time difference between successive observations. A generalized Viterbi decoding algorithm is developed to decode the VFR observations. We also use an example to explain the decoding process for a particular VFR observation sequence. Experiments were conducted to investigate the effectiveness of the proposed AbS-tvHMM method. The experimental results show that our method can achieve essentially the same accuracy as full-frame-rate observations at a frame rate of only 40% and significantly reduce the computation time.
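To make the idea of transition probabilities that depend on the time difference between successive observations more concrete, here is a minimal sketch of a Viterbi-style decoder over retained frames only. It assumes, as an illustration rather than the paper's derivation, that a gap of d original frames between two retained observations is modeled by the d-step matrix A^d, which marginalizes over the states visited during the dropped frames. The function name and argument layout are hypothetical.

```python
import numpy as np

def viterbi_variable_rate(log_b, gaps, A, pi):
    """Decode the best state path over retained frames.

    log_b : (T, N) log observation likelihoods of the T retained frames
    gaps  : T-1 positive ints; gaps[t-1] is the number of original frame
            steps between retained frames t-1 and t
    A     : (N, N) full-frame-rate transition matrix
    pi    : (N,) initial state probabilities
    Assumption (not from the source): the gap-d transition matrix is A^d.
    """
    log_b = np.asarray(log_b, dtype=float)
    A = np.asarray(A, dtype=float)
    T, N = log_b.shape
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended
        delta = np.log(np.asarray(pi, dtype=float)) + log_b[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        with np.errstate(divide="ignore"):
            log_A = np.log(np.linalg.matrix_power(A, int(gaps[t - 1])))
        scores = delta[:, None] + log_A          # scores[i, j]: from i to j
        back[t] = np.argmax(scores, axis=0)      # best predecessor of each j
        delta = scores[back[t], np.arange(N)] + log_b[t]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(delta))
```

When every gap equals 1, this reduces to ordinary Viterbi decoding with the original transition matrix.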
systems, man and cybernetics | 2015
Lee-Min Lee; Fu-Rong Jean; Tan-Hsu Tan
We proposed an analysis-by-synthesis (AbS) frame dropping algorithm for the front end of a distributed speech recognition (DSR) system that preserves rapidly changing frames, which are more related to speech perception, and discards slowly changing frames, which provide little information. When applying DSR over error-prone packet-switched networks, speech data will inevitably suffer from frame loss since packets may be lost or delayed due to congestion at routers. We further employed a model adaptation error concealment decoder at the back-end to compensate for the mismatch between the pre-trained models and the test data, which contain missing frames caused by frame dropping at the front end and packet loss over the transmission channel. This approach, for convenience, is denoted as AbS-MA. In the decoding process of AbS-MA, the transition probabilities of the hidden Markov models are dynamically adapted according to the time difference between successive observations. Experiments on the recognition of Mandarin digits were conducted to investigate the effectiveness of the proposed AbS-MA method for a wide range of combinations of frame rates and packet loss conditions. The performance of the proposed AbS-MA approach was compared with a baseline approach, in which error concealment was implemented by interpolation to estimate the missing frames of the received observations at the back-end. The experimental results show that AbS-MA is not only superior to the baseline in word accuracy but also significantly reduces the computation time.
systems, man and cybernetics | 2014
Tan-Hsu Tan; Yung-Fa Huang; Fu-Rong Jean; Bo-Kai Chang; Sea-Fue Wang; John Y. Chiang
A new approach named MPSO-TS, which combines the advantages of particle swarm optimization (PSO) in global search, the Tabu search (TS) technique in local search, and the mutation operation in improving solution diversity, is presented in this study. Experimental results obtained with ten benchmark functions illustrate that the proposed MPSO-TS is superior to other optimization schemes. Further, MPSO-TS is applied to estimate the carrier frequency offsets (CFOs) of the uplink orthogonal frequency division multiple access (OFDMA) system. Experimental results indicate that MPSO-TS not only achieves the best performance but also spends the minimum CPU time per generation as compared to the existing PSO-based optimization approaches.
systems, man and cybernetics | 2006
Fu-Rong Jean; Bo-Nian Su
Most low bit-rate speech coders based on the speech production model use line spectrum frequencies (LSFs) to represent the short-term spectra of speech signals. A vector predictor for the LSFs, which consists of a group of grey predictors, is investigated in this paper for the purpose of estimating the current LSFs accurately by using previous LSFs. We impose a new parameter called fractional step (FS) on the grey predictor, which is determined by the steepest descent method to achieve the optimal prediction performance. Furthermore, the vector predictor can easily be applied to a vector predictive coder for spectral quantization. The experimental results show that the direct scalar quantization and partitioned vector quantization for the LSFs need, in total, 34 bits/frame and 27 bits/frame, respectively, to achieve the spectral distortion limen (DL) of 1 dB. The proposed vector predictor with a scalar quantization scheme can maintain the same spectral distortion at only 24 bits/frame.
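For readers unfamiliar with grey predictors, the sketch below shows the standard GM(1,1) one-step-ahead predictor applied to a single positive-valued track, such as one LSF coefficient over successive frames. It does not include the fractional step parameter introduced in the paper, whose exact form the abstract does not give; the function name `gm11_predict_next` and the choice to predict each track independently are assumptions made for illustration.

```python
import numpy as np

def gm11_predict_next(x0):
    """Predict the next value of a short positive series with standard GM(1,1).

    This is the textbook grey model, not the paper's fractional-step variant.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least three samples")
    x1 = np.cumsum(x0)                      # accumulated generating series
    z = 0.5 * (x1[1:] + x1[:-1])            # background values
    # Least-squares fit of the grey equation x0[k] + a * z[k] = b.
    B = np.column_stack((-z, np.ones(n - 1)))
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    if abs(a) < 1e-12:
        return float(b)                     # degenerate case: constant series

    def x1_hat(k):
        # Time response of the whitened equation dx1/dt + a * x1 = b.
        return (x0[0] - b / a) * np.exp(-a * k) + b / a

    # Undo the accumulation to get the predicted next raw sample.
    return float(x1_hat(n) - x1_hat(n - 1))
```

In a vector predictor built this way, one such predictor would be run per LSF dimension and the prediction residual, rather than the raw LSF, would be quantized.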
IEEE Access | 2017
Tan-Hsu Tan; Munkhjargal Gochoo; Fu-Rong Jean; Shih-Chia Huang; Sy-Yen Kuo