Chien-Yao Wang
National Central University
Publication
Featured research published by Chien-Yao Wang.
international conference on orange technologies | 2015
Chang-Di Huang; Chien-Yao Wang; Jia-Ching Wang
Because of changes in family structure and population ageing, care for the elderly and children has become a very important issue in modern society. When adults are busy working, they have no time to care for elderly family members and children who are alone at home. This paper proposes an elderly and child care system to address this problem. The proposed intelligent surveillance system is based on action recognition using image processing. A three-stream convolutional neural network is proposed to recognize human actions such as falling on the floor and baby crawling. If the system detects an abnormal activity, it raises an alarm and notifies family members. In the experiments, 21 categories of activities were collected from the HMDB-51 dataset, the UCF-101 dataset, and the Internet. The proposed system achieves a recognition rate of 93.42% on the selected actions.
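As a rough illustration of the three-stream idea (the abstract does not specify the streams, so the RGB/optical-flow/pose split and all layer sizes below are assumptions), a minimal PyTorch sketch of a three-stream action classifier might look like this:

```python
# Illustrative sketch only -- not the authors' implementation. Each stream
# processes one input modality; features are fused by concatenation.
import torch
import torch.nn as nn

class Stream(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (batch, 64)

class ThreeStreamNet(nn.Module):
    def __init__(self, num_classes=21):  # 21 activity categories, as in the paper
        super().__init__()
        self.rgb = Stream(3)    # appearance stream (assumed)
        self.flow = Stream(2)   # motion / optical-flow stream (assumed)
        self.pose = Stream(1)   # third cue, e.g. a pose map (assumed)
        self.classifier = nn.Linear(64 * 3, num_classes)

    def forward(self, rgb, flow, pose):
        fused = torch.cat([self.rgb(rgb), self.flow(flow), self.pose(pose)], dim=1)
        return self.classifier(fused)
```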
IEEE Transactions on Audio, Speech, and Language Processing | 2018
Chien-Yao Wang; Jia-Ching Wang; Andri Santoso; Chin-Chin Chiang; Chung-Hsien Wu
Automatic sound event recognition (SER) has recently attracted renewed interest. Although a practical SER system has many useful applications in everyday life, SER is challenging owing to the variations among sounds and noises in real-world environments. This paper presents a novel feature extraction and classification method for SER. An audio–visual descriptor, called the auditory-receptive-field binary pattern, is designed based on the spectrogram image feature, cepstral features, and the human auditory receptive field model. The extracted features are then fed into a classifier to perform event classification. The proposed classifier, called the hierarchical-diving deep belief network, is a deep neural network that hierarchically learns discriminative characteristics, from the physical feature representation up to the abstract concept. The performance of the proposed system was verified in several experiments under various conditions. On the RWCP dataset, the system achieved a recognition rate of 99.27% on real-world sound data in 105 categories. The system is also very robust under noisy conditions, achieving a 95.06% recognition rate at a 0 dB signal-to-noise ratio. On the TUT sound event dataset, the system achieves error rates of 0.81 and 0.73 for sound event detection in home and residential-area scenes, respectively. The experimental results reveal that the proposed system outperforms the other systems in this field.
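To make the descriptor family concrete: the core of a spectrogram binary pattern is comparing each time-frequency cell with its neighbours and histogramming the resulting codes. A minimal NumPy sketch of that generic idea (not the paper's exact auditory-receptive-field operator) follows:

```python
# Minimal sketch of a spectrogram binary-pattern descriptor in NumPy.
# Generic idea only, not the auditory-receptive-field binary pattern itself.
import numpy as np

def log_spectrogram(x, n_fft=512, hop=256):
    # Frame the signal, apply a Hann window, and take the magnitude STFT.
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T  # (freq, time)

def binary_pattern_histogram(spec):
    # Compare each time-frequency cell with its 8 neighbours, pack the
    # comparisons into an 8-bit code, and histogram the codes.
    center = spec[1:-1, 1:-1]
    code = np.zeros(center.shape, dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (di, dj) in enumerate(offsets):
        neighbour = spec[1 + di:spec.shape[0] - 1 + di,
                         1 + dj:spec.shape[1] - 1 + dj]
        code |= (neighbour >= center).astype(np.int32) << bit
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()  # normalized descriptor, fed to a classifier
```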
acm multimedia | 2016
Yuan-Shan Lee; Chien-Yao Wang; Seksan Mathulaprangsan; Jia-Hao Zhao; Jia-Ching Wang
This paper concerns the development of locality-preserving methods for object recognition. The major purpose is to consider both descriptor-level and image-level locality throughout the recognition process. Two dual-layer locality-preserving methods are developed, in which locality-constrained linear coding (LLC) is used to represent an image. In the learning phase, the discriminative locality-preserving K-SVD (DLP-KSVD), in which label information is incorporated into the locality-preserving term, is proposed. In addition to using class labels to learn a linear classifier, the label-consistent LP-KSVD (LCLP-KSVD) is proposed to enhance the discriminability of the learned dictionary. In LCLP-KSVD, the objective function includes a label-consistency term that penalizes sparse codes from different classes. At test time, additional information about the locality of query samples is obtained by treating the locality-preserving matrix as a feature. Recognition results obtained in experiments on the Caltech101 database indicate that the proposed method outperforms existing sparse-coding-based approaches.
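The LLC coding step itself has a standard closed-form approximation: keep each descriptor's k nearest dictionary atoms and solve a small regularized least-squares problem with a sum-to-one constraint. A sketch (the atom count, k, and the regularizer beta are arbitrary choices here, not the paper's settings):

```python
# Sketch of the standard approximate LLC coding step: restrict each
# descriptor to its k nearest atoms, then solve a small constrained
# least-squares problem.
import numpy as np

def llc_code(x, D, k=5, beta=1e-4):
    # x: (dim,) descriptor; D: (n_atoms, dim) dictionary.
    idx = np.argsort(np.linalg.norm(D - x, axis=1))[:k]  # locality constraint
    z = D[idx] - x                       # shift the local atoms to the origin
    C = z @ z.T + beta * np.eye(k)       # regularized local covariance
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                         # enforce sum(w) == 1
    code = np.zeros(D.shape[0])
    code[idx] = w                        # sparse, locality-preserving code
    return code

# Example: code a random 128-D descriptor against a random 1024-atom dictionary.
rng = np.random.default_rng(0)
print(np.nonzero(llc_code(rng.standard_normal(128),
                          rng.standard_normal((1024, 128))))[0])
```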
international conference on consumer electronics | 2015
Chien-Yao Wang; Yu-Hao Chin; Tzu-Chiang Tai; David Gunawan; Jia-Ching Wang
This work proposes an automatic system for recognizing audio events. First, an audio signal is converted into a spectrogram by the short-time Fourier transform. Acoustic background noise in the spectrogram is reduced by box filtering, and the contrast of the spectrogram is then enhanced by a VAR operation. From the enhanced spectrogram, this work further derives a novel dynamic local binary pattern (DLBP) feature based on the human auditory system. Finally, the DLBP features are fed to multi-class support vector machines to perform audio event recognition. Experimental results on 16 classes of audio events demonstrate the performance of the proposed system.
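A loose sketch of the pipeline shape (spectrogram, box filtering, descriptor, multi-class SVM) is given below; the VAR contrast step and the DLBP operator are not reproduced, so treat this only as scaffolding:

```python
# Scaffolding sketch of the pipeline shape only: spectrogram -> box filter
# -> descriptor -> multi-class SVM. SVM settings are placeholders.
import numpy as np
from sklearn.svm import SVC

def box_filter(spec, size=3):
    # Mean (box) filter: average each cell with its size-by-size neighbourhood
    # to suppress background noise in the spectrogram.
    pad = size // 2
    padded = np.pad(spec, pad, mode='edge')
    shifted = [padded[i:i + spec.shape[0], j:j + spec.shape[1]]
               for i in range(size) for j in range(size)]
    return np.mean(shifted, axis=0)

# With descriptors extracted from the filtered spectrograms:
# clf = SVC(kernel='rbf', C=10.0).fit(train_features, train_labels)
# predictions = clf.predict(test_features)
```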
international carnahan conference on security technology | 2015
Aufaclav Zatu Kusuma Frisky; Chien-Yao Wang; Andri Santoso; Jia-Ching Wang
This paper proposes a system to address the problem of visual speech recognition. The proposed system is based on visual lip-movement recognition using video content analysis. Using spatiotemporal feature descriptors, we extract features from video containing visual lip information. A preprocessing step removes noise and enhances the contrast of the images in every frame of the video. The extracted features are used to build a dictionary for a kernel sparse representation classifier (K-SRC) in the classification step. We adopt non-negative matrix factorization (NMF) to reduce the dimensionality of the extracted features. We evaluated the performance of our system on the AVLetters and AVLetters2 datasets, using the same configuration as previous works. On the AVLetters dataset, promising accuracies of 67.13%, 45.37%, and 63.12% are achieved in the semi-speaker-dependent, speaker-independent, and speaker-dependent settings, respectively. On the AVLetters2 dataset, our method achieves an accuracy of 89.02% in the speaker-dependent case and 25.9% in the speaker-independent case. These results show that the proposed method outperforms other methods under the same configuration.
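As a simplified stand-in for the classification stage, the following sketch pairs scikit-learn's NMF with a plain (linear, not kernel) sparse-representation-style classifier that picks the class with the smallest reconstruction residual; the kernel variant and the lip-feature extraction are out of scope here:

```python
# Simplified stand-in: NMF dimensionality reduction plus a linear
# sparse-representation-style classifier (smallest class-wise residual).
import numpy as np
from sklearn.decomposition import NMF

def src_predict(x, train_feats, train_labels):
    # x: (dim,) reduced test feature; train_feats: (n_train, dim).
    best_class, best_residual = None, np.inf
    for c in np.unique(train_labels):
        D = train_feats[train_labels == c].T        # (dim, n_c) class atoms
        w, *_ = np.linalg.lstsq(D, x, rcond=None)   # class-wise coefficients
        residual = np.linalg.norm(x - D @ w)
        if residual < best_residual:
            best_class, best_residual = c, residual
    return best_class

# nmf = NMF(n_components=64, max_iter=500)   # features must be non-negative
# reduced_train = nmf.fit_transform(raw_train_features)
# reduced_test = nmf.transform(raw_test_features)
```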
Proceedings of the ASE BigData & SocialInformatics 2015 | 2015
Ari Hernawan; Yuan-Shan Lee; Andri Santoso; Chien-Yao Wang; Jia-Ching Wang
This paper proposes a modified Bayesian Sensing Hidden Markov Model (BS-HMM) to address the problem of hand gesture recognition from few labeled data. BS-HMM is investigated here because of its success in large-vocabulary continuous speech recognition. We introduce error modeling into the BS-HMM basis vectors to handle noise in the data, and a forgetting factor to preserve important information from previous basis vectors and to improve both the convergence and the representation ability of the BS-HMM basis vectors. We also modify the Moving Pose method to extract feature descriptors from hand gesture data. To evaluate the performance of our system, we compared the proposed method with previously proposed HMM methods. The experimental results show that the proposed method improves over the others, even when only a small number of labeled samples are available for training.
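The plain baseline this builds on, one HMM per gesture class scored by log-likelihood, can be sketched with hmmlearn; the Bayesian-sensing extensions (error modeling, forgetting factor) and the modified Moving Pose descriptor are not reproduced:

```python
# One-HMM-per-class gesture recognition baseline using hmmlearn.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(sequences_by_class, n_states=5):
    # sequences_by_class: {label: list of (T_i, dim) feature sequences}
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.concatenate(seqs)
        models[label] = GaussianHMM(n_components=n_states).fit(
            X, lengths=[len(s) for s in seqs])
    return models

def classify(models, seq):
    # Assign the class whose HMM gives the sequence the highest log-likelihood.
    return max(models, key=lambda label: models[label].score(seq))
```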
international conference on multimedia and expo | 2017
Po-Jen Chen; Jian-Jiun Ding; Hung-Wei Hsu; Chien-Yao Wang; Jia-Ching Wang
Convolutional neural networks (CNNs) are increasingly important in pattern recognition. In this work, we adopt label relations and long short-term memory (LSTM) to develop an accurate CNN-based scene classification algorithm. Traditional scene classification algorithms assume that labels are mutually exclusive. However, this is not reasonable when an image contains a variety of objects and hence has multiple labels. In this work, we apply two kinds of label relations, exclusion and hierarchy, to improve the accuracy of multi-label scene classification. For example, it is impossible for an image to carry both the labels “factory” and “garden”; if the label “factory” is assigned to an image, the probability that it also has the label “garden” should be lowered. We also use image captioning to construct a scene classification model and propose an LSTM-based method to further explore label relations and obtain more accurate results for scenic image labeling.
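A toy sketch of the exclusion relation alone (the label pairs and damping factor below are invented for illustration; the paper's formulation and LSTM captioning model are not reproduced):

```python
# Toy illustration: if two labels are marked mutually exclusive, keep the
# stronger one and damp the other.
EXCLUSIVE_PAIRS = {("factory", "garden")}  # assumed mutually exclusive labels

def apply_exclusion(probs, damp=0.1):
    # probs: {label: independent per-label probability from the CNN}.
    adjusted = dict(probs)
    for a, b in EXCLUSIVE_PAIRS:
        if a in adjusted and b in adjusted:
            weaker = a if adjusted[a] < adjusted[b] else b
            adjusted[weaker] *= damp  # lower the weaker conflicting label
    return adjusted

# "factory" is confident, so the conflicting "garden" score is suppressed.
print(apply_exclusion({"factory": 0.9, "garden": 0.6, "tree": 0.7}))
```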
international conference on acoustics, speech, and signal processing | 2017
Chien-Yao Wang; Chin-Chin Chiang; Jian-Jiun Ding; Jia-Ching Wang
This paper proposes a dynamic tracking attention model (DTAM), which mainly comprises a motion attention mechanism, a convolutional neural network (CNN), and long short-term memory (LSTM), to recognize human actions in video sequences. In the motion attention mechanism, local dynamic tracking is used to track moving objects in the feature domain, while global dynamic tracking corrects the motion in the spectral domain. The CNN performs feature extraction, while the LSTM handles the sequential action information extracted from the video. The model effectively fetches information between consecutive frames in a video sequence and achieves a higher recognition rate than the CNN-LSTM. Combining the DTAM with a visual attention model, the proposed algorithm attains a recognition rate 3.6% and 4.5% higher than that of the CNN-LSTM with and without the visual attention model, respectively.
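The general CNN-attention-LSTM family the DTAM belongs to can be sketched in PyTorch as follows; the local/global dynamic tracking mechanisms themselves are not reproduced, and every layer size here is an assumption:

```python
# Generic CNN + soft spatial attention + LSTM sketch of the architecture
# family, not the DTAM itself.
import torch
import torch.nn as nn

class AttentiveCNNLSTM(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(7))
        self.attn = nn.Linear(64, 1)               # scores each spatial cell
        self.lstm = nn.LSTM(64, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, video):                      # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))      # (B*T, 64, 7, 7)
        feats = feats.flatten(2).transpose(1, 2)   # (B*T, 49, 64)
        weights = torch.softmax(self.attn(feats), dim=1)
        pooled = (weights * feats).sum(dim=1).view(B, T, 64)
        out, _ = self.lstm(pooled)                 # temporal modeling
        return self.head(out[:, -1])               # classify from last step
```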
international conference on acoustics, speech, and signal processing | 2017
Chien-Yao Wang; Jyun-Hong Li; Seksan Mathulaprangsan; Chin-Chin Chiang; Jia-Ching Wang
Semantic image segmentation is now an exciting area of research owing to its various useful applications in daily life. This paper introduces a hierarchical joint-guided network (HJGN), which is mainly composed of the proposed hierarchical joint learning convolutional networks (HJLCNs) and the proposed joint-guided and masking networks (JGMNs). HJLCNs exhibit high robustness when segmenting unseen objects that are not contained in the training categories, and JGMNs are very effective in filling holes and preventing incorrect segmentation predictions. The proposed HJGN outperforms state-of-the-art methods on the PASCAL VOC 2012 test set, reaching a mean IU of 80.4%.
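For reference, the mean IU figure quoted above is the standard mean intersection-over-union over classes, computable from a pixel-level confusion matrix (this is the evaluation metric, not the authors' network code):

```python
# Standard mean-IU (mean intersection-over-union) metric over label maps.
import numpy as np

def mean_iu(pred, target, n_classes):
    # pred, target: integer label maps of identical shape.
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(conf, (target.ravel(), pred.ravel()), 1)
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp   # per-class union
    iou = np.where(union > 0, tp / np.maximum(union, 1), np.nan)
    return np.nanmean(iou)  # ignore classes absent from both maps
```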
asia pacific signal and information processing association annual summit and conference | 2016
Chien-Yao Wang; Seksan Mathulaprangsan; Bo-Wei Chen; Yu-Hao Chin; Jing-Jia Shiu; Yu-San Lin; Jia-Ching Wang
This research proposes a novel Bayesian sparse representation (BSR) method that extracts SIFT-based facial features to create sparse dictionaries that are invariant to rotation, scale, and shift. Using K-means and information theory, a new dictionary, called the extended dictionary, is developed. Compared with the conventional orthogonal matching pursuit (OMP) algorithm, the proposed system, which uses a Bayesian method to model the sparse representation optimization problem, can reduce the uncertainty of the observed signals and expand the modeling ability of the dictionaries through the use of variance. The experimental results show that the proposed extended dictionary enhances sparsity; furthermore, it improves face identification accuracy and reduces reconstruction residuals.
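The OMP baseline mentioned here is standard and easy to reproduce with scikit-learn; the Bayesian variance modeling and the K-means/information-theoretic extended dictionary are not shown, and the dictionary below is random, for shape only:

```python
# OMP baseline only, via scikit-learn: recover the support of a sparse
# signal over a random unit-norm dictionary.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((128, 256))       # 256 atoms of dimension 128
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
x = D[:, [3, 40, 200]] @ np.array([1.0, -0.5, 0.8])  # a 3-sparse signal

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3, fit_intercept=False)
omp.fit(D, x)
print(np.nonzero(omp.coef_)[0])           # recovered support: [3 40 200]
```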