Shilin Wang
Shanghai Jiao Tong University
Publications
Featured research published by Shilin Wang.
IEEE Transactions on Image Processing | 2004
Shu Hung Leung; Shilin Wang; Wing Hong Lau
Recently, lip image analysis has received much attention because the visual information it provides has been shown to improve speech recognition and speaker authentication. Lip image segmentation plays an important role in lip image analysis. In this paper, a new fuzzy clustering method for lip image segmentation is presented. This clustering method takes both color information and spatial distance into account, whereas most current clustering methods deal only with the former. A new dissimilarity measure is introduced that integrates color dissimilarity and spatial distance in terms of an elliptic shape function. Because of the elliptic shape function, the new measure is able to differentiate pixels that have similar color information but are located in different regions. A new iterative algorithm for determining the membership and centroid of each class is derived and is shown to provide good differentiation between the lip region and the non-lip region. Experimental results show that the new algorithm yields better membership distributions and lip shapes than the standard fuzzy c-means algorithm and four other methods investigated in the paper.
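A minimal sketch of how such a combined measure and the resulting fuzzy memberships might look, assuming an additive combination of the color term and the elliptic spatial term with a weight alpha (the paper's exact formulation may differ):

```python
import numpy as np

def elliptic_distance(xy, center, axes):
    """Normalized elliptic distance of pixel coordinates xy (N, 2) from an
    assumed lip centre; axes holds the semi-axes (a, b) of the ellipse."""
    return np.sum(((xy - center) / axes) ** 2, axis=1)

def dissimilarity(colors, centroid, xy, center, axes, alpha=1.0):
    """Combined measure: squared color distance to the class centroid plus
    an elliptic spatial penalty, so pixels with lip-like color that lie far
    from the elliptic lip region still look dissimilar to the lip class."""
    color_term = np.sum((colors - centroid) ** 2, axis=1)
    return color_term + alpha * elliptic_distance(xy, center, axes)

def fcm_memberships(d, m=2.0, eps=1e-9):
    """Standard fuzzy c-means membership update from a (C, N) matrix of
    squared dissimilarities between C classes and N pixels."""
    inv = np.maximum(d, eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)
```

The spatial term grows quadratically outside the assumed elliptic region, so skin pixels with lip-like color receive a large dissimilarity to the lip class.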
Pattern Recognition | 2007
Shilin Wang; Wing Hong Lau; Alan Wee-Chung Liew; Shu Hung Leung
Robust and accurate lip region segmentation is of vital importance for lip image analysis. However, most current techniques break down in the presence of mustaches and beards, which make the background region complex and inhomogeneous. In this paper we propose a novel multi-class, shape-guided FCM (MS-FCM) clustering algorithm to solve this problem. In this approach, one cluster is assigned to the object, i.e. the lip region, and a combination of multiple clusters to the background, which generally includes the skin region, lip shadow, and beards. The proper number of background clusters is derived automatically by maximizing a cluster validity index. A spatial penalty term based on pixel location is introduced into the objective function so that pixels with similar color but located in different regions can be differentiated. This facilitates the separation of lip and background pixels that would otherwise be inseparable due to their similarity in color. Experimental results show that the proposed algorithm provides an accurate lip-background partition even for images with complex background features such as mustaches and beards.
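A short sketch of the automatic selection of the number of background clusters; the paper derives its own cluster validity index, so the silhouette score below is only a stand-in to show the selection loop:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_background_clusters(bg_pixels, k_range=range(2, 6), seed=0):
    """Choose the number of background clusters by maximizing a validity
    index over candidate cluster counts (silhouette used as a stand-in)."""
    best_k, best_score = None, -np.inf
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(bg_pixels)
        score = silhouette_score(bg_pixels, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```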
Pattern Recognition | 2004
Shilin Wang; Wing Hong Lau; Shu Hung Leung
Visual information from lip shapes and movements helps improve the accuracy and robustness of a speech recognition system. In this paper, a new region-based lip contour extraction algorithm that combines the merits of the point-based model and the parametric model is presented. Our algorithm uses a 16-point lip model to describe the lip contour. Given a robust probability map of the color lip image generated by the FCMS (fuzzy clustering method incorporating shape function) algorithm, a region-based cost function that maximizes the joint probability of the lip and non-lip regions can be established. An iterative point-driven optimization procedure is then developed to fit the lip model to the probability map. In each iteration, the adjustment of the 16 lip points is governed by three quadratic curve segments that constrain the points to form a physically plausible lip shape. Experiments show that the proposed approach provides satisfactory results for 5000 unadorned lip images of more than 20 individuals. A real-time lip contour extraction system has also been implemented.
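A sketch of the quadratic-curve constraint on the point model; the partition of the 16 points into three index ranges below is an illustrative assumption, not the paper's exact grouping:

```python
import numpy as np

def snap_to_quadratic(points):
    """Fit y = a*x**2 + b*x + c to a run of contour points and snap the
    points onto the fitted curve, keeping the segment a quadratic arc."""
    x, y = points[:, 0], points[:, 1]
    coeffs = np.polyfit(x, y, 2)
    return np.column_stack([x, np.polyval(coeffs, x)])

def constrain_lip_model(lip_pts, segments=((0, 6), (6, 11), (11, 16))):
    """Apply the constraint segment by segment so the 16 points always
    form three quadratic curve pieces (hypothetical index ranges)."""
    out = lip_pts.astype(float).copy()
    for s, e in segments:
        out[s:e] = snap_to_quadratic(out[s:e])
    return out
```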
international workshop on digital watermarking | 2010
Xudong Zhao; Jianhua Li; Shenghong Li; Shilin Wang
Detecting splicing traces in the color space where the tampering took place is usually difficult. However, image splicing that is hard to detect in one color space may be much easier to detect in another. In this paper, an efficient approach for passive color image splicing detection is proposed. In contrast to the commonly used RGB and luminance spaces, chroma spaces are introduced in our work. Four gray-level run-length run-number (RLRN) vectors with different directions, extracted from de-correlated chroma channels, are employed as distinguishing features for image splicing detection. A support vector machine (SVM) is used as the classifier to demonstrate the performance of the proposed feature extraction method. Experimental results have shown that RLRN features extracted from chroma channels provide much better performance than those extracted from the R, G, B, and luminance channels.
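A minimal sketch of the pipeline under stated assumptions: a BT.601 Cb chroma channel, a single horizontal RLRN vector (the paper extracts four directions from de-correlated channels), and an SVM as the classifier:

```python
import numpy as np
from sklearn.svm import SVC  # classifier, as in the paper

def to_chroma_cb(rgb):
    """BT.601 Cb chroma channel from an (H, W, 3) RGB image."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b

def rlrn_vector(channel, levels=16, max_run=20):
    """Run-length run-number vector along the horizontal direction: the
    count of runs of each length, summed over all gray levels."""
    q = (channel.astype(np.float64) / 256.0 * levels).astype(int)
    counts = np.zeros(max_run)
    for row in q:
        run = 1
        for prev, cur in zip(row[:-1], row[1:]):
            if cur == prev:
                run += 1
            else:
                counts[min(run, max_run) - 1] += 1
                run = 1
        counts[min(run, max_run) - 1] += 1
    return counts / counts.sum()

# Hypothetical usage: stack RLRN vectors into a feature matrix and train
# an SVM (kernel and parameters are placeholders, not the paper's setup).
# clf = SVC(kernel="rbf").fit(features, labels)
```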
IEEE Transactions on Circuits and Systems for Video Technology | 2015
Xudong Zhao; Shilin Wang; Shenghong Li; Jianhua Li
In this paper, a 2-D noncausal Markov model is proposed for passive digital image-splicing detection. Different from the traditional Markov model, the proposed approach models an image as a 2-D noncausal signal and captures the underlying dependencies between the current node and its neighbors. The model parameters are treated as discriminative features to differentiate spliced images from natural ones. We apply the model in the block discrete cosine transform domain and the discrete Meyer wavelet transform domain, and the cross-domain features are used as the final discriminative features for classification. The support vector machine, the most popular classifier in image-splicing detection, is used for classification. To evaluate the proposed method, all experiments are conducted on public image-splicing detection evaluation data sets, and the experimental results show that the proposed approach outperforms some state-of-the-art methods.
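The noncausal model estimates its parameters jointly over a whole neighborhood, which is beyond a short sketch; the fragment below instead shows the simpler causal transition-matrix feature in the block DCT domain that such models generalize, purely to illustrate the feature's general shape:

```python
import numpy as np
from scipy.fft import dctn

def block_dct(gray, bs=8):
    """Blockwise 2-D DCT of a grayscale image (H, W assumed multiples of bs)."""
    out = np.zeros(gray.shape, dtype=np.float64)
    for i in range(0, gray.shape[0], bs):
        for j in range(0, gray.shape[1], bs):
            out[i:i+bs, j:j+bs] = dctn(gray[i:i+bs, j:j+bs], norm="ortho")
    return out

def transition_matrix(coef, T=3):
    """Horizontal transition-probability matrix of thresholded differences
    of DCT coefficients -- a causal simplification, not the paper's model."""
    d = np.clip(np.round(coef[:, :-1] - coef[:, 1:]), -T, T).astype(int) + T
    P = np.zeros((2 * T + 1, 2 * T + 1))
    for a, b in zip(d[:, :-1].ravel(), d[:, 1:].ravel()):
        P[a, b] += 1
    return P / np.maximum(P.sum(axis=1, keepdims=True), 1)
```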
Visual Speech Recognition: Lip Segmentation and Mapping | 2008
Alan Wee-Chung Liew; Shilin Wang
The unique research area of audio-visual speech recognition has attracted much interest in recent years as visual information about lip dynamics has been shown to improve the performance of automatic speech recognition systems, especially in noisy environments. Visual Speech Recognition: Lip Segmentation and Mapping presents an up-to-date account of research done in the areas of lip segmentation, visual speech recognition, and speaker identification and verification. A useful reference for researchers working in this field, this book contains the latest research results from renowned experts with in-depth discussion on topics such as visual speaker authentication, lip modeling, and systematic evaluation of lip features.
international symposium on circuits and systems | 2004
Shilin Wang; Wing Hong Lau; Shu Hung Leung; Hong Yan
It is well known that visual information such as lip shape and lip movement can indicate what a speaker is saying. In this paper, we present an automatic lipreading system that uses visual information alone to recognize isolated English digits from 0 to 9. A parameter set of a 14-point ASM lip model is used to describe the outer lip contour. Inner mouth information, such as the teeth region and the mouth opening, is also extracted. With appropriate normalization, feature vectors containing the normalized outer lip features, the inner mouth features, and their first-order derivatives are obtained for training the HMM word models. Experiments have been carried out to compare the recognition performance of our visual feature set with other traditional visual feature representations. An accuracy of 93% for speaker-dependent recognition and 84% for speaker-independent recognition is achieved using our visual feature representation. A real-time automatic lipreading system has been successfully implemented on a 1.9-GHz PC.
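A minimal sketch of the word-model training step, using the hmmlearn library as a stand-in (the state count, covariance type, and iteration budget are illustrative; each sequence is assumed to be a frames-by-features array of the normalized lip features and their derivatives):

```python
import numpy as np
from hmmlearn import hmm

def train_word_model(sequences, n_states=5, seed=0):
    """Train one Gaussian HMM per digit from a list of per-utterance
    feature arrays (frames x features)."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50, random_state=seed)
    model.fit(X, lengths)
    return model

# Recognition: score a test sequence under every digit's model and pick
# the one with the highest log-likelihood (models is a hypothetical dict).
# best_digit = max(range(10), key=lambda d: models[d].score(test_seq))
```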
international conference on image processing | 2007
Shilin Wang; Alan Wee-Chung Liew
For the image classification task, the color histogram is widely used as an important color feature indicating the content of the image. However, high-resolution color histograms are usually of high dimension and contain much redundant information unrelated to the image content, while low-resolution histograms cannot provide adequate discriminative information for image classification. In this paper, a new color feature representation is proposed that not only takes the correlation among neighbouring components of the conventional color histogram into account but also removes the redundant information. A high-resolution, uniformly quantized color histogram is first obtained from the image. Then the redundant bins are removed and some neighbouring bins are combined to generate new feature components that maximize the discriminative ability. Mutual information is adopted to evaluate the discriminative power of a specific feature set, and an iterative algorithm derives the histogram quantization and the corresponding feature generation. To illustrate the effectiveness of the proposed feature representation, an application to detecting adult images, i.e., classifying images as erotic or benign, is carried out. Two widely used classification techniques, SVM and AdaBoost, are employed as classifiers. Experimental results show the superior performance of our color representation over the conventional color histogram in image classification.
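The full method iterates between quantization and feature generation; the sketch below shows only an MI-driven pruning of histogram bins, with scikit-learn's mutual information estimator standing in for the paper's measure of discriminative power:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def prune_histogram_bins(H, y, keep=64):
    """Rank the bins of an (N, B) stack of high-resolution color histograms
    by mutual information with the class labels and keep the most
    informative ones; the paper additionally merges neighbouring bins."""
    mi = mutual_info_classif(H, y, random_state=0)
    kept = np.sort(np.argsort(mi)[::-1][:keep])
    return kept, H[:, kept]
```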
Pattern Recognition | 2012
Shilin Wang; Alan Wee-Chung Liew
Compared with other traditional biometric features such as face, fingerprint, or handwriting, lip biometric features contain both physiological and behavioral information. Physiologically, different people have different lips; behaviorally, people can usually be differentiated by their talking style. Current research on lip biometrics generally does not distinguish between the two kinds of information during feature extraction and classification, and the interesting question of whether physiological or behavioral lip features are more discriminative has not been comprehensively studied. In this paper, different physiological and behavioral lip features are studied with respect to their discriminative power in speaker identification and verification. Our experimental results show that both the static lip texture feature and the dynamic shape deformation feature can achieve high identification accuracy (above 90%) and a low verification error rate (below 5%). In addition, the lip rotation and centroid deformations, which are related to the speaker's talking mannerisms, are found to be useful for speaker identification and verification. In contrast to previous studies, our results show that behavioral lip features are more discriminative for speaker identification and verification than physiological features.
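As a rough illustration of one such behavioral cue, a centroid-deformation trajectory could be computed along the following lines (the paper's exact rotation and centroid deformation definitions may differ):

```python
import numpy as np

def centroid_deformation(contours):
    """Per-frame lip-centroid displacement over an utterance, from a
    (frames x points x 2) array of lip contour points."""
    c = contours.mean(axis=1)                    # centroid per frame
    return np.linalg.norm(np.diff(c, axis=0), axis=1)
```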
IEEE Transactions on Circuits and Systems for Video Technology | 2008
Shilin Wang; Alan Wee-Chung Liew; Wai H. Lau; Shu Hung Leung
It is well known that visual cues of lip movement contain important speech-relevant information. This paper presents an automatic lipreading system for small-vocabulary speech recognition tasks. Using the lip segmentation and modeling techniques we developed earlier, we obtain a visual feature vector composed of outer and inner mouth features from the lip image sequence for recognition. A spline representation is employed to transform the discrete-time features sampled from the video frames into the continuous domain. The spline coefficients within the same word class are constrained to have similar expressions and are estimated from the training data by the EM algorithm. For the multiple-speaker/speaker-independent recognition task, an adaptive multimodel approach is proposed to handle the variations caused by different talking styles. After building the word models from the spline coefficients, a maximum likelihood classification approach is taken for recognition. Lip image sequences of English digits from 0 to 9 have been collected for the recognition test. Two widely used classification methods, HMM and RDA, have been adopted for comparison, and the results demonstrate that the proposed algorithm delivers the best performance among these methods.
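A sketch of the spline step under stated assumptions: each feature dimension is fitted with a least-squares cubic B-spline over normalized time, so utterances of different lengths map to a fixed-length coefficient vector; the knot count is illustrative, and the paper's class-wise EM estimation of the coefficients is not shown:

```python
import numpy as np
from scipy.interpolate import splrep

def spline_coeffs(seq, n_knots=8, k=3):
    """Map a (frames x features) visual feature sequence to B-spline
    coefficients over normalized time; assumes the sequence has enough
    frames for the interior knot grid."""
    ts = np.linspace(0.0, 1.0, len(seq))
    interior = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    coeffs = []
    for dim in seq.T:
        _, c, _ = splrep(ts, dim, k=k, t=interior)
        coeffs.append(c[: n_knots + k + 1])  # meaningful coefficients only
    return np.concatenate(coeffs)
```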