Huang-Chia Shih
Yuan Ze University
Publications
Featured research published by Huang-Chia Shih.
IEEE Transactions on Multimedia | 2006
Chung-Lin Huang; Huang-Chia Shih; Chung-Yuan Chao
Video semantic analysis is formulated in terms of low-level image features and high-level knowledge, which is encoded in abstract, nongeometric representations. This paper introduces a semantic analysis system based on the Bayesian network (BN) and the dynamic Bayesian network (DBN). It is validated in the particular domain of soccer game videos. Based on the BN/DBN, it can identify special events in soccer games such as goal, corner kick, penalty kick, and card events. The video analyzer extracts the low-level evidence, whereas the semantic analyzer uses the BN/DBN to interpret the high-level semantics. Unlike previous shot-based semantic analysis approaches, the proposed semantic analysis is frame-based: for each input frame, it provides the current semantics of the event nodes as well as the hidden nodes. Another contribution is that the BN and DBN are generated automatically by the training process instead of being determined ad hoc. The last contribution is a so-called temporal intervening network that improves the accuracy of the semantics output.
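To make the idea of frame-based inference concrete, the following is a minimal sketch of forward filtering in a two-state hidden Markov model, which is the simplest special case of a DBN. It is not the network described in the paper: the state layout, the probability tables, and the function name forward_filter are all illustrative assumptions.

```python
def forward_filter(evidence, prior, transition, emission):
    """Return P(state_t | evidence_1..t) for every frame t.

    Hypothetical toy model (not the paper's BN/DBN):
      prior[s]           = P(state before the first frame = s)
      transition[s][s2]  = P(state_t = s2 | state_{t-1} = s)
      emission[s][e]     = P(evidence symbol e | state = s)
    """
    n_states = len(prior)
    belief = list(prior)
    beliefs = []
    for e in evidence:
        # Predict: push the previous belief through the transition model.
        predicted = [
            sum(belief[s] * transition[s][s2] for s in range(n_states))
            for s2 in range(n_states)
        ]
        # Update: weight each state by the likelihood of this frame's evidence.
        unnormalized = [predicted[s] * emission[s][e] for s in range(n_states)]
        total = sum(unnormalized)
        if total == 0.0:
            raise ValueError("evidence has zero probability under the model")
        belief = [u / total for u in unnormalized]
        beliefs.append(belief)
    return beliefs


# Assumed toy numbers: state 0 = "no event", state 1 = "goal event";
# evidence symbol 1 = "a goal-like low-level cue was observed in this frame".
PRIOR = [0.9, 0.1]
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]
EMISSION = [[0.8, 0.2], [0.3, 0.7]]
```

Calling forward_filter([1, 1, 1], PRIOR, TRANSITION, EMISSION) returns one belief vector per frame, so the estimated probability of the event is available at every frame rather than once per shot.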
IEEE Transactions on Broadcasting | 2005
Huang-Chia Shih; Chung-Lin Huang
The information processing of sports video yields valuable semantics for content delivery over narrowband networks. Traditional image/video processing is formulated in terms of low-level features describing image/video structure and intensity, while high-level knowledge such as common sense and human perceptual knowledge is encoded in abstract and nongeometric representations. The management of semantic information in video becomes more and more difficult because of the large difference in representations, levels of knowledge, and abstract episodes. This paper proposes a semantic highlight detection scheme using a Multi-level Semantic Network (MSN) for baseball video interpretation. The probabilistic structure can be applied to highlight detection and shot classification. Satisfactory results are shown that illustrate better performance than the traditional approaches.
International Conference on Pattern Recognition | 2006
Yao-Te Tsai; Huang-Chia Shih; Chung-Lin Huang
This paper introduces a multiple human object tracking system that detects and tracks multiple objects in crowded scenes in which occlusions occur. Our method assigns each pixel to a human object based on its relative distance to that object and the corresponding color model. If there is no occlusion, each object is tracked independently based on its segmented object region and optical flow. With occlusion, we analyze the color distribution of the occlusion group to differentiate each object in the group. By calculating the distances between objects, we can determine whether an object has separated from the occlusion group and should be tracked individually afterwards.
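The pixel-assignment step can be illustrated with a minimal sketch. The abstract does not give the actual cost function or color model, so everything below is an assumption: each object is reduced to a center point and a mean RGB color, and the cost is the spatial distance plus a weighted color distance. The names assign_pixels and color_weight are hypothetical.

```python
import math


def assign_pixels(pixels, objects, color_weight=1.0):
    """Label each pixel with the index of the lowest-cost object.

    pixels  : list of ((x, y), (r, g, b)) tuples
    objects : list of dicts with keys "center" (x, y) and "mean_color" (r, g, b)
    The cost (spatial distance + color_weight * RGB distance) is an assumed
    stand-in for the paper's distance and color-model terms.
    """
    labels = []
    for (x, y), color in pixels:
        best_index, best_cost = None, math.inf
        for index, obj in enumerate(objects):
            cx, cy = obj["center"]
            spatial = math.hypot(x - cx, y - cy)
            chromatic = math.dist(color, obj["mean_color"])
            cost = spatial + color_weight * chromatic
            if cost < best_cost:
                best_index, best_cost = index, cost
        labels.append(best_index)
    return labels


# Two assumed objects: a red one on the left and a blue one on the right.
OBJECTS = [
    {"center": (0, 0), "mean_color": (255, 0, 0)},
    {"center": (10, 0), "mean_color": (0, 0, 255)},
]
```

With color_weight set to 0 the assignment reduces to nearest-center labeling. A positive weight lets a blue pixel that is slightly closer to the red object still be assigned to the blue one, which is the kind of disambiguation an occlusion group needs.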
International Conference on Acoustics, Speech, and Signal Processing | 2005
Chung-Yuan Chao; Huang-Chia Shih; Chung-Lin Huang
This paper proposes a novel semantics-based content analysis system for reliable media highlight extraction using a dynamic Bayesian network (DBN). It extracts the low-level evidence and then converts the input video to high-level semantic meaning. Specific domains contain rich spatial and temporal transitional structures that help the transformation process. We introduce a robust audio-visual low-level evidence extraction scheme and develop the so-called temporal intervening network to improve the performance of our system. In experiments, we show that our system can effectively detect soccer events such as goal, corner kick, penalty kick, and card events.
Pattern Recognition | 2013
Huang-Chia Shih
This study proposed a precise facial feature extraction method to improve the accuracy of gender classification under pose and illumination variations. We used the active appearance model (AAM) to align the face image. Images were modeled by the patches around the coordinates of certain landmarks. Using the proposed precise patch histogram (PPH) enabled us to improve the accuracy of the global facial features. The system is composed of three phases. In the training phase, non-parametric statistics were used to describe the characteristics of the training images and to construct the patch library. In the inference phase, the choice of feature patch from the library needed to approximate the patch of the testing image was based on the maximum a posteriori estimation. In the estimation phase, a Bayesian framework with portion-oriented posteriori fine-tuning was employed to determine the classification decision. In addition, we developed the dynamic weight adaptation to obtain a more convincing performance. The experimental results demonstrated the robustness of the proposed method.
Highlights:
- This paper presents a precise patch histogram (PPH) to improve the performance.
- A patch-based feature acquisition with AAM algorithm is proposed.
- The system includes a library selection using eigenface and k-means clustering.
- The accuracy of the global facial features was evidently improved by using the PPH.
- A portion-oriented posteriori fine-tuning was used to improve the classification.
IEEE Transactions on Broadcasting | 2008
Huang-Chia Shih; Chung-Lin Huang
This paper illustrates how to interpret the superimposed caption box (SCB) in broadcast sports videos when the SCB template is not given a priori. The embedded captions in sports video programs represent digested key information about the video content. Most previous studies assume that the SCB template and the character bitmaps are known. The major contributions of this paper are (1) caption template extraction and identification, (2) symbol extraction and modeling, and (3) semantic interpretation of the identified captions and symbols. Experimental results show that the algorithm can interpret the SCB contents of several commercial sports video programs.
IEEE Transactions on Broadcasting | 2013
Huang-Chia Shih
This paper presents a novel key-frame detection method that combines visual saliency-based attention features with contextual game status information for sports videos. Two critical issues of attention-based video content analysis are addressed: 1) the visual attention characteristics when a user is watching a video clip and 2) extracting the degree of excitement about the ongoing game status. First, the object-oriented visual attention map and the algorithm for determining the contextual attention are presented. The contextual inference procedure is used to simulate how the game status attracts the viewers. Second, a fusion methodology for visual and contextual attention analysis based on the characteristics of human excitement is introduced. In addition, the number of key-frames is determined by using the contextual attention score, while the key-frame determination depends on integrating all the visual attention scores. Experimental results demonstrate the robustness of the proposed system for basketball and baseball programs.
International Conference on Acoustics, Speech, and Signal Processing | 2003
Huang-Chia Shih; Chung-Lin Huang
The exploitation of semantic information in videos is difficult because of the large difference in representations, levels of knowledge, and abstract episodes. Traditional image/video understanding and indexing is formulated in terms of low-level features describing image/video structure and intensity, while high-level knowledge such as common sense and human perceptual knowledge is encoded in abstract, nongeometric representations. This paper attempts to bridge this gap by integrating image/video analysis algorithms with a multi-level semantic network to interpret baseball video.
Pattern Recognition | 2015
Huang-Chia Shih; Kuan-Chun Yu
This paper describes a template matching algorithm that is robust to rotation-scaling-translation (RST) variations, based on our proposed SPiraL Aggregation Map (SPLAM), a novel image warping scheme. It not only provides an efficient method for generating the desired projection profiles for matching, but also enables us to determine the rotation angle, and it is invariant to scale changes. Compared to other model-based methods, the proposed spiral projection model (SPM) provides the structural and statistical information about the template in a more general and easier to comprehend format. The SPM is a model-based texture-description scheme that enables the simultaneous representation of each value of the projection profile. The profile, a set of parametric projection values indexed by angle, is aggregated from a group of spirally sampled pixels. The experimental evaluation shows that the algorithm achieves very attractive results.
Highlights:
- This paper describes a robust and fast template matching algorithm.
- We introduced a novel image warping scheme called the SPiraL Aggregation Map (SPLAM).
- This descriptor is capable of expressing the template with high-order texture characteristics.
Information Sciences | 2016
Huang-Chia Shih; En-Rui Liu
This study concerns the image segmentation problem and the use of a color-alone feature for reducing the system complexity. On the basis of the color-based mathematical morphology method, the similarity measure between neighboring regions can be obtained as the solution of a ranking problem. To avoid the creation of a false color and false segmentation, a hybrid ordering approach was used instead of vectorized and marginal ordering approaches. Ordering methods that use black as the reference color to sort pixels face a problem: the scope of distance measurement is not optimal. To avoid this problem, we present a scheme for selecting a global reference color. Moreover, for determining orders of color vectors, the hue-saturation-intensity color distance was used instead of the Euclidean distance. The aforementioned scheme involves segmentation that is in accord with human visual perception. Quartile analysis indicated that threshold determination for region-merging showed less sensitivity to context variations of images. To evaluate the algorithm, it was experimentally compared with two typical segmentation schemes on the basis of four quantitative indices.
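As an illustration of ordering color vectors by their distance to a reference color in hue-saturation-intensity space, here is a minimal sketch. The abstract does not state the exact distance formula or how the global reference color is selected, so both are assumptions: the conversion is the textbook RGB-to-HSI formula, the distance treats HSI as a cylinder, and the reference color is simply passed in by the caller. All function names are hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB in [0, 1] to (hue in radians, saturation, intensity)."""
    i = (r + g + b) / 3.0
    if i == 0.0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined, use 0
    s = 1.0 - min(r, g, b) / i
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0.0:
        return 0.0, s, i  # gray: hue is undefined, use 0
    cos_h = max(-1.0, min(1.0, 0.5 * ((r - g) + (r - b)) / den))
    h = math.acos(cos_h)
    if b > g:
        h = 2.0 * math.pi - h
    return h, s, i


def hsi_distance(c1, c2):
    """Assumed cylindrical distance between two (h, s, i) triples."""
    h1, s1, i1 = c1
    h2, s2, i2 = c2
    chroma_sq = s1 * s1 + s2 * s2 - 2.0 * s1 * s2 * math.cos(h1 - h2)
    return math.sqrt(max(0.0, (i1 - i2) ** 2 + chroma_sq))


def order_by_reference(colors, reference):
    """Sort RGB colors by HSI distance to a caller-supplied reference color."""
    ref = rgb_to_hsi(*reference)
    return sorted(colors, key=lambda c: hsi_distance(rgb_to_hsi(*c), ref))
```

For example, order_by_reference([(0, 0, 1), (1, 0, 0), (1, 0.5, 0)], (1, 0, 0)) places red first, then orange, then blue. Choosing a reference other than black changes the resulting order, which is the degree of freedom the abstract's global reference color scheme exploits.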