Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ming-Fang Weng is active.

Publication


Featured research published by Ming-Fang Weng.


IEEE Transactions on Multimedia | 2008

Association and Temporal Rule Mining for Post-Filtering of Semantic Concept Detection in Video

Ken-Hao Liu; Ming-Fang Weng; Chi-Yao Tseng; Yung-Yu Chuang; Ming-Syan Chen

Automatic semantic concept detection in video is important for effective content-based video retrieval and mining and has gained great attention recently. In this paper, we propose a general post-filtering framework to enhance robustness and accuracy of semantic concept detection using association and temporal analysis for concept knowledge discovery. Co-occurrence of several semantic concepts could imply the presence of other concepts. We use association mining techniques to discover such inter-concept association relationships from annotations. With discovered concept association rules, we propose a strategy to combine associated concept classifiers to improve detection accuracy. In addition, because video is often visually smooth and semantically coherent, detection results from temporally adjacent shots could be used for the detection of the current shot. We propose temporal filter designs for inter-shot temporal dependency mining to further improve detection accuracy. Experiments on the TRECVID 2005 dataset show our post-filtering framework is both efficient and effective in improving the accuracy of semantic concept detection in video. Furthermore, it is easy to integrate our framework with existing classifiers to boost their performance.
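
As a rough illustration of the post-filtering idea (not the paper's implementation), the sketch below boosts a concept's score when a mined association rule fires and then smooths scores over temporally adjacent shots; the rule format, the blending weight alpha, and the window size are hypothetical.

```python
import numpy as np

def apply_association_rules(scores, rules, alpha=0.5):
    """Raise a concept's score when its associated concepts co-occur.

    scores : (num_shots, num_concepts) raw detector scores in [0, 1]
    rules  : list of (antecedent_concept_indices, consequent_index, confidence)
    alpha  : blending weight between rule evidence and the raw score (illustrative)
    """
    refined = scores.copy()
    for antecedents, consequent, confidence in rules:
        # Evidence that all antecedent concepts appear in the shot.
        support = scores[:, antecedents].min(axis=1)
        boosted = alpha * confidence * support + (1.0 - alpha) * scores[:, consequent]
        refined[:, consequent] = np.maximum(refined[:, consequent], boosted)
    return refined

def temporal_filter(scores, window=2):
    """Smooth each concept's scores across temporally adjacent shots."""
    kernel = np.ones(2 * window + 1) / (2 * window + 1)
    return np.column_stack(
        [np.convolve(scores[:, c], kernel, mode="same") for c in range(scores.shape[1])])

# Toy usage: 6 shots, 3 concepts; one mined rule "concepts 0 and 1 imply concept 2".
raw = np.random.rand(6, 3)
post = temporal_filter(apply_association_rules(raw, [([0, 1], 2, 0.8)]))
```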


ACM Multimedia | 2008

Multi-cue fusion for semantic video indexing

Ming-Fang Weng; Yung-Yu Chuang

The huge amount of videos currently available poses a difficult problem in semantic video retrieval. The success of query-by-concept, recently proposed to handle this problem, depends greatly on the accuracy of concept-based video indexing. This paper describes a multi-cue fusion approach toward improving the accuracy of semantic video indexing. This approach is based on a unified framework that explores and integrates both contextual correlation among concepts and temporal dependency among shots. The framework is novel in two ways. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from ground-truth annotations of tens of thousands of shots for hundreds of concepts. Second, labels for all concepts and all shots are solved simultaneously through optimizing a graphical model. Experiments on the widely used TRECVID 2006 data set show that our framework is effective for semantic concept detection in video, achieving around a 30% performance boost on two popular benchmarks, VIREO-374 and Columbia374, in inferred average precision.
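
A minimal sketch of how the two cues could be combined is given below; it uses simple iterative averaging rather than the paper's graphical-model optimization, and the weights beta and gamma are illustrative placeholders.

```python
import numpy as np

def fuse_scores(scores, concept_corr, beta=0.5, gamma=0.3, iters=10):
    """Jointly refine a shot-by-concept score matrix with two cues.

    scores       : (num_shots, num_concepts) initial detection scores
    concept_corr : (num_concepts, num_concepts) row-normalized inter-concept
                   correlations learned from annotations
    beta, gamma  : weights of the contextual and temporal cues (illustrative)
    """
    refined = scores.copy()
    for _ in range(iters):
        contextual = refined @ concept_corr.T      # cue 1: correlated concepts support each other
        neigh = refined.copy()                     # cue 2: adjacent shots share labels
        neigh[1:] += refined[:-1]
        neigh[:-1] += refined[1:]
        counts = np.full((len(refined), 1), 3.0)
        counts[0, 0] = counts[-1, 0] = 2.0
        temporal = neigh / counts
        refined = (1.0 - beta - gamma) * scores + beta * contextual + gamma * temporal
    return refined
```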


IEEE Transactions on Image Processing | 2015

Cross-Camera Knowledge Transfer for Multiview People Counting

Nick C. Tang; Yen-Yu Lin; Ming-Fang Weng; Hong-Yuan Mark Liao

We present a novel two-pass framework for counting the number of people in an environment, where multiple cameras provide different views of the subjects. By exploiting the complementary information captured by the cameras, we can transfer knowledge between the cameras to address the difficulties of people counting and improve the performance. The contribution of this paper is threefold. First, normalizing the perspective of visual features and estimating the size of a crowd are highly correlated tasks. Hence, we treat them as a joint learning problem. The derived counting model is scalable and provides more accurate results than existing approaches. Second, we introduce an algorithm that matches groups of pedestrians in images captured by different cameras. The results provide a common domain for knowledge transfer, so we can work with multiple cameras without worrying about their differences. Third, the proposed counting system comprises a pair of collaborative regressors. The first one determines the people count based on features extracted from intra-camera visual information, whereas the second calculates the residual by considering the conflicts between inter-camera predictions. The two regressors are elegantly coupled and provide an accurate people counting system. The results of experiments in various settings show that, overall, our approach outperforms comparable baseline methods. The significant performance improvement demonstrates the effectiveness of our two-pass regression framework.
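
The two-pass structure can be sketched as follows, assuming gradient-boosted regressors as stand-ins for the paper's (unspecified here) regressors; the feature names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data for one camera: intra-camera visual features,
# inter-camera conflict features, and ground-truth people counts per frame.
rng = np.random.default_rng(0)
X_intra = rng.normal(size=(300, 8))       # e.g. normalized foreground area, texture statistics
X_conflict = rng.normal(size=(300, 4))    # e.g. disagreement among the cameras' estimates
y = rng.integers(0, 30, size=300).astype(float)

# First pass: regress the people count from intra-camera visual information.
count_regressor = GradientBoostingRegressor().fit(X_intra, y)
first_pass = count_regressor.predict(X_intra)

# Second pass: regress the residual from inter-camera conflict features and
# add it back to obtain the final count.
residual_regressor = GradientBoostingRegressor().fit(X_conflict, y - first_pass)
final_count = first_pass + residual_regressor.predict(X_conflict)
```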


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Cross-Domain Multicue Fusion for Concept-Based Video Indexing

Ming-Fang Weng; Yung-Yu Chuang

The success of query-by-concept, proposed recently to cater to video retrieval needs, depends greatly on the accuracy of concept-based video indexing. Unfortunately, it remains a challenge to recognize the presence of concepts in a video segment or to extract an objective linguistic description from it because of the semantic gap, that is, the lack of correspondence between machine-extracted low-level features and human high-level conceptual interpretation. This paper studies three issues with the aim to reduce such a gap: 1) how to explore cues beyond low-level features, 2) how to combine diverse cues to improve performance, and 3) how to utilize the learned knowledge when applying it to a new domain. To solve these problems, we propose a framework that jointly exploits multiple cues across multiple video domains. First, recursive algorithms are proposed to learn both interconcept and intershot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudolabels according to their initial prediction scores so that contextual and temporal relationships can be learned, thus requiring no additional human effort. Integration of cues embedded within training and testing video sets accommodates domain change. Experiments on popular benchmarks show that our framework is effective, achieving significant improvements over popular baselines.
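
The pseudolabeling step might look like the sketch below, where the thresholds are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def assign_pseudolabels(prediction_scores, pos_thresh=0.8, neg_thresh=0.2):
    """Convert confident prediction scores on unseen shots into pseudolabels.

    Returns 1 for pseudo-positives, 0 for pseudo-negatives, and -1 for
    undecided entries; only the decided entries would be used to learn
    contextual and temporal relationships on the new domain.
    """
    labels = np.full(prediction_scores.shape, -1, dtype=int)
    labels[prediction_scores >= pos_thresh] = 1
    labels[prediction_scores <= neg_thresh] = 0
    return labels

# Toy usage on a 4-shot x 3-concept score matrix.
pseudo = assign_pseudolabels(np.random.rand(4, 3))
```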


ACM Transactions on Multimedia Computing, Communications, and Applications | 2012

Collaborative video reindexing via matrix factorization

Ming-Fang Weng; Yung-Yu Chuang

Concept-based video indexing generates a matrix of scores predicting the possibilities of concepts occurring in video shots. Based on the idea of collaborative filtering, this article presents unsupervised methods to refine the initial scores generated by concept classifiers by taking into account the concept-to-concept correlation and shot-to-shot similarity embedded within the score matrix. Given a noisy matrix, we refine the inaccurate scores via matrix factorization. This method is further improved by learning multiple local models and incorporating contextual-temporal structures. Experiments on the TRECVID 2006--2008 datasets demonstrate relative performance gains ranging from 13% to 52% without using any user annotations or external knowledge resources.
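
A minimal sketch of score refinement by low-rank matrix factorization, assuming a plain squared-error objective optimized by gradient descent; the rank, learning rate, and regularization weight are illustrative, and the paper's local-model and contextual-temporal extensions are omitted.

```python
import numpy as np

def refine_scores(score_matrix, rank=10, reg=0.1, lr=0.01, epochs=500, seed=0):
    """Refine a noisy shot-by-concept score matrix with a low-rank model.

    Minimizes ||S - U V^T||^2 + reg * (||U||^2 + ||V||^2) by gradient descent,
    so the reconstruction U V^T shares information across correlated concepts
    (columns) and similar shots (rows).
    """
    rng = np.random.default_rng(seed)
    num_shots, num_concepts = score_matrix.shape
    U = 0.1 * rng.standard_normal((num_shots, rank))
    V = 0.1 * rng.standard_normal((num_concepts, rank))
    for _ in range(epochs):
        err = score_matrix - U @ V.T
        U += lr * (err @ V - reg * U)
        V += lr * (err.T @ U - reg * V)
    return U @ V.T

# Toy usage: a 100-shot x 20-concept score matrix with noise.
noisy = np.random.rand(100, 20)
refined = refine_scores(noisy)
```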


International Workshop on Information Forensics and Security | 2011

Cross camera people counting with perspective estimation and occlusion handling

Tsung-Yi Lin; Yen-Yu Lin; Ming-Fang Weng; Yu-Chiang Frank Wang; Yu-Feng Hsu; Hong-Yuan Mark Liao

We introduce a novel approach to cross-camera people counting that can adapt itself to a new environment without the need for manual inspection. The proposed counting model is composed of a pair of collaborative Gaussian processes (GPs), which are respectively designed to count people by taking the visible and occluded parts into account. While the first GP exploits multiple visual features to achieve better accuracy, the second GP investigates the conflicts among these features to recover the underestimation caused by occlusions. Our contributions are threefold. First, we establish a cross-camera people counting system that can facilitate forensic investigation and security preservation. Second, a principled way is proposed to estimate the degree of occlusion. Third, our system is comprehensively evaluated on two benchmark datasets. The promising performance demonstrates the effectiveness of our system.
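
A rough sketch of the paired-GP idea, using scikit-learn Gaussian process regressors with an RBF kernel as stand-ins; the features and data are hypothetical, and the paper's principled occlusion-degree estimation is not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical per-frame data: features from visible pedestrians, features
# describing feature conflicts (a proxy for occlusion degree), and true counts.
rng = np.random.default_rng(1)
X_visible = rng.normal(size=(150, 6))
X_occlusion = rng.normal(size=(150, 3))
y = rng.integers(0, 25, size=150).astype(float)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)

# First GP: counts people from visible-part features; it tends to underestimate
# whenever pedestrians occlude one another.
gp_visible = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_visible, y)
base_count = gp_visible.predict(X_visible)

# Second GP: models the underestimation as a function of occlusion-related features.
gp_occluded = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(
    X_occlusion, y - base_count)
final_count = base_count + gp_occluded.predict(X_occlusion)
```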


IEEE Transactions on Image Processing | 2015

Robust Action Recognition via Borrowing Information Across Video Modalities

Nick C. Tang; Yen-Yu Lin; Ju-Hsuan Hua; Shih-En Wei; Ming-Fang Weng; Hong-Yuan Mark Liao

Recent advances in imaging devices have opened up new opportunities for video content analysis and understanding. Next-generation cameras, such as depth or binocular cameras, capture diverse information and complement conventional 2D RGB cameras. Investigating the resulting multimodal videos therefore generally benefits related applications. However, the limitations of these emerging cameras, such as short effective distances, high cost, or long response times, restrict their applicability and currently prevent them from being used online in practical settings. In this paper, we provide an alternative scenario to address this problem and illustrate it with the task of recognizing human actions. In particular, we aim to improve the accuracy of action recognition in RGB videos with the aid of one additional RGB-D camera. Since RGB-D cameras such as Kinect are typically not applicable in surveillance systems because of their short effective distance, we instead collect a database offline, in which not only the RGB videos but also the depth maps and the skeleton data of actions are jointly available. The proposed approach adapts to inter-database variations and enables the borrowing of visual knowledge across different video modalities. Each action to be recognized in RGB representation is then augmented with the borrowed depth and skeleton features. Our approach is comprehensively evaluated on five benchmark action recognition data sets. The promising results show that the borrowed information leads to a remarkable boost in recognition accuracy.
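
One simple way to realize the borrowing step, sketched under the assumption of nearest-neighbor retrieval in RGB feature space with distance-weighted averaging; the function and feature dimensions are hypothetical and do not reflect the paper's adaptation mechanism.

```python
import numpy as np

def borrow_features(rgb_query, db_rgb, db_depth, db_skeleton, k=3):
    """Augment an RGB feature vector with borrowed depth and skeleton features.

    The database stores jointly captured RGB, depth, and skeleton features.
    The query's k nearest database entries (in RGB space) contribute the
    borrowed depth/skeleton representation via distance-weighted averaging.
    """
    dists = np.linalg.norm(db_rgb - rgb_query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)
    weights /= weights.sum()
    depth = weights @ db_depth[nearest]
    skeleton = weights @ db_skeleton[nearest]
    return np.concatenate([rgb_query, depth, skeleton])

# Toy usage with a tiny hypothetical database.
db_rgb = np.random.rand(50, 32)
db_depth = np.random.rand(50, 16)
db_skeleton = np.random.rand(50, 24)
augmented = borrow_features(np.random.rand(32), db_rgb, db_depth, db_skeleton)
```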


IEEE Transactions on Multimedia | 2014

Example-Based Human Motion Extrapolation and Motion Repairing Using Contour Manifold

Nick C. Tang; Chiou-Ting Hsu; Ming-Fang Weng; Tsung-Yi Lin; Hong-Yuan Mark Liao

We propose a human motion extrapolation algorithm that synthesizes new motions of a human object in a still image from a given reference motion sequence. The algorithm is implemented in two major steps: contour manifold construction and object motion synthesis. Contour manifold construction searches for low-dimensional manifolds that represent the temporal-domain deformation of the reference motion sequence. Since the derived manifolds capture the motion information of the reference sequence, the representation is more robust to variations in shape and size. With this compact representation, we can easily modify and manipulate human motions through interpolation or extrapolation in the contour manifold space. In the object motion synthesis step, the proposed algorithm generates a sequence of new shapes of the input human object in the contour manifold space and then renders the textures of those shapes to synthesize a new motion sequence. We demonstrate the efficacy of the algorithm on different types of practical applications, namely, motion extrapolation and motion repair.
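
The manifold construction and synthesis steps can be approximated with PCA as a stand-in for the paper's manifold learning; the sequence data and dimensions below are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical reference sequence: T frames, each contour resampled to 100
# (x, y) points and flattened into a 200-dimensional shape vector.
T = 40
rng = np.random.default_rng(2)
contours = np.cumsum(rng.normal(scale=0.01, size=(T, 200)), axis=0) + rng.random(200)

# Contour manifold construction: embed the contour sequence in a
# low-dimensional space that captures its temporal deformation.
pca = PCA(n_components=3)
trajectory = pca.fit_transform(contours)          # (T, 3) path in the manifold space

# Object motion synthesis: extrapolate the trajectory one step beyond the
# reference motion and map the new point back to a contour.
step = trajectory[-1] - trajectory[-2]
new_point = trajectory[-1] + step
new_contour = pca.inverse_transform(new_point.reshape(1, -1)).reshape(-1, 2)  # 100 (x, y) points
```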


ACM Multimedia | 2008

Interactive content presentation based on expressed emotion and physiological feedback

Tien-Lin Wu; Hsuan-Kai Wang; Chien-Chang Ho; Yuan-Pin Lin; Ting-Ting Hu; Ming-Fang Weng; Liwei Chan; Chang-Hua Yang; Yi-Hsuan Yang; Yi-Ping Hung; Yung-Yu Chuang; Hsin-Hsi Chen; Homer H. Chen; Jyh-Horng Chen; Shyh-Kang Jeng

In this technical demonstration, we showcase an interactive content presentation (ICP) system that integrates media-expressed-emotion-based composition, user-perceived preference feedback, and interactive digital art creation. ICP harmonizes the browsing of multimedia content by presenting it in the form of music videos (photos and blog articles with accompanying music) based on expressed-emotion similarity. ICP facilitates content browsing by automatically and dynamically selecting the media to be played next in real time, responding to users' preference feedback measured from physiological signals. In addition, ICP enhances the enjoyment of content browsing by incorporating interactive digital art creation. ICP achieves these goals by integrating recent research on media-expressed emotion classification, cross-media composition, and physiological signal processing.


ACM Multimedia | 2012

Visual knowledge transfer among multiple cameras for people counting with occlusion handling

Ming-Fang Weng; Yen-Yu Lin; Nick C. Tang; Hong-Yuan Mark Liao

We present a framework to count the number of people in an environment where multiple cameras with different angles of view are available. We consider the visual cues captured by each camera as a knowledge source and carry out cross-camera knowledge transfer to alleviate the difficulties of people counting, such as partial occlusions, low-quality images, and cluttered backgrounds. Specifically, this work distinguishes itself with the following contributions. First, we overcome the variations among multiple heterogeneous cameras with different perspective settings by matching the same groups of pedestrians captured by these cameras, and present an algorithm for establishing cross-camera correspondence. Second, the proposed counting model is composed of a pair of collaborative regressors. While one regressor measures people counts from features extracted from intra-camera visual evidence, the other recovers the resulting residual by taking the conflicts among inter-camera predictions into account. The two regressors are elegantly coupled and jointly lead to an accurate counting system. Additionally, we provide a set of manually annotated pedestrian labels on the PETS 2010 videos for performance evaluation. Our approach is comprehensively tested in various settings and compared with competitive baselines. The significant improvement in performance demonstrates the effectiveness of the proposed approach.
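
The cross-camera correspondence step could be sketched as bipartite matching of group descriptors, for example with the Hungarian algorithm; the descriptor contents below are hypothetical, and this is not the paper's matching algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_pedestrian_groups(groups_cam_a, groups_cam_b):
    """Match pedestrian groups seen by two cameras.

    Each row is a descriptor of one detected group (for example its
    normalized size and ground-plane position). The Hungarian algorithm
    finds the correspondence with minimum total descriptor distance,
    providing a common domain for cross-camera knowledge transfer.
    """
    cost = np.linalg.norm(
        groups_cam_a[:, None, :] - groups_cam_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy usage: four groups detected by each of two cameras.
matches = match_pedestrian_groups(np.random.rand(4, 5), np.random.rand(4, 5))
```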

Collaboration


Dive into Ming-Fang Weng's collaborations.

Top Co-Authors

Yung-Yu Chuang, National Taiwan University
Shyh-Kang Jeng, National Taiwan University
Yu-Ting Hsieh, National Taiwan University
Ju-Hsuan Hua, Carnegie Mellon University
Tsung-Yi Lin, University of California