Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ming-yu Chen is active.

Publication


Featured research published by Ming-yu Chen.


computer vision and pattern recognition | 2009

Action recognition via local descriptors and holistic features

Xinghua Sun; Ming-yu Chen; Alexander G. Hauptmann

In this paper we propose a unified action recognition framework fusing local descriptors and holistic features. The motivation is that local descriptors and holistic features emphasize different aspects of actions and are suitable for different types of action databases. The proposed unified framework is based on frame differencing, bag-of-words, and feature fusion. We extract two kinds of local descriptors, i.e. 2D and 3D SIFT feature descriptors, both based on 2D SIFT interest points. We apply Zernike moments to extract two kinds of holistic features, one based on single frames and the other based on the motion energy image. We perform action recognition experiments on the KTH and Weizmann databases using Support Vector Machines, apply the leave-one-out and pseudo leave-N-out setups, and compare our proposed approach with state-of-the-art results. Experiments show that our proposed approach is effective. Compared with other approaches, ours is more robust, more versatile, easier to compute, and simpler to understand.
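
As a rough illustration of the pipeline described above, the sketch below builds bag-of-words histograms from local descriptors, fuses them with holistic features by concatenation, and trains an SVM. It assumes scikit-learn and NumPy; descriptor extraction (SIFT, Zernike moments) is stubbed out, and all function names are illustrative rather than the paper's code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bag_of_words(descriptors_per_video, vocab_size=200):
    """Quantize each video's local descriptors into a normalized histogram."""
    codebook = KMeans(n_clusters=vocab_size, n_init=4).fit(
        np.vstack(descriptors_per_video))
    hists = []
    for desc in descriptors_per_video:
        words = codebook.predict(desc)
        h = np.bincount(words, minlength=vocab_size).astype(float)
        hists.append(h / max(h.sum(), 1.0))
    return np.array(hists)

def train_fused_svm(bow_hists, holistic_feats, labels):
    """Fuse by concatenation, then train a single classifier."""
    X = np.hstack([bow_hists, holistic_feats])
    return SVC(kernel="rbf", C=10.0).fit(X, labels)
```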


acm multimedia | 2006

Extreme video retrieval: joint maximization of human and computer performance

Alexander G. Hauptmann; Wei-Hao Lin; Rong Yan; Jun Yang; Ming-yu Chen

We present an efficient system for video search that maximizes the use of human bandwidth while at the same time exploiting the machine's ability to learn in real time from user-selected relevant video clips. The system exploits the human capability for rapidly scanning imagery, augmenting it with an active learning loop that attempts to always present the most relevant material based on the current information. Two versions of the human interface were evaluated, one with variable page sizes and manual paging, the other with a fixed page size and automatic paging. Both require the user's absolute attention and focus for optimal performance. In either case, as users search and find relevant results, the system can invisibly re-rank its previous best guesses using a number of knowledge sources, such as image similarity, text similarity, and temporal proximity. Experimental evidence shows a significant improvement using the combined extremes of human and machine power over either approach alone.
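
A minimal sketch of the interactive loop, under the assumption that each knowledge source is exposed as a pairwise similarity function; the paging and ranking logic here is illustrative, not the paper's implementation.

```python
def rerank(candidates, relevant, sources, weights):
    """Re-score unjudged clips by weighted similarity to the relevant set."""
    def score(clip):
        return sum(w * max(sim(clip, r) for r in relevant)
                   for sim, w in zip(sources, weights))
    return sorted(candidates, key=score, reverse=True)

def search_loop(pool, page_size, sources, weights, judge):
    relevant, candidates = [], list(pool)
    while candidates:
        page, candidates = candidates[:page_size], candidates[page_size:]
        relevant += [c for c in page if judge(c)]   # rapid human scanning
        if relevant:                                # machine re-ranks the rest
            candidates = rerank(candidates, relevant, sources, weights)
    return relevant
```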


international conference on multimedia and expo | 2004

Comparison and combination of two novel commercial detection methods

Pinar Duygulu; Ming-yu Chen; Alexander G. Hauptmann

Detecting and removing commercials plays an important role when searching broadcast news video for relevant material. Two novel approaches are proposed based on two distinctive characteristics of commercials, namely the repetitive use of commercials over time and their distinctive color and audio features. Furthermore, proposed strategies for combining the results of the two methods yield even better performance. Experiments show over 90% recall and precision on a test set of 5 hours of ABC and CNN broadcast news data.
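
One plausible reading of the combination strategy, sketched below: treat each detector as producing a per-shot commercial score and fuse them with a weighted vote. The weight and threshold here are placeholders, not values from the paper.

```python
def combine_detectors(repetition_scores, feature_scores, w=0.5, threshold=0.5):
    """Flag a shot as commercial when the fused score clears the threshold."""
    return [w * a + (1 - w) * b >= threshold
            for a, b in zip(repetition_scores, feature_scores)]
```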


conference on image and video retrieval | 2004

Finding Person X: Correlating Names with Visual Appearances

Jun Yang; Ming-yu Chen; Alexander G. Hauptmann

People as news subjects carry rich semantics in broadcast news video, and therefore finding a named person in the video is a major challenge for video retrieval. This task can be achieved by exploiting the multi-modal information in videos, including the transcript, video structure, and visual features. We propose a comprehensive approach for finding specific persons in broadcast news videos by exploring various clues such as names occurring in the transcript, face information, anchor scenes, and most importantly, the timing pattern between names and people. Experiments on the TRECVID 2003 dataset show that our approach achieves high performance.
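
The timing cue can be sketched as follows: a shot scores higher the closer it falls to a mention of the target name in the transcript. The triangular weighting and 10-second window are assumptions for illustration, not the paper's learned model.

```python
def timing_score(shot_time, name_mention_times, window=10.0):
    """Score a shot by its temporal proximity to the nearest name mention."""
    return max((1.0 - abs(shot_time - t) / window
                for t in name_mention_times
                if abs(shot_time - t) <= window),
               default=0.0)
```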


HBU'10: Proceedings of the First International Conference on Human Behavior Understanding | 2010

Comparing evaluation protocols on the KTH dataset

Zan Gao; Ming-yu Chen; Alexander G. Hauptmann; Anni Cai

Human action recognition has become a hot research topic, and many algorithms have been proposed. Most researchers evaluate their performance on the KTH dataset, but there is no unified standard for how to evaluate algorithms on this dataset. Different researchers have employed different test setups, so comparisons are not accurate, fair, or complete. To quantify how much difference the experimental setup makes, we take our own spatio-temporal MoSIFT feature as an example and assess its performance on the KTH dataset using different test scenarios and different partitionings of the data. In all experiments, a support vector machine (SVM) with a chi-square kernel is adopted. First, we evaluate performance changes resulting from differing vocabulary sizes of the codebook and then settle on a suitable vocabulary size. Then we train models on different training dataset partitions and test their performance on the corresponding held-out test sets. Experiments show that the best performance of MoSIFT can reach 96.33% on the KTH dataset. When different n-fold cross-validation methods are used, there can be up to 10.67% difference in the result, and when different dataset segmentations are used (such as KTH1 and KTH2), the difference can be up to 5.8% absolute. In addition, performance changes dramatically when different scenarios are used in the training and test datasets: when training on KTH1 S1+S2+S3+S4 and testing on KTH1 S1 and S3 scenarios, performance reaches 97.33% and 89.33% respectively. This paper shows how different test configurations can skew results, even on a standard dataset. The recommendation is to use simple leave-one-out as the most easily replicable, clear-cut partitioning.
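
A minimal sketch of the classifier used across these comparisons: an SVM with an exponential chi-square kernel over bag-of-words histograms, via scikit-learn's precomputed-kernel interface. Feature extraction is assumed to have already produced the histograms, and the gamma value is a placeholder.

```python
import numpy as np
from sklearn.svm import SVC

def chi2_kernel(X, Y, gamma=1.0):
    """exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)) for histogram rows."""
    d = ((X[:, None, :] - Y[None, :, :]) ** 2 /
         (X[:, None, :] + Y[None, :, :] + 1e-10)).sum(axis=2)
    return np.exp(-gamma * d)

def evaluate_split(train_X, train_y, test_X, test_y):
    """Accuracy under one train/test partitioning of the data."""
    clf = SVC(kernel="precomputed").fit(chi2_kernel(train_X, train_X), train_y)
    preds = clf.predict(chi2_kernel(test_X, train_X))
    return (preds == test_y).mean()
```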


multimedia information retrieval | 2010

Controlling your TV with gestures

Ming-yu Chen; Lily B. Mummert; Padmanabhan Pillai; Alexander G. Hauptmann; Rahul Sukthankar

Vision-based user interfaces enable natural interaction modalities such as gestures. Such interfaces require computationally intensive video processing at low latency. We demonstrate an application that recognizes gestures to control TV operations. Accurate recognition is achieved by using a new descriptor called MoSIFT, which explicitly encodes optical flow with appearance features. MoSIFT is computationally expensive - a sequential implementation runs 100 times slower than real time. To reduce latency sufficiently for interaction, the application is implemented on a runtime system that exploits the parallelism inherent in video understanding applications.
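
The core MoSIFT idea, keeping appearance features only where there is real motion, can be approximated with standard OpenCV building blocks, as in the sketch below; this illustrates the concept and is not the paper's implementation.

```python
import cv2
import numpy as np

def mosift_like(prev_gray, gray, flow_thresh=1.0):
    """Keep SIFT keypoints that sit on sufficiently large optical flow."""
    kps, descs = cv2.SIFT_create().detectAndCompute(gray, None)
    if descs is None:
        return []
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    keep = []
    for kp, desc in zip(kps, descs):
        fx, fy = flow[int(kp.pt[1]), int(kp.pt[0])]
        if np.hypot(fx, fy) >= flow_thresh:       # moving interest point
            keep.append((kp, desc, (fx, fy)))     # appearance + motion
    return keep
```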


acm multimedia | 2005

Putting active learning into multimedia applications: dynamic definition and refinement of concept classifiers

Ming-yu Chen; Michael G. Christel; Alexander G. Hauptmann; Howard D. Wactlar

The authors developed an extensible system for video exploitation that puts the user in control to better accommodate novel situations and source material. Visually dense displays of thumbnail imagery in storyboard views are used for shot-based video exploration and retrieval. The user can identify a need for a class of audiovisual detection, adeptly and fluently supply training material for that class, and iteratively evaluate and improve the resulting automatic classification produced via multiple modality active learning and SVM. By iteratively reviewing the output of the classifier and updating the positive and negative training samples with less effort than typical for relevance feedback systems, the user can play an active role in directing the classification process while still needing to truth only a very small percentage of the multimedia data set. Examples are given illustrating the iterative creation of a classifier for a concept of interest to be included in subsequent investigations, and for a concept typically deemed irrelevant to be weeded out in follow-up queries. Filtering and browsing tools making use of existing and iteratively added concepts put the user further in control of the multimedia browsing and retrieval process.
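
A schematic of the refinement loop, assuming features are a NumPy array, `labeled` maps indices to 0/1 labels covering both classes, and `review` stands in for the user's judgment; all names are illustrative, not the system's API.

```python
from sklearn.svm import SVC

def refine_classifier(features, labeled, review, rounds=5, page_size=20):
    """Retrain on user-corrected labels; the user truths only a few shots."""
    for _ in range(rounds):
        idx = list(labeled)
        clf = SVC(probability=True).fit(features[idx],
                                        [labeled[i] for i in idx])
        scores = clf.predict_proba(features)[:, 1]
        # surface the top-scoring unlabeled shots for quick user correction
        top = sorted((i for i in range(len(features)) if i not in labeled),
                     key=lambda i: scores[i], reverse=True)[:page_size]
        for i in top:
            labeled[i] = review(i)
    return clf
```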


Multimedia Tools and Applications | 2014

Enhanced and hierarchical structure algorithm for data imbalance problem in semantic extraction under massive video dataset

Zan Gao; Longfei Zhang; Ming-yu Chen; Alexander G. Hauptmann; Hua Zhang; Anni Cai

The data imbalance problem often exists in real-life datasets, especially massive video datasets; however, traditional machine learning algorithms assume a balanced data distribution and equal misclassification costs, so it is difficult for them to accurately describe the true data distribution, resulting in misclassification. In this paper, the data imbalance problem in semantic extraction over a massive video dataset is examined, and an enhanced and hierarchical structure (EHS) algorithm is proposed. In the proposed algorithm, data sampling, filtering, and model training are integrated compactly via a hierarchical structure, so model performance can be improved step by step and remains robust and stable as features and datasets change. Experiments on TRECVID 2010 Semantic Indexing demonstrate that the proposed algorithm performs much better than traditional machine learning algorithms and stays stable and robust when different kinds of features are employed. Extended experiments on TRECVID 2010 Surveillance Event Detection also show that the EHS algorithm is efficient and effective, reaching top performance in four of seven events.
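
Since the paper's exact EHS stages are not reproduced here, the sketch below only shows the general shape of such a hierarchy under stated assumptions: each stage trains on a balanced sample of the majority class, and only the negatives the current model still confuses are passed on to the next stage.

```python
import numpy as np
from sklearn.svm import SVC

def hierarchical_imbalance(pos, neg, stages=3):
    """Assumes len(neg) >= len(pos); pos/neg are feature arrays."""
    models = []
    for _ in range(stages):
        sample = neg[np.random.choice(len(neg), size=len(pos), replace=False)]
        X = np.vstack([pos, sample])
        y = np.array([1] * len(pos) + [0] * len(sample))
        clf = SVC(probability=True).fit(X, y)
        models.append(clf)
        hard = clf.predict_proba(neg)[:, 1] > 0.5   # keep hard negatives
        if hard.sum() > len(pos):
            neg = neg[hard]
    return models
```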


Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems | 2010

Exploiting multi-level parallelism for low-latency activity recognition in streaming video

Ming-yu Chen; Lily B. Mummert; Padmanabhan Pillai; Alexander G. Hauptmann; Rahul Sukthankar

Video understanding is a computationally challenging task that is critical not only for traditionally throughput-oriented applications such as search but also latency-sensitive interactive applications such as surveillance, gaming, videoconferencing, and vision-based user interfaces. Enabling these types of video processing applications will require not only new algorithms and techniques, but new runtime systems that optimize latency as well as throughput. In this paper, we present a runtime system called Sprout that achieves low latency by exploiting the parallelism inherent in video understanding applications. We demonstrate the utility of our system on an activity recognition application that employs a robust new descriptor called MoSIFT, which explicitly augments appearance features with motion information. MoSIFT outperforms previous recognition techniques, but like other state-of-the-art techniques, it is computationally expensive -- a sequential implementation runs 100 times slower than real time. We describe the implementation of the activity recognition application on Sprout, and show that it can accurately recognize activities at full frame rate (25 fps) and low latency on a challenging airport surveillance video corpus.
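
Using Python's multiprocessing as a stand-in for the Sprout runtime, the sketch below shows the flavor of the parallelism involved: frames fan out to worker processes, and decisions are made over sliding windows of the pooled results. The stub functions are placeholders for MoSIFT extraction and activity classification, not Sprout's API.

```python
from multiprocessing import Pool

def extract_features(frame):
    """Per-frame work (stand-in for MoSIFT extraction)."""
    return sum(frame) % 256

def classify_window(window):
    """Per-window decision over pooled per-frame features."""
    return sum(window) > len(window) * 128

def recognize(frames, workers=8, window=25):
    with Pool(workers) as pool:
        feats = pool.map(extract_features, frames)   # data parallelism
    return [classify_window(feats[i:i + window])     # sliding-window decisions
            for i in range(len(feats) - window + 1)]
```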


international conference on acoustics, speech, and signal processing | 2004

Searching for a specific person in broadcast news video

Ming-yu Chen; Alexander G. Hauptmann

People as news subjects play an important role in broadcast news and finding a specific person is a major challenge for multimedia retrieval. Beyond mere content-based general retrieval, this task requires exploitation of the structure, time sequence and meaning of news content. We introduce a comprehensive approach to discovering clues for finding a specific person in broadcast news video. Various information aspects are investigated, including text information, timing information, scene detection, and face recognition. Experimental results on the TREC 2003 video search task show that our approach can achieve surprisingly high performance by exploiting broadly diverse information to find specific named people.
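
The fusion of these diverse clues can be pictured as a weighted linear combination of normalized per-shot scores, as sketched below; the sources and weights are placeholders, not the paper's tuned values.

```python
def fuse_clues(clues, weights):
    """clues: dict source -> per-shot scores in [0, 1]; returns fused scores."""
    n = len(next(iter(clues.values())))
    return [sum(weights[s] * clues[s][i] for s in clues) for i in range(n)]

fused = fuse_clues(
    {"text": [0.9, 0.2], "timing": [0.7, 0.1], "face": [0.4, 0.3]},
    {"text": 0.5, "timing": 0.3, "face": 0.2},
)
```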

Collaboration


Dive into Ming-yu Chen's collaborations.

Top Co-Authors

Howard D. Wactlar, Carnegie Mellon University

Wei-Hao Lin, Carnegie Mellon University

Zan Gao, Beijing University of Posts and Telecommunications

Jun Yang, Carnegie Mellon University

Anni Cai, Beijing University of Posts and Telecommunications

Ashok Bharucha, University of Pittsburgh

Robert V. Baron, Carnegie Mellon University