Alexander G. Hauptmann
Carnegie Mellon University
Publications
Featured research published by Alexander G. Hauptmann.
IEEE MultiMedia | 2006
Milind R. Naphade; John R. Smith; Jelena Tesic; Shih-Fu Chang; Winston H. Hsu; Lyndon Kennedy; Alexander G. Hauptmann; Jon Curtis
As increasingly powerful techniques emerge for machine tagging multimedia content, it becomes ever more important to standardize the underlying vocabularies. Doing so provides interoperability and lets the multimedia community focus ongoing research on a well-defined set of semantics. This paper describes a collaborative effort of multimedia researchers, library scientists, and end users to develop a large standardized taxonomy for describing broadcast news video. The large-scale concept ontology for multimedia (LSCOM) is the first of its kind designed to simultaneously optimize utility to facilitate end-user access, cover a large semantic space, make automated extraction feasible, and increase observability in diverse broadcast news video data sets.
ACM Multimedia | 2007
Xiao Wu; Alexander G. Hauptmann; Chong-Wah Ngo
Current web video search results rely exclusively on text keywords or user-supplied tags. A search for a typical popular video often returns many duplicate and near-duplicate videos in the top results. This paper outlines a hierarchical approach to clustering and filtering out near-duplicate videos. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate by the global signatures do we apply a more expensive local-feature-based near-duplicate detection, which provides very accurate duplicate analysis at higher computational cost. The results of 24 queries in a data set of 12,790 videos retrieved from Google, Yahoo!, and YouTube show that this hierarchical approach can dramatically reduce the redundant videos displayed to the user in the top result set, at relatively small computational cost.
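To make the two-stage idea concrete, here is a minimal sketch of hierarchical triage in Python, assuming videos are available as frame arrays. The function names, the histogram-intersection similarity, and the thresholds are illustrative choices, not the paper's exact signatures or parameters.

```python
# Hedged sketch of hierarchical near-duplicate triage; names and
# thresholds are illustrative, not from the paper.
import numpy as np

def color_signature(frames: np.ndarray, bins: int = 8) -> np.ndarray:
    """Global signature: a normalized color histogram over all frames.
    `frames` has shape (n_frames, height, width, 3), values in 0..255."""
    hist, _ = np.histogramdd(
        frames.reshape(-1, 3),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.ravel()
    return hist / hist.sum()

def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return float(np.minimum(h1, h2).sum())

def triage(sig_a, sig_b, dup_thresh=0.9, novel_thresh=0.5):
    """Fast first stage: classify a pair as 'duplicate', 'novel', or
    'uncertain'. Only 'uncertain' pairs would go to the expensive
    local-feature matcher (not shown here)."""
    sim = histogram_intersection(sig_a, sig_b)
    if sim >= dup_thresh:
        return "duplicate"
    if sim <= novel_thresh:
        return "novel"
    return "uncertain"  # escalate to local keypoint matching
```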
Computer Vision and Pattern Recognition | 2015
Zhongwen Xu; Yi Yang; Alexander G. Hauptmann
In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available. The focus of this paper is to effectively leverage deep Convolutional Neural Networks (CNNs) to advance event detection, where only frame level static descriptors can be extracted by the existing CNN toolkits. This paper makes two contributions to the inference of CNN video representation. First, while average pooling and max pooling have long been the standard approaches to aggregating frame level static features, we show that performance can be significantly improved by taking advantage of an appropriate encoding method. Second, we propose using a set of latent concept descriptors as the frame descriptor, which enriches visual information while keeping it computationally affordable. The integration of the two contributions results in a new state-of-the-art performance in event detection over the largest video datasets. Compared to improved Dense Trajectories, which has been recognized as the best video representation for event detection, our new representation improves the Mean Average Precision (mAP) from 27.6% to 36.8% for the TRECVID MEDTest 14 dataset and from 34.0% to 44.6% for the TRECVID MEDTest 13 dataset.
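As an illustration of why the encoding method matters, the sketch below contrasts simple average pooling with a VLAD-style encoding of frame-level CNN descriptors, a standard encoding of the kind the paper advocates. The codebook is assumed to be pre-trained (e.g., by k-means); names and normalization details are illustrative.

```python
# Hedged sketch: average pooling vs. a VLAD-style encoding of
# frame-level CNN descriptors. Codebook `centers` is assumed given.
import numpy as np

def average_pool(frames: np.ndarray) -> np.ndarray:
    """Baseline aggregation: mean of frame-level descriptors (n, d) -> (d,)."""
    return frames.mean(axis=0)

def vlad_encode(frames: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Accumulate residuals to each descriptor's nearest codebook center,
    then apply power and L2 normalization. frames: (n, d), centers: (k, d)."""
    d2 = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)            # nearest center per frame
    k, d = centers.shape
    enc = np.zeros((k, d))
    for i, c in enumerate(assign):
        enc[c] += frames[i] - centers[c]  # residual accumulation
    enc = enc.ravel()
    enc = np.sign(enc) * np.sqrt(np.abs(enc))  # power normalization
    norm = np.linalg.norm(enc)
    return enc / norm if norm > 0 else enc
```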
IEEE Transactions on Multimedia | 2010
Yu-Gang Jiang; Jun Yang; Chong-Wah Ngo; Alexander G. Hauptmann
Based on local keypoints extracted as salient image patches, an image can be described as a "bag-of-visual-words" (BoW), a representation that has proven promising for object and scene classification. The performance of BoW features in semantic concept detection for large-scale multimedia databases depends on various representation choices. In this paper, we conduct a comprehensive study of the representation choices of BoW, including vocabulary size, weighting scheme, stop word removal, feature selection, spatial information, and visual bi-grams. We offer practical insights into how to optimize the performance of BoW through appropriate representation choices. For the weighting scheme, we elaborate a soft-weighting method to assess the significance of a visual word to an image. We show experimentally that soft-weighting outperforms other popular weighting schemes such as TF-IDF by a large margin. Our extensive experiments on TRECVID data sets also indicate that the BoW feature alone, with appropriate representation choices, already produces highly competitive concept detection performance. Based on our empirical findings, we further apply our method to detect a large set of 374 semantic concepts. The detectors, as well as the features and detection scores on several recent benchmark data sets, are released to the multimedia community.
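The soft-weighting idea can be sketched as follows: rather than hard-assigning each keypoint to its single nearest visual word, each keypoint votes for its top-k nearest words with a contribution that decays with rank. The similarity proxy and decay factor below follow the spirit of the method; the exact formulas in the paper may differ.

```python
# Hedged sketch of soft-weighted BoW histogram construction.
import numpy as np

def soft_weight_bow(descriptors: np.ndarray, vocabulary: np.ndarray,
                    top_k: int = 4) -> np.ndarray:
    """Each keypoint descriptor votes for its top_k nearest visual words,
    weighted by similarity and halved at each rank, instead of adding a
    single hard count to one bin."""
    n_words = vocabulary.shape[0]
    hist = np.zeros(n_words)
    for desc in descriptors:
        dist = np.linalg.norm(vocabulary - desc, axis=1)
        sim = 1.0 / (1.0 + dist)              # a simple similarity proxy
        nearest = np.argsort(dist)[:top_k]
        for rank, word in enumerate(nearest):
            hist[word] += sim[word] / (2 ** rank)
    return hist / max(hist.sum(), 1e-12)      # normalized histogram
```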
Conference on Computational Natural Language Learning | 2006
Wei-Hao Lin; Theresa Wilson; Janyce Wiebe; Alexander G. Hauptmann
In this paper we investigate a new problem of identifying the perspective from which a document is written. By perspective we mean a point of view, for example, from the perspective of Democrats or Republicans. Can computers learn to identify the perspective of a document? Not every sentence is written strongly from a perspective. Can computers learn to identify which sentences strongly convey a particular perspective? We develop statistical models to capture how perspectives are expressed at the document and sentence levels, and evaluate the proposed models on articles about the Israeli-Palestinian conflict. The results show that the proposed models successfully learn how perspectives are reflected in word usage and can identify the perspective of a document with high accuracy.
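The document-level intuition, that perspective is reflected in word usage, can be illustrated with a simple word-count classifier. The sketch below uses naive Bayes as a stand-in; the paper's actual models are richer, including a latent-variable model at the sentence level, and the training strings here are toy examples.

```python
# Hedged stand-in for a document-level perspective classifier:
# naive Bayes over word counts. Training data is deliberately toy.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "lower taxes let families keep what they earn",       # perspective A
    "tax cuts reward hard work and small business",       # perspective A
    "public programs need stable funding to serve all",   # perspective B
    "investment in social services strengthens communities",  # perspective B
]
train_labels = ["A", "A", "B", "B"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)
print(model.predict(["funding for social services"]))  # likely ['B']
```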
Conference on Image and Video Retrieval | 2003
Rong Yan; Alexander G. Hauptmann; Rong Jin
We present an algorithm for video retrieval that fuses the decisions of multiple retrieval agents in both text and image modalities. While the normalization and combination of evidence is novel, this paper emphasizes the successful use of negative pseudo-relevance feedback to improve image retrieval performance. Although we have not solved all problems in video information retrieval, the results are encouraging, indicating that pseudo-relevance feedback shows great promise for multimedia retrieval over highly varied and error-prone data.
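A minimal sketch of negative pseudo-relevance feedback in a vector-space setting: treat the lowest-ranked results of an initial retrieval as pseudo-negative examples and move the query away from their centroid, Rocchio-style, before rescoring. The parameter names and update details are illustrative, not the paper's exact formulation.

```python
# Hedged sketch of negative pseudo-relevance feedback reranking.
import numpy as np

def rerank_with_negative_prf(query_vec: np.ndarray, doc_vecs: np.ndarray,
                             scores: np.ndarray, n_neg: int = 50,
                             gamma: float = 0.25) -> np.ndarray:
    """Take the n_neg lowest-scoring items as pseudo-negatives, subtract a
    fraction of their centroid from the query, and rescore everything by
    cosine similarity against the updated query."""
    order = np.argsort(scores)                 # ascending: worst first
    neg_centroid = doc_vecs[order[:n_neg]].mean(axis=0)
    new_query = query_vec - gamma * neg_centroid
    new_query /= np.linalg.norm(new_query) + 1e-12
    norms = np.linalg.norm(doc_vecs, axis=1) + 1e-12
    return (doc_vecs @ new_query) / norms      # new ranking scores
```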
IEEE Computer | 1999
Howard D. Wactlar; Michael G. Christel; Yihong Gong; Alexander G. Hauptmann
The Informedia Project at Carnegie Mellon University has created a terabyte digital video library in which automatically derived descriptors for the video are used for indexing, segmenting, and accessing the library contents. Begun in 1994, the project presented numerous challenges for library creation and deployment, which this article covers. The authors, developers of the project at Carnegie Mellon University, addressed these challenges by automatically extracting information from digitized video; creating interfaces that allowed users to search for and retrieve videos based on the extracted information; and validating the system through user testbeds. Through speech, image, and natural language processing, the Informedia Project has demonstrated that previously inaccessible data can be derived automatically and used to describe and index video segments.
Human Factors in Computing Systems | 1989
Alexander G. Hauptmann
An experiment was conducted with people using gestures and speech to manipulate graphic images on a computer screen. A human was substituted for the recognition devices. The analysis showed that people strongly prefer to use both gestures and speech for the graphics manipulation and that they intuitively use multiple hands and multiple fingers in all three dimensions. There was surprising uniformity and simplicity in the gestures and speech. The analysis of these results provides strong encouragement for future development of integrated multi-modal interaction systems.
Communications of The ACM | 1989
Sheryl R. Young; Alexander G. Hauptmann; Wayne H. Ward; Edward T. Smith; Philip Werner
The authors detail an integrated system which combines natural language processing with speech understanding in the context of a problem solving dialogue. The MINDS system uses a variety of pragmatic knowledge sources to dynamically generate expectations of what a user is likely to say.
Computer Vision and Pattern Recognition | 2009
Xinghua Sun; Ming-yu Chen; Alexander G. Hauptmann
In this paper we propose a unified action recognition framework that fuses local descriptors and holistic features. The motivation is that local descriptors and holistic features emphasize different aspects of actions and are suited to different types of action databases. The proposed unified framework is based on frame differencing, bag-of-words, and feature fusion. We extract two kinds of local descriptors, 2D and 3D SIFT feature descriptors, both based on 2D SIFT interest points. We apply Zernike moments to extract two kinds of holistic features, one based on single frames and the other on the motion energy image. We perform action recognition experiments on the KTH and Weizmann databases using Support Vector Machines, apply the leave-one-out and pseudo leave-N-out setups, and compare our approach with state-of-the-art results. Experiments show that the proposed approach is effective; compared with other approaches, it is more robust, more versatile, easier to compute, and simpler to understand.
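To illustrate the fusion step, the sketch below L1-normalizes a local bag-of-words histogram and a holistic descriptor, concatenates them (early fusion), and trains an SVM. Feature extraction (SIFT codebooks, Zernike moments) is assumed to happen upstream, and the random arrays stand in for real features.

```python
# Hedged sketch of early fusion of local and holistic features with an SVM.
# Upstream extraction (SIFT BoW, Zernike moments) is assumed; the random
# features below are placeholders, not real data.
import numpy as np
from sklearn.svm import SVC

def fuse(local_bow: np.ndarray, holistic: np.ndarray) -> np.ndarray:
    """Early fusion: L1-normalize each modality, then concatenate."""
    local_bow = local_bow / max(local_bow.sum(), 1e-12)
    holistic = holistic / max(np.abs(holistic).sum(), 1e-12)
    return np.concatenate([local_bow, holistic])

# Toy usage: 40 clips, a 500-word BoW histogram plus a 32-dim holistic vector.
rng = np.random.default_rng(0)
X = np.stack([fuse(rng.random(500), rng.random(32)) for _ in range(40)])
y = rng.integers(0, 2, size=40)          # two hypothetical action classes
clf = SVC(kernel="rbf").fit(X, y)
```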