Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Keansub Lee is active.

Publication


Featured research published by Keansub Lee.


Multimedia Information Retrieval | 2007

Large-scale multimodal semantic concept detection for consumer video

Shih-Fu Chang; Daniel P. W. Ellis; Wei Jiang; Keansub Lee; Akira Yanagawa; Alexander C. Loui; Jiebo Luo

In this paper we present a systematic study of automatic classification of consumer videos into a large set of diverse semantic concept classes, which have been carefully selected based on user studies and extensively annotated over 1300+ videos from real users. Our goals are to assess the state of the art of multimedia analytics (including both audio and visual analysis) in consumer video classification and to discover new research opportunities. We investigated several statistical approaches built upon global/local visual features, audio features, and audio-visual combinations. Three multi-modal fusion frameworks (ensemble, context fusion, and joint boosting) are also evaluated. Experimental results show that visual and audio models perform best for different sets of concepts. Both provide significant contributions to multimodal fusion, via expansion of the classifier pool for context fusion and the feature bases for feature sharing. The fused multimodal models are shown to significantly reduce the detection errors (compared to single-modality models), resulting in a promising accuracy of 83% over diverse concepts. To the best of our knowledge, this is the first work on systematic investigation of multimodal classification using a large-scale ontology and realistic video corpus.
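
The fusion step described above can be illustrated with a minimal sketch of score-level ("ensemble") fusion, in which per-concept audio and visual classifier scores are averaged into a single multimodal score. The function name, weights, and toy score arrays below are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch of ensemble (late) fusion of audio and visual detector scores.
# All names and numbers here are illustrative; the paper's classifiers and
# fusion weights are not reproduced.
import numpy as np

def ensemble_fusion(audio_scores: np.ndarray, visual_scores: np.ndarray,
                    audio_weight: float = 0.5) -> np.ndarray:
    """Weighted average of per-clip, per-concept detection scores."""
    return audio_weight * audio_scores + (1.0 - audio_weight) * visual_scores

# toy example: 3 clips x 2 concepts
audio = np.array([[0.9, 0.1], [0.2, 0.7], [0.4, 0.4]])
visual = np.array([[0.8, 0.3], [0.1, 0.9], [0.6, 0.2]])
print(ensemble_fusion(audio, visual))
```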


Multimedia Information Retrieval | 2007

Kodak's consumer video benchmark data set: concept definition and annotation

Alexander C. Loui; Jiebo Luo; Shih-Fu Chang; Daniel P. W. Ellis; Wei Jiang; Lyndon Kennedy; Keansub Lee; Akira Yanagawa

Semantic indexing of images and videos in the consumer domain has become a very important issue for both research and actual application. In this work we developed Kodak's consumer video benchmark data set, which includes (1) a significant number of videos from actual users, (2) a rich lexicon that accommodates consumers' needs, and (3) the annotation of a subset of concepts over the entire video data set. To the best of our knowledge, this is the first systematic work in the consumer domain aimed at the definition of a large lexicon, construction of a large benchmark data set, and annotation of videos in a rigorous fashion. Such an effort will have a significant impact by providing a sound foundation for developing and evaluating large-scale learning-based semantic indexing/annotation techniques in the consumer domain.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Audio-Based Semantic Concept Classification for Consumer Video

Keansub Lee; Daniel P. W. Ellis

This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, viability of automatic detection and of annotator labeling, and sufficiency of representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
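
The clip-level modeling and distance-based SVM kernel can be sketched as follows, assuming MFCC frames (frames × coefficients) have already been computed by any front end. The regularization constant, kernel width gamma, and function names are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: summarize a clip's MFCC frames as a single Gaussian and turn a
# symmetrized KL divergence between clips into an SVM kernel value.
import numpy as np

def clip_gaussian(mfcc: np.ndarray):
    """Fit a single full-covariance Gaussian to a clip's MFCC frames."""
    mu = mfcc.mean(axis=0)
    cov = np.cov(mfcc, rowvar=False) + 1e-6 * np.eye(mfcc.shape[1])  # regularized
    return mu, cov

def kl_gauss(mu0, cov0, mu1, cov1):
    """KL divergence KL(N0 || N1) between two multivariate Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def kl_kernel(clip_a, clip_b, gamma=0.01):
    """Map the symmetrized KL divergence to a kernel value usable by an SVM."""
    div = kl_gauss(*clip_a, *clip_b) + kl_gauss(*clip_b, *clip_a)
    return np.exp(-gamma * div)

# toy usage with random stand-ins for MFCC matrices (200 frames x 13 coefficients)
rng = np.random.default_rng(0)
a = clip_gaussian(rng.normal(size=(200, 13)))
b = clip_gaussian(rng.normal(loc=0.5, size=(200, 13)))
print(kl_kernel(a, b))
```

A precomputed kernel matrix built this way could then be passed to a standard SVM implementation, for example scikit-learn's SVC(kernel="precomputed").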


RIAO '07 Large Scale Semantic Access to Content (Text, Image, Video, and Sound) | 2007

Multimodal segmentation of lifelog data

Aiden R. Doherty; Alan F. Smeaton; Keansub Lee; Daniel P. W. Ellis

A personal lifelog of visual and audio information can be very helpful as a human memory augmentation tool. The SenseCam, a passive wearable camera, used in conjunction with an iRiver MP3 audio recorder, will capture over 20,000 images and 100 hours of audio per week. If used constantly, this would quickly build up into a substantial collection of personal data. To gain real value from this collection it is important to automatically segment the data into meaningful units or activities. This paper investigates the optimal combination of data sources for segmenting personal data into such activities. Five data sources were logged and processed to segment a collection of personal data, namely: image processing on captured SenseCam images; audio processing on captured iRiver audio data; and processing of the temperature, white light level, and accelerometer sensors onboard the SenseCam device. The results indicate that a combination of the image, light, and accelerometer sensor data segments our collection of personal data better than a combination of all five data sources. The accelerometer sensor is good for detecting when the user moves to a new location, while the image and light sensors are good for detecting changes in wearer activity within the same location, as well as detecting when the wearer socially interacts with others.
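
A hedged sketch of the fusion idea: each sensor stream is reduced to a per-minute change score, the scores are normalized and combined with fixed weights, and minutes whose fused score exceeds a threshold are proposed as activity boundaries. The weights, threshold, and toy data below are illustrative assumptions rather than the configuration evaluated in the paper.

```python
# Sketch: fuse per-minute change scores from image, light, and accelerometer
# streams into activity-boundary decisions. Values are illustrative only.
import numpy as np

def boundaries(image_change, light_change, accel_change,
               weights=(0.4, 0.3, 0.3), threshold=0.6):
    """Return indices of one-minute windows flagged as activity boundaries."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    fused = (weights[0] * norm(image_change)
             + weights[1] * norm(light_change)
             + weights[2] * norm(accel_change))
    return np.flatnonzero(fused > threshold)

# toy example over 10 one-minute windows
img = [0.1, 0.2, 0.9, 0.2, 0.1, 0.1, 0.8, 0.2, 0.1, 0.1]
lgt = [0.0, 0.1, 0.7, 0.1, 0.0, 0.1, 0.9, 0.1, 0.0, 0.1]
acc = [0.2, 0.1, 0.8, 0.1, 0.1, 0.2, 0.7, 0.1, 0.2, 0.1]
print(boundaries(img, lgt, acc))  # minutes 2 and 6 are flagged
```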


Conference of the International Speech Communication Association | 2004

Features for segmenting and classifying long-duration recordings of "personal" audio

Daniel P. W. Ellis; Keansub Lee

A digital recorder weighing a few ounces and able to record for more than ten hours can be bought for a few hundred dollars. Such devices make possible continuous recordings of “personal audio” – storing essentially everything heard by the owner. Without automatic indexing, however, such recordings are almost useless. In this paper, we describe some experiments with recordings of this kind, focusing on the problem of segmenting the recordings into different ‘episodes’ corresponding to different acoustic environments experienced by the device. We introduce several novel features for describing 1-minute-long frames of audio, and investigate their effectiveness at reproducing hand-labeled ground-truth segment boundaries.
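
The per-minute summarization idea can be sketched as follows: each 1-minute frame of audio is reduced to a small vector of statistics, and boundaries are proposed wherever adjacent minutes differ strongly. The specific feature (short-time log energy) and the jump threshold are illustrative assumptions, not the features evaluated in the paper.

```python
# Sketch: per-minute feature summarization and simple boundary proposal.
import numpy as np

def minute_features(samples: np.ndarray, sr: int) -> np.ndarray:
    """Mean and std of short-time log energy over one minute of audio."""
    hop = int(0.5 * sr)  # 0.5-second analysis frames
    frames = samples[: len(samples) // hop * hop].reshape(-1, hop)
    log_e = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    return np.array([log_e.mean(), log_e.std()])

def propose_boundaries(minute_vectors: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Flag minute indices whose features jump relative to the previous minute."""
    jumps = np.linalg.norm(np.diff(minute_vectors, axis=0), axis=1)
    return np.flatnonzero(jumps > threshold) + 1
```

Here minute_vectors would be the stacked array of one feature vector per recorded minute; the proposed boundaries can then be scored against hand-labeled segment boundaries.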


International Conference on Acoustics, Speech, and Signal Processing | 2010

Detecting local semantic concepts in environmental sounds using Markov model based clustering

Keansub Lee; Daniel P. W. Ellis; Alexander C. Loui

Detecting the time of occurrence of an acoustic event (for instance, a cheer) embedded in a longer soundtrack is useful and important for applications such as search and retrieval in consumer video archives. We present a Markov-model based clustering algorithm able to identify and segment consistent sets of temporal frames into regions associated with different ground-truth labels, and simultaneously to exclude a set of uninformative frames shared in common across all clips. The labels are provided at the clip level, so this refinement of the time axis represents a variant of Multiple-Instance Learning (MIL). Evaluation shows that local concepts are effectively detected by this clustering technique based on coarse-scale labels, and that detection performance is significantly better than that of existing algorithms for classifying real-world consumer recordings.
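
One way to picture this frame-level refinement is the following hedged sketch: frames from all clips are clustered into shared states with a hidden Markov model, each state is scored by how strongly it co-occurs with positively labeled clips, and only frames falling in positive-dominated states are kept as local detections, while states common to all labels are discarded as uninformative. The use of hmmlearn, the number of states, and the purity threshold are assumptions for illustration, not the authors' implementation.

```python
# Sketch of MIL-style refinement: cluster frames with an HMM, then keep only
# frames in states dominated by positively labeled clips. Illustrative only.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def refine_frames(clips, clip_labels, n_states=8, purity=0.7):
    """clips: list of (n_frames_i, n_dims) arrays; clip_labels: 0/1 per clip."""
    X = np.vstack(clips)
    lengths = [len(c) for c in clips]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    states = model.predict(X, lengths)

    # fraction of each state's frames that come from positively labeled clips
    frame_labels = np.repeat(clip_labels, lengths)
    pos_frac = np.array([frame_labels[states == s].mean() if np.any(states == s) else 0.0
                         for s in range(n_states)])
    keep = pos_frac > purity  # states dominated by the positive label
    # per-clip boolean masks marking frames retained as local detections
    return np.split(keep[states], np.cumsum(lengths)[:-1])
```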


International Conference on Acoustics, Speech, and Signal Processing | 2008

Detecting music in ambient audio by long-window autocorrelation

Keansub Lee; Daniel P. W. Ellis

We address the problem of detecting music in the background of ambient real-world audio recordings such as the soundtrack of consumer-shot video. Such material may contain high levels of noise, and we seek to devise features that will reveal music content in such circumstances. Sustained, steady musical pitches show significant, structured autocorrelation when calculated over windows of hundreds of milliseconds, whereas the autocorrelation of aperiodic noise becomes negligible at these longer lags once the signal has been whitened by LPC. Using such features, further compensated by their long-term average to remove the effect of stationary periodic noise, we produce GMM- and SVM-based classifiers with high performance compared with previous approaches, as verified on a corpus of real consumer video.
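
The long-window autocorrelation feature can be sketched roughly as follows: the signal is whitened with an LPC inverse filter, and the normalized autocorrelation of long analysis windows is then inspected for strong peaks away from zero lag, which persist for steady musical pitches but not for aperiodic noise. The window length, LPC order, and minimum lag below are illustrative assumptions, and the long-term average compensation described in the abstract is omitted for brevity.

```python
# Sketch: LPC whitening followed by long-window autocorrelation peak features.
# Parameter values are illustrative, not the paper's configuration.
import numpy as np
import librosa
from scipy.signal import lfilter

def music_peak_feature(y: np.ndarray, sr: int, win_s: float = 0.75,
                       lpc_order: int = 12, min_lag_s: float = 0.02) -> np.ndarray:
    """Max normalized autocorrelation peak beyond min_lag_s for each long window."""
    a = librosa.lpc(y, order=lpc_order)      # prediction-error (whitening) filter
    w = lfilter(a, [1.0], y)                 # whitened signal
    win, min_lag = int(win_s * sr), int(min_lag_s * sr)
    feats = []
    for start in range(0, len(w) - win + 1, win):
        seg = w[start:start + win]
        ac = np.correlate(seg, seg, mode="full")[win - 1:]  # non-negative lags
        ac /= (ac[0] + 1e-10)                                # normalize by lag 0
        feats.append(ac[min_lag:].max())
    return np.array(feats)

# windows containing sustained pitches tend to yield higher peak values; the
# resulting feature sequence could then feed a GMM or SVM classifier.
```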


ACM Workshop on Continuous Archival and Retrieval of Personal Experiences | 2004

Minimal-impact audio-based personal archives

Daniel P. W. Ellis; Keansub Lee


IEEE MultiMedia | 2006

Accessing Minimal-Impact Personal Audio Archives

Daniel P. W. Ellis; Keansub Lee


International Conference on Spoken Language Processing | 2006

Voice Activity Detection in Personal Audio Recordings Using Autocorrelogram Compensation

Keansub Lee; Daniel P. W. Ellis

Collaboration


Dive into Keansub Lee's collaborations.

Top Co-Authors


Jiebo Luo

University of Rochester


Wei Jiang

Eastman Kodak Company
