Publication


Featured research published by Alejandro Jaimes.


Computer Vision and Image Understanding | 2007

Multimodal human-computer interaction: A survey

Alejandro Jaimes; Nicu Sebe

In this paper, we review the major approaches to multimodal human-computer interaction, giving an overview of the field from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for multimodal human-computer interaction (MMHCI) research.
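
The fusion step the survey highlights is easy to make concrete. Below is a minimal sketch of decision-level (late) multimodal fusion, where per-modality classifier scores are combined with a weighted average; the modality names, weights, and class scores are illustrative assumptions, not taken from the survey.

```python
# Minimal sketch of decision-level (late) multimodal fusion: each modality
# produces a class-probability vector, and the fused decision is a weighted
# average. Modality names and weights are illustrative, not from the survey.
from typing import Dict, List

def late_fusion(scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> int:
    """Combine per-modality class scores and return the winning class index."""
    n_classes = len(next(iter(scores.values())))
    fused = [0.0] * n_classes
    total = sum(weights[m] for m in scores)
    for modality, probs in scores.items():
        w = weights[modality] / total
        for i, p in enumerate(probs):
            fused[i] += w * p
    return max(range(n_classes), key=fused.__getitem__)

# Example: gesture and facial-expression classifiers scoring 3 affect classes.
scores = {"gesture": [0.2, 0.5, 0.3], "face": [0.1, 0.7, 0.2]}
weights = {"gesture": 0.4, "face": 0.6}
print(late_fusion(scores, weights))  # -> 1
```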


International Conference on Computer Vision | 2005

Multimodal human computer interaction: a survey

Alejandro Jaimes; Nicu Sebe

In this paper we review the major approaches to multimodal human computer interaction from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition, and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human Computer Interaction (MMHCI) research.


Electronic Imaging | 1999

Conceptual framework for indexing visual information at multiple levels

Alejandro Jaimes; Shih-Fu Chang

In this paper, we present a conceptual framework for indexing different aspects of visual information. Our framework unifies concepts from the literature in diverse fields such as cognitive psychology, library sciences, art, and the more recent content-based retrieval. We present multiple-level structures for visual and non-visual information. The ten-level visual structure presented provides a systematic way of indexing images based on syntax and semantics, and includes distinctions between general concepts and visual concepts. We define different types of relations at different levels of the visual structure, and also use a semantic information table to summarize important aspects related to an image. While the focus is on the development of a conceptual indexing structure, our aim is also to bring together the knowledge from various fields, unifying the issues that should be considered when building a digital image library. Our analysis stresses the limitations of state-of-the-art content-based retrieval systems and suggests areas in which improvements are necessary.
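
To illustrate the multi-level idea, here is a small sketch of an index structure that attaches descriptors to an image at named syntactic and semantic levels. The level names are placeholders standing in for the paper's ten-level structure, which is not reproduced here.

```python
# Sketch of a multi-level image index in the spirit of the paper's framework:
# each image carries descriptors at distinct levels, split between syntactic
# (perceptual) and semantic levels. The level names are placeholders, not the
# paper's exact ten-level structure.
from dataclasses import dataclass, field
from typing import Dict, List

SYNTACTIC_LEVELS = ["type_technique", "global_distribution", "local_structure"]
SEMANTIC_LEVELS = ["generic_object", "specific_object", "abstract_scene"]

@dataclass
class ImageIndex:
    image_id: str
    levels: Dict[str, List[str]] = field(default_factory=dict)

    def annotate(self, level: str, descriptor: str) -> None:
        if level not in SYNTACTIC_LEVELS + SEMANTIC_LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.levels.setdefault(level, []).append(descriptor)

idx = ImageIndex("photo_001")                                 # hypothetical id
idx.annotate("global_distribution", "dominant color: blue")   # syntax
idx.annotate("generic_object", "person")                      # semantics
idx.annotate("specific_object", "Alejandro Jaimes")           # semantics
print(idx.levels)
```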


ACM Multimedia | 2006

Human-centered computing: a multimedia perspective

Alejandro Jaimes; Nicu Sebe; Daniel Gatica-Perez

Human-Centered Computing (HCC) is a set of methodologies that apply to any field that uses computers, in any form, in applications in which humans directly interact with devices or systems that use computer technologies. In this paper, we give an overview of HCC from a Multimedia perspective. We describe what we consider to be the three main areas of Human-Centered Multimedia (HCM): media production, analysis, and interaction. In addition, we identify the core characteristics of HCM, describe example applications, and propose a research agenda for HCM.


ACM Workshop on Continuous Archival and Retrieval of Personal Experiences | 2004

Memory cues for meeting video retrieval

Alejandro Jaimes; Kengo Omura; Takeshi Nagamine; Kazutaka Hirata

We advocate a new approach to meeting video retrieval based on the use of memory cues. First, we present a new survey involving 519 people in which we investigate the types of items people use to review meeting contents (e.g., minutes, video). Then we present a novel memory study involving 15 subjects in which we investigate what people remember about past meetings (e.g., seating position). Based on these studies and related research, we propose a novel framework for meeting video retrieval based on memory cues. Our proposed system graphically represents important memory retrieval cues such as room layout, participants' faces, and seating positions. Queries are formulated dynamically: as the user graphically manipulates the cues, the query results are shown. Our system (1) helps users easily express the cues they recall about a particular meeting, and (2) helps users remember new cues for meeting video retrieval. Finally, we present our approach to automatic indexing of meeting videos, present experiments, and discuss research issues in automatic indexing for retrieval using memory cues.
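
The dynamic-query idea can be sketched as filtering over cue attributes: each meeting record carries memory cues, a query is the partial set of cues the user recalls, and results refresh whenever a cue changes. The record fields and values below are assumptions for illustration.

```python
# Sketch of cue-based dynamic querying: each meeting record carries memory
# cues (room, participants, seating), and a query is a partial set of cues;
# results are recomputed whenever the cue set changes. Fields are illustrative.
from typing import Dict, List

meetings = [
    {"id": "m1", "room": "A", "participants": {"kengo", "takeshi"},
     "seating": {"kengo": "left", "takeshi": "right"}},
    {"id": "m2", "room": "B", "participants": {"kengo", "kazutaka"},
     "seating": {"kengo": "front", "kazutaka": "back"}},
]

def query(cues: Dict) -> List[str]:
    """Return ids of meetings consistent with every cue the user recalls."""
    hits = []
    for m in meetings:
        if "room" in cues and m["room"] != cues["room"]:
            continue
        if not cues.get("participants", set()) <= m["participants"]:
            continue
        if any(m["seating"].get(p) != pos
               for p, pos in cues.get("seating", {}).items()):
            continue
        hits.append(m["id"])
    return hits

print(query({"participants": {"kengo"}}))               # -> ['m1', 'm2']
print(query({"participants": {"kengo"}, "room": "A"}))  # -> ['m1']
```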


International Conference on Multimedia and Expo | 2003

Semi-automatic, data-driven construction of multimedia ontologies

Alejandro Jaimes; John R. Smith

In this paper, we investigate semi-automatic construction of multimedia ontologies using a data-driven approach. We start with a collection of videos for which we wish to build an ontology (an explicit specification of a domain). Each video is pre-processed: scene cut detection, automatic speech recognition (ASR), and metadata extraction are performed. In addition, we automatically index the videos based on visual content by extracting syntactic features (e.g., color, texture) and semantic features (e.g., face, landscape). We then combine standard tools for ontology engineering with tools from content-based retrieval to semi-automatically build ontologies. In the first stage, we process the text information available with the videos (ASR, metadata, and annotations, if any). Stop words (e.g., a, on, the) are eliminated and statistics (e.g., frequency, TF-IDF, and entropy) are computed for all terms. Based on these data, we manually select concepts and relationships to include in the ontology. Then we use content-based retrieval tools to assign multimedia entities (e.g., shots, videos, collections of videos) to concepts, properties, or relationships in the ontology, and to select multimedia entities as concepts, relationships, or properties in the ontology. We apply this methodology to construct multimedia ontologies from 24 hours of educational films from the 1940s-1960s used in the TREC video retrieval benchmark, and discuss the problems encountered and future directions.
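
The term-statistics stage is the most mechanical step and can be sketched directly: remove stop words from per-video text, then compute term frequency and TF-IDF to surface candidate concepts. The transcripts and stop-word list below are toy stand-ins, and the entropy statistic is omitted for brevity.

```python
# Sketch of the term-statistics stage: strip stop words from per-video ASR
# transcripts, then compute term frequency and TF-IDF so high-scoring terms
# can be reviewed as candidate ontology concepts. Data are toy stand-ins.
import math
from collections import Counter

STOP_WORDS = {"a", "on", "the", "of", "and", "in"}

transcripts = {
    "film_01": "the cell divides and the nucleus splits",
    "film_02": "the atom and the nucleus of the atom",
}

def tfidf(docs):
    tokenized = {d: [w for w in text.lower().split() if w not in STOP_WORDS]
                 for d, text in docs.items()}
    df = Counter(w for words in tokenized.values() for w in set(words))
    n = len(docs)
    scores = {}
    for d, words in tokenized.items():
        tf = Counter(words)
        scores[d] = {w: (tf[w] / len(words)) * math.log(n / df[w])
                     for w in tf}
    return scores

for doc, terms in tfidf(transcripts).items():
    top = sorted(terms.items(), key=lambda kv: -kv[1])[:3]
    print(doc, top)  # high-TF-IDF terms are candidate concepts
```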


Conference on Image and Video Retrieval | 2003

Modal keywords, ontologies, and reasoning for video understanding

Alejandro Jaimes; Belle L. Tseng; John R. Smith

We propose a novel framework for video content understanding that uses rules constructed from knowledge bases and multimedia ontologies. Our framework consists of an expert system that uses a rule-based engine, domain knowledge, visual detectors (for objects and scenes), and metadata (text from automatic speech recognition, related text, etc.). We introduce the idea of modal keywords, which are keywords that represent perceptual concepts in the following categories: visual (e.g., sky), aural (e.g., scream), olfactory (e.g., vanilla), tactile (e.g., feather), and taste (e.g., candy). A method is presented to automatically classify keywords from speech recognition, queries, or related text into these categories using WordNet and TGM I. For video understanding, the following operations are performed automatically: scene cut detection, automatic speech recognition, feature extraction, and visual detection (e.g., sky, face, indoor). The results of these operations are used by a rule-based engine that exploits context information (e.g., text from speech) to enhance visual detection results. We discuss semi-automatic construction of multimedia ontologies and present experiments in which visual detector outputs are modified by simple rules that use context information available with the video.
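
A minimal sketch of the modal-keyword idea using WordNet (via NLTK): a noun is assigned to a perceptual category when some sense's hypernym closure reaches a seed synset for that category. The seed synsets below are assumptions chosen for illustration; the paper's actual WordNet and TGM I procedure is not specified in the abstract.

```python
# Sketch of modal-keyword classification with WordNet (via NLTK): a word is
# assigned to a perceptual category if any sense's hypernym closure reaches a
# seed synset for that category. Seed synsets are illustrative assumptions,
# not the paper's actual WordNet/TGM I procedure.
# Requires: pip install nltk; then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

CATEGORY_SEEDS = {
    "visual": {"color.n.01", "visual_property.n.01"},
    "aural": {"sound.n.01", "auditory_communication.n.01"},
    "olfactory": {"odor.n.01"},
}

def modal_categories(word):
    """Return the perceptual categories reachable from any noun sense."""
    cats = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        ancestors = {s.name() for s in synset.closure(lambda s: s.hypernyms())}
        ancestors.add(synset.name())
        for cat, seeds in CATEGORY_SEEDS.items():
            if ancestors & seeds:
                cats.add(cat)
    return cats

for w in ["scream", "blue", "vanilla"]:
    print(w, modal_categories(w))  # output depends on the WordNet version
```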


International Conference on Image Processing | 2002

Learning personalized video highlights from detailed MPEG-7 metadata

Alejandro Jaimes; Tomio Echigo; Masayoshi Teraguchi; Fumiko Satoh

We present a new framework for generating personalized video digests from detailed event metadata. In this approach, high-level semantic features (e.g., number of offensive events) are extracted from an existing metadata signal using time windows (e.g., features within 16-second intervals). Personalized video digests are generated using a supervised learning algorithm that takes as input examples of important/unimportant events. Window-based features are extracted from the metadata and used to train the system and build a classifier that, given metadata for a new video, classifies segments as important or unimportant according to a specific user, to generate personalized video digests. Our experimental results on soccer video suggest that high-level semantic information extracted from existing metadata can be used effectively (80% precision and 85% recall under cross-validation) to generate personalized video digests.
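
The window-based pipeline can be sketched end to end: slide a fixed window over the event metadata, count event types per window to form feature vectors, and train a classifier on user-labeled windows. The event names, window size, labels, and classifier choice below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of the window-based pipeline: slide a fixed window over an event
# metadata stream, count event types per window to form features, and train
# a classifier on user-labeled important/unimportant windows. Event names,
# window size, labels, and the classifier are illustrative assumptions.
from collections import Counter
from sklearn.linear_model import LogisticRegression

WINDOW = 16  # seconds, matching the paper's example interval

# (timestamp_sec, event_type) stream; a toy stand-in for MPEG-7 metadata.
events = [(2, "pass"), (5, "shot"), (7, "shot"), (20, "pass"),
          (34, "shot"), (36, "shot"), (38, "goal"), (52, "pass")]
EVENT_TYPES = ["pass", "shot", "goal"]

def window_features(events, t_end=64):
    """One event-count vector per WINDOW-second interval."""
    feats = []
    for start in range(0, t_end, WINDOW):
        counts = Counter(e for t, e in events if start <= t < start + WINDOW)
        feats.append([counts[e] for e in EVENT_TYPES])
    return feats

X = window_features(events)   # one feature vector per window
y = [0, 0, 1, 0]              # toy user labels: the third window is a highlight
clf = LogisticRegression().fit(X, y)
print(clf.predict(window_features(events)))  # classify windows of a new video
```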


Multimedia Information Retrieval | 2005

Multimedia information retrieval: what is it, and why isn't anyone using it?

Alejandro Jaimes; Michael G. Christel; Sébastien Gilles; Ramesh R. Sarukkai; Wei-Ying Ma

In this paper, the participants of the panel at the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval answer questions about what multimedia is, how MIR is different from other kinds of retrieval, the most important technical challenges in MIR, killer applications, opportunities, and future directions.


ACM Multimedia | 2002

Duplicate detection in consumer photography and news video

Alejandro Jaimes; Shih-Fu Chang; Alexander C. Loui

Consumers often take more than one photograph of the same scene, creating non-identical duplicates and near duplicates. In Kodak's consumer photography database, on average, 19% of the images per roll fall into this category. Automatic detection of duplicates, therefore, is extremely useful in applications that help users organize their image collections. We introduce the challenging problem of non-identical duplicate image detection in consumer photography, describe STELLA (a novel interactive personal image collection organization system), and give an overview of our novel framework for detecting duplicate and near-duplicate consumer photographs and news videos.
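
The paper's detection framework is not spelled out in the abstract, so the sketch below uses a generic stand-in technique, perceptual average hashing: downscale each photo to an 8x8 grayscale thumbnail, threshold against the mean to get a 64-bit signature, and flag pairs with small Hamming distance as near duplicates.

```python
# Sketch of near-duplicate detection via average hashing: downscale to 8x8
# grayscale, threshold pixels against their mean to get a 64-bit signature,
# and call two photos near duplicates when the Hamming distance is small.
# This is a generic stand-in technique, not the paper's framework.
from PIL import Image

def average_hash(path, size=8):
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return [p > mean for p in pixels]

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

def near_duplicates(paths, threshold=8):
    """Return pairs of paths whose hashes differ in at most `threshold` bits."""
    hashes = {p: average_hash(p) for p in paths}
    return [(a, b) for i, a in enumerate(paths) for b in paths[i + 1:]
            if hamming(hashes[a], hashes[b]) <= threshold]

# Example (hypothetical file names from one roll):
# print(near_duplicates(["roll1_01.jpg", "roll1_02.jpg", "roll1_03.jpg"]))
```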
