Ardhendu Behera
University of Leeds
Publications
Featured research published by Ardhendu Behera.
Pattern Recognition Letters | 2013
James M. Ferryman; David C. Hogg; Jan Sochman; Ardhendu Behera; Jose A. Rodriguez-Serrano; Simon F. Worgan; Longzhen Li; Valerie Leung; Murray Evans; Philippe Cornic; Stéphane Herbin; Stefan Schlenger; Michael Dose
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through the development of a logic-based inference engine in Prolog. Threat detection performance is evaluated against a range of datasets describing realistic situations, demonstrating a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
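As a rough illustration of the ownership-based abandonment reasoning described in the abstract, the sketch below encodes a comparable rule in Python; the data structures, field names and thresholds are hypothetical, and the actual system expresses such rules in a Prolog inference engine.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackedObject:
    object_id: int
    position: tuple               # (x, y) in ground-plane coordinates
    stationary_seconds: float
    owner_id: Optional[int]       # person who deposited the object, if known

@dataclass
class TrackedPerson:
    person_id: int
    position: tuple
    group_ids: set                # people socially related to this person

def distance(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def is_abandoned(obj, people, min_static_s=30.0, max_owner_dist=5.0):
    """An object raises an alarm only if it has been static long enough,
    its owner has moved away, and nobody socially related to the owner
    has stayed close to it (thresholds here are illustrative)."""
    if obj.stationary_seconds < min_static_s:
        return False
    by_id = {p.person_id: p for p in people}
    owner = by_id.get(obj.owner_id)
    if owner is not None and distance(owner.position, obj.position) <= max_owner_dist:
        return False                      # owner is still attending the object
    related = owner.group_ids if owner else set()
    for p in people:
        if p.person_id in related and distance(p.position, obj.position) <= max_owner_dist:
            return False                  # a companion of the owner remains nearby
    return True                           # static, unattended by owner or companions
```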
machine learning for multimodal interaction | 2004
Denis Lalanne; Rolf Ingold; Didier von Rotz; Ardhendu Behera; Dalila Mekhaldi; Andrei Popescu-Belis
Static documents play a central role in multimodal applications such as meeting recording and browsing. They provide a variety of structures, in particular thematic ones, for segmenting meetings, structures that are often hard to extract from audio and video. In this article, we present four steps for creating a strong link between static documents and multimedia meeting archives. First, a document-centric meeting environment is introduced. Then, a document analysis tool is presented, which builds a multi-layered representation of documents and creates indexes that are subsequently used by document/speech and document/video alignment methods. Finally, a document-based browsing system, integrating the various alignment results, is described along with a preliminary user evaluation.
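The document/speech alignment mentioned here can be pictured with a very small sketch: matching each document section to the transcript segment with the most similar word distribution. This bag-of-words approximation is an assumption for illustration, not the thematic alignment actually used in the article.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def align(doc_sections, speech_segments):
    """Assign each document section the transcript segment whose word
    distribution is most similar (bag-of-words cosine similarity)."""
    seg_vecs = [Counter(s.lower().split()) for s in speech_segments]
    alignment = {}
    for i, section in enumerate(doc_sections):
        vec = Counter(section.lower().split())
        scores = [cosine(vec, sv) for sv in seg_vecs]
        alignment[i] = max(range(len(scores)), key=scores.__getitem__)
    return alignment
```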
international conference on multimedia and expo | 2004
Ardhendu Behera; Denis Lalanne; Rolf Ingold
In the context of a multimodal application, the article proposes an image-based method for bridging the gap between document excerpts and video extracts. The approach, called document image alignment, takes advantage of the observable events related to documents that are visible during meetings. In particular, the article presents a new method for detecting slide changes in slideshows, its evaluation, and preliminary work on document identification.
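A minimal sketch of what slide-change detection of this kind can look like is given below, assuming frames of the projected slide region are already available as NumPy arrays; the differencing threshold and stability window are illustrative, not the paper's parameters.

```python
import numpy as np

def detect_slide_changes(frames, diff_thresh=0.08, stable_frames=5):
    """Flag a slide change when the fraction of pixels that differ from the
    previous frame exceeds diff_thresh and the new content then stays stable
    for a few frames (to ignore transient occlusions by the speaker)."""
    changes, pending, stable = [], None, 0
    prev = None
    for idx, frame in enumerate(frames):
        gray = np.asarray(frame, float)
        gray = gray.mean(axis=2) if gray.ndim == 3 else gray
        if prev is not None:
            frac = float(np.mean(np.abs(gray - prev) > 25))   # fraction of changed pixels
            if frac > diff_thresh:
                pending, stable = idx, 0                       # candidate slide change
            elif pending is not None:
                stable += 1
                if stable >= stable_frames:
                    changes.append(pending)                    # change confirmed
                    pending = None
        prev = gray
    return changes
```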
Experimental Brain Research | 2011
Christina J. Howard; Iain D. Gilchrist; Tom Troscianko; Ardhendu Behera; David C. Hogg
Low-level stimulus salience and task relevance together determine the human fixation priority assigned to scene locations (Fecteau and Munoz in Trends Cogn Sci 10(8):382–390, 2006). However, surprisingly little is known about the contribution of task relevance to eye movements during real-world visual search where stimuli are in constant motion and where the ‘target’ for the visual search is abstract and semantic in nature. Here, we investigate this issue when participants continuously search an array of four closed-circuit television (CCTV) screens for suspicious events. We recorded eye movements whilst participants watched real CCTV footage and moved a joystick to continuously indicate perceived suspiciousness. We find that when multiple areas of a display compete for attention, gaze is allocated according to relative levels of reported suspiciousness. Furthermore, this measure of task relevance accounted for twice the amount of variance in gaze likelihood as the amount of low-level visual changes over time in the video stimuli.
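The variance comparison reported here corresponds to regressing gaze likelihood on each predictor separately and comparing the proportion of variance explained; a sketch under assumed variable names is shown below, and the paper's actual analysis pipeline is not reproduced.

```python
import numpy as np

def r_squared(x, y):
    """Proportion of variance in y explained by a simple linear fit on x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()

# Hypothetical per-region, per-time-bin measurements from the experiment:
# r_squared(reported_suspiciousness, gaze_likelihood)   # task relevance
# r_squared(low_level_change, gaze_likelihood)          # low-level visual change
# Comparing the two R^2 values corresponds to the "twice the variance" claim.
```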
Multimedia Tools and Applications | 2008
Ardhendu Behera; Denis Lalanne; Rolf Ingold
This paper describes the DocMIR system, which automatically captures, analyzes and indexes meetings, conferences, lectures, etc. by taking advantage of the documents projected (e.g. slideshows, budget tables, figures, etc.) during the events. For instance, the system can apply these procedures to a lecture and automatically index the event according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter's computer nor any conscious intervention by the speaker during the presentation. The only material required by the system is the speaker's electronic presentation file. Even if this is not provided, the system still temporally segments the presentation and offers a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that record events synchronously. Once the recording is over, indexing is performed automatically by analyzing the content of the captured video of the projected documents: detecting scene changes, identifying the documents, computing their durations and extracting their textual content. Each of the captured images is identified from a repository containing all original electronic documents, captured audio–visual data and metadata created during post-production. Identification is based on document signatures, which hierarchically structure features from both the layout structure and the color distribution of the document images. Video segments are finally enriched with the textual content of the identified original documents, which facilitates query and retrieval without using OCR. The signature-based indexing method proposed in this article is robust, works with low-resolution images, and can be applied to several other applications, including real-time document recognition, multimedia information retrieval and augmented reality systems.
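To make the post-capture indexing step concrete, the sketch below walks over temporally segmented video, identifies each segment's projected document by signature distance and attaches the original document's text. The `Segment` class, the `captured_signature` callable, the signature objects' `distance` method and the rejection threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    start_s: float
    end_s: float
    doc_id: Optional[str] = None    # identified original document, if any
    text: str = ""                  # textual content attached for indexing

def index_recording(segments, captured_signature, repository, max_dist=0.35):
    """For each temporally stable segment of the projected-document video,
    find the closest original document by signature distance and attach its
    textual content, so the segment can be queried without OCR."""
    for seg in segments:
        sig = captured_signature(seg)                  # layout + colour signature of the segment
        best_id, best_dist = None, float("inf")
        for doc_id, (ref_sig, text) in repository.items():
            d = ref_sig.distance(sig)
            if d < best_dist:
                best_id, best_dist = doc_id, d
        if best_id is not None and best_dist <= max_dist:   # accept only close matches
            seg.doc_id = best_id
            seg.text = repository[best_id][1]
    return segments
```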
document engineering | 2004
Ardhendu Behera; Denis Lalanne; Rolf Ingold
In this paper, we present (a) a method for identifying documents captured from low-resolution devices such as web-cams, digital cameras or mobile phones and (b) a technique for extracting their textual content without performing OCR. The first method associates a hierarchically structured visual signature with the low-resolution document image and matches it against the visual signatures of the original high-resolution document images, stored in PDF form in a repository. The matching algorithm follows the signature hierarchy, which speeds up the search by guiding it towards fruitful solution spaces. In a second step, the content of the original PDF document is extracted, structured, and matched with its corresponding high-resolution visual signature. Finally, the matched content is attached to the low-resolution document image's visual signature, which greatly enriches the document's content and indexing. We present both the identification and extraction methods in this article and evaluate them on various documents, resolutions and lighting conditions, using different capture devices.
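The hierarchy-guided matching described above can be sketched as coarse-to-fine pruning: each level of the signature narrows the candidate set before the next, more detailed level is compared. The candidate layout, distance function and pruning schedule below are illustrative assumptions, not the paper's exact signature.

```python
def hierarchical_match(query_levels, candidates, keep=10):
    """Coarse-to-fine matching: at each level of the signature hierarchy,
    keep only the closest candidates, so finer (more expensive) levels are
    compared against a shrinking set. `query_levels` and each candidate's
    level list are assumed to be comparable feature vectors, coarsest first."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    pool = list(candidates.items())              # (doc_id, levels) pairs
    for depth, q in enumerate(query_levels):
        pool.sort(key=lambda item: dist(q, item[1][depth]))
        pool = pool[:max(1, keep >> depth)]      # prune harder at finer levels
    return pool[0][0] if pool else None
```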
PLOS ONE | 2015
Gabriele Bleser; Dima Damen; Ardhendu Behera; Gustaf Hendeby; Katharina Mura; Markus Miezal; Andrew P. Gee; Nils Petersen; Gustavo Maçães; Hugo Domingues; Dominic Gorecky; Luis Almeida; Walterio W. Mayol-Cuevas; Andrew D Calway; Anthony G. Cohn; David C. Hogg; Didier Stricker
Today, the workflows that are involved in industrial assembly and production activities are becoming increasingly complex. Performing these workflows efficiently and safely is demanding on the workers, particularly for infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This article presents a scalable concept and an integrated system demonstrator designed for this purpose. The basic idea is to learn workflows from observing multiple expert operators and then transfer the learnt workflow models to novice users. Being entirely learning-based, the proposed system can be applied to various tasks and domains. This idea has been realized in a prototype that combines state-of-the-art hardware and software components designed with interoperability in mind. The emphasis of this article is on the algorithms developed for the prototype: 1) fusion of inertial and visual sensor information from an on-body sensor network (BSN) to robustly track the user's pose in magnetically polluted environments; 2) learning-based computer vision algorithms to map the workspace, localize the sensor with respect to the workspace and capture objects, even as they are carried; 3) domain-independent and robust workflow recovery and monitoring algorithms based on spatiotemporal pairwise relations deduced from object and user movement with respect to the scene; and 4) context-sensitive augmented reality (AR) user feedback using a head-mounted display (HMD). A key distinguishing feature of the developed algorithms is that they all operate solely on data from the on-body sensor network and that no external instrumentation is needed. The feasibility of the chosen approach for the complete action-perception-feedback loop is demonstrated on three increasingly complex datasets representing manual industrial tasks. These limited-size datasets highlight the potential of the chosen technology as a combined entity, as well as pointing out limitations of the system.
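Point 3 above (workflow recovery from spatiotemporal pairwise relations) can be illustrated with a small sketch that reduces tracked positions in a frame to qualitative pairwise relations; the relation labels, thresholds and example coordinates are hypothetical.

```python
import itertools

def pairwise_relations(positions, near=0.1, touch=0.03):
    """Reduce tracked object/hand positions in one frame to qualitative
    pairwise spatial relations; the sequence of such relation sets is the
    kind of input a workflow model could consume."""
    def rel(d):
        if d < touch:
            return "touching"
        if d < near:
            return "near"
        return "apart"

    relations = {}
    for (a, pa), (b, pb) in itertools.combinations(sorted(positions.items()), 2):
        d = sum((x - y) ** 2 for x, y in zip(pa, pb)) ** 0.5
        relations[(a, b)] = rel(d)
    return relations

# Example: relations between a hand and two parts being assembled (made-up coordinates).
frame = {"hand": (0.42, 0.10, 0.30), "pump_body": (0.44, 0.11, 0.30), "valve": (0.80, 0.35, 0.10)}
print(pairwise_relations(frame))   # {('hand', 'pump_body'): 'touching', ...}
```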
Frontiers in Human Neuroscience | 2013
Christina J. Howard; Tom Troscianko; Iain D. Gilchrist; Ardhendu Behera; David C. Hogg
Perception of scenes has typically been investigated by using static or simplified visual displays. How attention is used to perceive and evaluate dynamic, realistic scenes is more poorly understood, in part due to the problem of comparing eye fixations to moving stimuli across observers. When the task and stimulus are common across observers, consistent fixation locations can indicate that the fixated region has high goal-based relevance. Here we investigated these issues when an observer has a specific, and naturalistic, task: closed-circuit television (CCTV) monitoring. We concurrently recorded eye movements and ratings of perceived suspiciousness as different observers watched the same set of clips from real CCTV footage. Trained CCTV operators showed greater consistency in fixation location and greater consistency in suspiciousness judgements than untrained observers. Training appears to increase between-operator consistency through learning what to look for in these scenes. We used a novel "Dynamic Area of Focus (DAF)" analysis to show that in CCTV monitoring there is a temporal relationship between eye movements and subsequent manual responses, as we have previously found for a sports video watching task. For trained CCTV operators and for untrained observers, manual responses were most highly related to between-observer eye position spread when a temporal lag was introduced between the fixation and response data. Several hundred milliseconds after between-observer eye positions became most similar, observers tended to push the joystick to indicate perceived suspiciousness. Conversely, several hundred milliseconds after between-observer eye positions became dissimilar, observers tended to rate suspiciousness as low. These data provide further support for the DAF method as an important tool for examining goal-directed fixation behavior when the stimulus is a real moving image.
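The lagged relationship between between-observer gaze spread and the joystick response can be pictured with a simple cross-correlation over candidate lags; this is an assumed simplification, not the paper's exact Dynamic Area of Focus computation.

```python
import numpy as np

def best_lag(gaze_spread, response, max_lag_samples=50):
    """Find the lag (response trailing gaze) at which between-observer gaze
    spread is most strongly negatively correlated with the mean manual
    suspiciousness response."""
    gaze_spread = np.asarray(gaze_spread, float)
    response = np.asarray(response, float)
    correlations = []
    for lag in range(max_lag_samples + 1):
        g = gaze_spread[:len(gaze_spread) - lag] if lag else gaze_spread
        r = response[lag:]
        n = min(len(g), len(r))
        correlations.append(np.corrcoef(g[:n], r[:n])[0, 1])
    # Low spread (observers looking at the same place) precedes high responses,
    # so the most negative correlation marks the characteristic lag.
    return int(np.argmin(correlations)), correlations
```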
international conference on document analysis and recognition | 2005
Ardhendu Behera; Denis Lalanne; Rolf Ingold
This paper proposes a multi-signature document identification method that works robustly with low-resolution documents captured from handheld devices. The proposed method is based on the extraction of a visual signature containing both (a) the color content distribution in the image plane of the document, i.e. the color signature, and (b) the shallow layout structure of the document, i.e. the layout signature. The color distribution is considered first, in order to filter out documents with very dissimilar colors, and the identification is then completed on the remaining set using the layout signature. Finally, an evaluation comparing our combined color- and layout-based method with the layout signature alone is presented.
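A compact sketch of the two-stage identification is given below: the color signature first discards clearly dissimilar documents, then the layout signature decides among the remainder. The repository layout, distance function and keep-fraction are illustrative assumptions.

```python
def identify(query_color, query_layout, repository, color_keep=0.2):
    """Two-stage identification: a colour pre-filter followed by a layout
    decision among the surviving candidates."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    ranked = sorted(repository.items(), key=lambda kv: dist(query_color, kv[1]["color"]))
    survivors = ranked[:max(1, int(len(ranked) * color_keep))]   # colour pre-filter
    best = min(survivors, key=lambda kv: dist(query_layout, kv[1]["layout"]))
    return best[0]
```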
british machine vision conference | 2014
Ardhendu Behera; Anthony G. Cohn; David C. Hogg
In this paper, we present a novel method to explore semantically meaningful visual information and identify the discriminative spatiotemporal relationships between them for real-time activity recognition. Our approach infers human activities using continuous egocentric (first-person-view) videos of object manipulations in an industrial setup. In order to achieve this goal, we propose a random forest that unifies randomization, discriminative relationships mining and a Markov temporal structure. Discriminative relationships mining helps us to model relations that distinguish different activities, while randomization allows us to handle the large feature space and prevents over-fitting. The Markov temporal structure provides temporally consistent decisions during testing. The proposed random forest uses a discriminative Markov decision tree, where every nonterminal node is a discriminative classifier and the Markov structure is applied at leaf nodes. The proposed approach outperforms the state-of-the-art methods on a new challenging video dataset of assembling a pump system.
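The role of the Markov temporal structure at test time can be illustrated with standard Viterbi smoothing over per-frame class probabilities (such as votes from a random forest). The sketch below assumes the forest outputs and a transition matrix are already available and does not reproduce the discriminative relation mining itself.

```python
import numpy as np

def temporally_smooth(frame_probs, transition, prior=None):
    """Given per-frame class probabilities over T frames and K activity
    labels, apply a first-order Markov model to obtain a temporally
    consistent labelling (standard Viterbi decoding)."""
    frame_probs = np.asarray(frame_probs, float)     # shape (T, K)
    T, K = frame_probs.shape
    log_obs = np.log(frame_probs + 1e-12)
    log_trans = np.log(np.asarray(transition, float) + 1e-12)
    log_prior = np.log((np.ones(K) / K if prior is None else np.asarray(prior, float)) + 1e-12)

    score = log_prior + log_obs[0]
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans            # cand[i, j]: best score ending in i, moving to j
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]

    labels = np.empty(T, dtype=int)
    labels[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):                    # backtrace the best label sequence
        labels[t - 1] = backptr[t, labels[t]]
    return labels
```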