Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where David J. Huber is active.

Publication


Featured research published by David J. Huber.


Intelligent Computing: Theory and Applications V | 2007

Bio-inspired visual attention and object recognition

Deepak Khosla; Christopher K. Moore; David J. Huber; Suhas E. Chelian

This paper describes a bio-inspired Visual Attention and Object Recognition System (VARS) that can (1) learn representations of objects that are invariant to scale, position and orientation; and (2) recognize and locate these objects in static and video imagery. The system uses modularized bio-inspired algorithms/techniques that can be applied towards finding salient objects in a scene, recognizing those objects, and prompting the user for additional information to facilitate interactive learning. These algorithms are based on models of human visual attention, search, recognition and learning. The implementation is highly modular, and the modules can be used as a complete system or independently. The underlying technologies were carefully researched in order to ensure they were robust, fast, and could be integrated into an interactive system. We evaluated our system's capabilities on the Caltech-101 and COIL-100 datasets, which are commonly used in machine vision, as well as on simulated scenes. Preliminary results are promising: the system processes these datasets with good accuracy and low computation time.
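The attend-then-recognize loop described in this abstract can be illustrated with a minimal sketch: compute a crude saliency map, take the most salient fixation windows, and hand each window to a recognizer. Everything here (the center-surround intensity saliency, the window size, the placeholder `recognize` function) is an assumption for illustration, not the authors' bio-inspired implementation.

```python
# Minimal sketch of an attend-then-recognize loop in the spirit of VARS.
# The saliency model and classifier are simple stand-ins, not the paper's method.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def center_surround_saliency(gray, center_sigma=2, surround_sigma=16):
    """Crude intensity-channel saliency: |center blur - surround blur|."""
    center = gaussian_filter(gray.astype(float), center_sigma)
    surround = gaussian_filter(gray.astype(float), surround_sigma)
    sal = np.abs(center - surround)
    return sal / (sal.max() + 1e-9)

def top_salient_windows(sal, k=5, win=64):
    """Return the k most salient fixation locations as (row, col, window)."""
    peaks = (sal == maximum_filter(sal, size=win)) & (sal > 0)
    coords = np.argwhere(peaks)
    order = np.argsort(sal[peaks])[::-1][:k]
    return [(r, c, win) for r, c in coords[order]]

def recognize(patch):
    """Placeholder recognizer; VARS would apply its invariant object model here."""
    return "unknown", float(patch.mean())

def attend_and_recognize(gray, k=5):
    """Attend to the k most salient windows and label each one."""
    sal = center_surround_saliency(gray)
    results = []
    for r, c, w in top_salient_windows(sal, k):
        patch = gray[max(r - w // 2, 0):r + w // 2, max(c - w // 2, 0):c + w // 2]
        results.append(((r, c), recognize(patch)))
    return results
```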


Proceedings of SPIE | 2010

A bio-inspired method and system for visual object-based attention and segmentation

David J. Huber; Deepak Khosla

This paper describes a method and system of human-like attention and object segmentation in visual scenes that (1) attends to regions in a scene in their rank of saliency in the image, (2) extracts the boundary of an attended proto-object based on feature contours, and (3) can be biased to boost the attention paid to specific features in a scene, such as those of a desired target object in static and video imagery. The purpose of the system is to identify regions of a scene of potential importance and extract the region data for processing by an object recognition and classification algorithm. The attention process can be performed in a default, bottom-up manner or a directed, top-down manner that assigns a preference to certain features over others. One can apply this system to any static scene, whether that is a still photograph or imagery captured from video. We employ algorithms that are motivated by findings in neuroscience, psychology, and cognitive science to construct a system that is novel in its modular and stepwise approach to the problems of attention and region extraction, its application of a flooding algorithm to break apart an image into smaller proto-objects based on feature density, and its ability to join smaller regions of similar features into larger proto-objects. This approach allows many complicated operations to be carried out by the system in a very short time, approaching real-time. A researcher can use this system as a robust front-end to a larger system that includes object recognition and scene understanding modules; it is engineered to function over a broad range of situations and can be applied to any scene with minimal tuning from the user.
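A rough sketch of the flood-then-merge proto-object idea follows: threshold a smoothed feature map into connected regions, then join regions whose mean feature values are similar. The threshold, merge tolerance, and the omission of an adjacency check are all simplifications for illustration, not the published algorithm.

```python
# Illustrative attend -> flood -> merge proto-object extraction.
# Thresholds and the merging rule are assumptions, not the paper's method.
import numpy as np
from scipy.ndimage import gaussian_filter, label

def proto_objects(feature_map, flood_thresh=0.5, merge_tol=0.1):
    """Break a feature map into proto-object masks, then join regions
    whose mean feature values are close (adjacency check omitted)."""
    smooth = gaussian_filter(feature_map.astype(float), 2)
    flooded = smooth > flood_thresh * smooth.max()     # crude "flooding"
    labels, n = label(flooded)                         # connected regions
    means = {i: smooth[labels == i].mean() for i in range(1, n + 1)}
    mapping = {}
    for i in sorted(means):
        # Greedily relabel region i onto an earlier region with a similar mean.
        target = next((j for j in mapping.values()
                       if abs(means[j] - means[i]) < merge_tol), i)
        mapping[i] = target
    merged = np.zeros_like(labels)
    for i, j in mapping.items():
        merged[labels == i] = j
    return merged    # 0 = background, each positive label = one proto-object
```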


Proceedings of SPIE | 2009

Fusion of multi-sensory saliency maps for automated perception and control

David J. Huber; Deepak Khosla; Paul Alex Dow

In many real-world situations and applications that involve humans or machines (e.g., situation awareness, scene understanding, driver distraction, workload reduction, assembly, robotics, etc.), multiple sensory modalities (e.g., vision, auditory, touch, etc.) are used. The incoming sensory information can overwhelm the processing capabilities of both humans and machines. An approach for estimating what is most important in our sensory environment (bottom-up or goal-driven) and using that as a basis for workload reduction or taking an action could be of great benefit in applications involving humans, machines or human-machine interactions. In this paper, we describe a novel approach for determining high-saliency stimuli in multi-modal sensory environments, e.g., vision, sound, touch, etc. In such environments, the high-saliency stimuli could be a visual object, a sound source, a touch event, etc. These stimuli are important and should be attended to from a perception, cognition and/or action perspective. The system accomplishes this by the fusion of saliency maps from multiple sensory modalities (e.g., visual and auditory) into a single, fused multimodal saliency map that is represented in a common, higher-level coordinate system. This paper describes the computational model and method for generating a multi-modal or fused saliency map. The fused saliency map can be used to determine primary and secondary foci of attention as well as for active control of a hardware device. Such a computational model of a fused saliency map would be immensely useful for a machine-based or robot-based application in a multi-sensory environment. We describe the approach and system and present preliminary results on a real robotic platform.
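One way to picture the fusion step is to accumulate each modality's saliency onto a shared azimuth/elevation grid and combine the grids with a weighted sum. The bin counts, weights, and fusion rule below are illustrative assumptions, not the model described in the paper.

```python
# Hedged sketch of fusing visual and auditory saliency into one
# azimuth/elevation map; grid size and weights are assumed values.
import numpy as np

AZ_BINS, EL_BINS = 180, 90   # assumed: 2-degree bins over the full sphere

def to_common_frame(saliency, az_idx, el_idx):
    """Accumulate one modality's saliency map onto the shared grid.
    az_idx/el_idx are integer bin indices per pixel, from calibration."""
    fused = np.zeros((EL_BINS, AZ_BINS))
    np.add.at(fused, (el_idx.ravel(), az_idx.ravel()), saliency.ravel())
    return fused / (fused.max() + 1e-9)

def fuse(maps, weights):
    """Weighted sum of per-modality maps already in the common frame."""
    fused = sum(w * m for w, m in zip(weights, maps))
    return fused / (fused.max() + 1e-9)

def primary_and_secondary_foci(fused):
    """Return the two strongest (elevation, azimuth) bins for attention/control."""
    flat = np.argsort(fused.ravel())[::-1][:2]
    return [np.unravel_index(i, fused.shape) for i in flat]
```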


International Symposium on Visual Computing | 2011

A neuromorphic approach to object detection and recognition in airborne videos with stabilization

Yang Chen; Deepak Khosla; David J. Huber; Kyungnam Kim; Shinko Y. Cheng

Research has shown that the application of an attention algorithm to the front-end of an object recognition system can provide a boost in performance over extracting regions from an image in an unguided manner. However, when video imagery is taken from a moving platform, attention algorithms such as saliency can lose their potency. In this paper, we show that this loss is due to the motion channels in the saliency algorithm not being able to distinguish object motion from motion caused by platform movement in the videos, and that an object recognition system for such videos can be improved through the application of image stabilization and saliency. We apply this algorithm to airborne video samples from the DARPA VIVID dataset and demonstrate that the combination of stabilization and saliency significantly improves object recognition system performance for both stationary and moving objects.
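The core idea, stabilize first so that only residual (object) motion drives the motion channel, can be sketched with a translation-only phase-correlation stabilizer. The real system's stabilization and saliency are more elaborate; treat this as an assumption-laden illustration.

```python
# Sketch of "stabilize, then take residual motion" for airborne video.
import numpy as np

def phase_correlation_shift(prev, curr):
    """Estimate the global (dy, dx) translation that maps prev onto curr."""
    F1, F2 = np.fft.fft2(prev), np.fft.fft2(curr)
    cross = F2 * np.conj(F1)
    corr = np.abs(np.fft.ifft2(cross / (np.abs(cross) + 1e-9)))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > prev.shape[0] // 2:
        dy -= prev.shape[0]
    if dx > prev.shape[1] // 2:
        dx -= prev.shape[1]
    return int(dy), int(dx)

def residual_motion_saliency(prev, curr):
    """Warp prev onto curr by the estimated platform motion, then take the
    residual difference as evidence of independently moving objects."""
    prev_f, curr_f = prev.astype(float), curr.astype(float)
    dy, dx = phase_correlation_shift(prev_f, curr_f)
    stabilized = np.roll(prev_f, (dy, dx), axis=(0, 1))
    residual = np.abs(curr_f - stabilized)
    return residual / (residual.max() + 1e-9)
```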


Proceedings of SPIE | 2013

Real-time low-power neuromorphic hardware for autonomous object recognition

Deepak Khosla; Yang Chen; David J. Huber; Darrel J. Van Buer; Kyungnam Kim; Shinko Y. Cheng

Unmanned surveillance platforms have a ubiquitous presence in surveillance and reconnaissance operations. As the resolution and fidelity of the video sensors on these platforms increases, so does the bandwidth required to provide the data to the analyst and the subsequent analyst workload to interpret it. This leads to an increasing need to perform video processing on-board the sensor platform, thus transmitting only critical information to the analysts, reducing both the data bandwidth requirements and analyst workload. In this paper, we present a system for object recognition in video that employs embedded hardware and CPUs that can be implemented onboard an autonomous platform to provide real-time information extraction. Called NEOVUS (NEurOmorphic Understanding of Scenes), our system draws inspiration from models of mammalian visual processing and is implemented in state-of-the-art COTS hardware to achieve low size, weight and power, while maintaining real-time processing at reasonable cost. We use visual attention methods for detection of stationary and moving objects from a moving platform based on motion and form, and employ multi-scale convolutional neural networks for classification, which have been mapped to FPGA hardware. Evaluation of our system has shown that we can achieve real-time speeds of thirty frames per second with up to five-megapixel resolution videos. Our system shows a three-to-four order of magnitude power reduction compared to state-of-the-art computer vision algorithms while reducing the communications bandwidth required for evaluation.
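To make the detection/classification split concrete, the sketch below shows a toy multi-scale convolutional classifier operating on attention-selected region chips. The layer sizes and the PyTorch framing are invented for illustration; NEOVUS maps its networks onto FPGA hardware rather than a general-purpose framework.

```python
# Toy multi-scale CNN classifying 64x64 region chips produced by an
# attention front-end; architecture is illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMultiScaleCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.branch_fine = nn.Sequential(          # fine detail, full resolution
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.branch_coarse = nn.Sequential(        # coarser context, half resolution
            nn.Conv2d(3, 16, 7, padding=3, stride=2), nn.ReLU())
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                          # x: (B, 3, 64, 64) region chips
        f = F.adaptive_avg_pool2d(self.branch_fine(x), 1).flatten(1)
        c = F.adaptive_avg_pool2d(self.branch_coarse(x), 1).flatten(1)
        return self.head(torch.cat([f, c], dim=1))

# Example: logits = TinyMultiScaleCNN()(torch.randn(8, 3, 64, 64))
```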


Proceedings of SPIE | 2009

3D hierarchical spatial representation and memory of multimodal sensory data

Deepak Khosla; Paul Alex Dow; David J. Huber

This paper describes an efficient method and system for representing, processing and understanding multi-modal sensory data. More specifically, it describes a computational method and system for how to process and remember multiple locations in multimodal sensory space (e.g., visual, auditory, somatosensory, etc.). The multimodal representation and memory is based on a biologically-inspired hierarchy of spatial representations implemented with novel analogues of real representations used in the human brain. The novelty of the work is in the computationally efficient and robust spatial representation of 3D locations in multimodal sensory space as well as an associated working memory for storage and recall of these representations at the desired level for goal-oriented action. We describe (1) a simple and efficient method for human-like hierarchical spatial representations of sensory data and how to associate, integrate and convert between these representations (head-centered coordinate system, body-centered coordinate system, etc.); (2) a robust method for training and learning a mapping of points in multimodal sensory space (e.g., camera-visible object positions, locations of auditory sources, etc.) to the above hierarchical spatial representations; and (3) a specification and implementation of a hierarchical spatial working memory based on the above for storage and recall at the desired level for goal-oriented action(s). This work is most useful for any machine or human-machine application that requires processing of multimodal sensory inputs, making sense of them from a spatial perspective (e.g., where the sensory information is coming from with respect to the machine and its parts) and then taking some goal-oriented action based on this spatial understanding. A multi-level spatial representation hierarchy means that heterogeneous sensory inputs (e.g., visual, auditory, somatosensory, etc.) can map onto the hierarchy at different levels. When controlling various machine/robot degrees of freedom, the desired movements and actions can be computed from these different levels in the hierarchy. The most basic embodiment of this machine could be a pan-tilt camera system, an array of microphones, a machine with an arm/hand-like structure, and/or a robot with some or all of the above capabilities. We describe the approach and system and present preliminary results on a real robotic platform.
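A two-level toy version of such a hierarchy is sketched below: a head-centered point is converted to body-centered coordinates given the head pose, and a tiny working memory recalls stored locations at the requested level. The yaw-only rotation and the memory structure are simplifying assumptions, not the paper's representation.

```python
# Minimal two-level spatial hierarchy (head- and body-centered) with a
# small working memory; rotation is simplified to yaw only (assumption).
import numpy as np

def head_to_body(p_head, head_yaw_rad, head_offset):
    """Rotate/translate a 3D point from head-centered to body-centered frame."""
    c, s = np.cos(head_yaw_rad), np.sin(head_yaw_rad)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return R @ np.asarray(p_head, float) + np.asarray(head_offset, float)

class SpatialWorkingMemory:
    """Store sensed locations and recall them in the frame needed for action."""
    def __init__(self):
        self.items = []                      # dicts: {"label", "frame", "xyz"}

    def store(self, label, frame, xyz):
        self.items.append({"label": label, "frame": frame,
                           "xyz": np.asarray(xyz, float)})

    def recall(self, label, frame, head_yaw=0.0, head_offset=(0, 0, 0)):
        for it in self.items:
            if it["label"] != label:
                continue
            if it["frame"] == frame:
                return it["xyz"]
            if it["frame"] == "head" and frame == "body":
                return head_to_body(it["xyz"], head_yaw, head_offset)
        return None
```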


Journal of the Acoustical Society of America | 2013

Method and system for computing fused saliency maps from multi-modal sensory inputs

David J. Huber; Deepak Khosla; Paul Alex Dow

The present disclosure describes a method for computing a fused saliency map from visual and auditory saliency maps. The saliency maps are in azimuth and elevation coordinates. The auditory saliency map is based on intensity, frequency and temporal conspicuity maps. Once the auditory saliency map is determined, the map is converted into azimuth and elevation coordinates by processing selected snippets of sound from each of four microphones arranged on a robot head to detect the location of the sound source generating the saliencies.
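The auditory side of this pipeline can be sketched as conspicuity maps over a spectrogram: intensity, frequency contrast, and temporal contrast, normalized and averaged. The filter scales are guesses, and the microphone-array localization step (time-difference-of-arrival across the four microphones) is omitted here.

```python
# Rough auditory saliency sketch from intensity, frequency, and temporal
# conspicuity of a magnitude spectrogram; scales are assumed values.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def auditory_saliency(spec):
    """spec: magnitude spectrogram, shape (freq_bins, time_frames)."""
    spec = spec / (spec.max() + 1e-9)
    intensity = spec
    # Frequency conspicuity: contrast along the frequency axis.
    freq_c = np.abs(spec - gaussian_filter1d(spec, sigma=8, axis=0))
    # Temporal conspicuity: contrast along the time axis (onsets/offsets).
    temp_c = np.abs(spec - gaussian_filter1d(spec, sigma=8, axis=1))
    maps = [m / (m.max() + 1e-9) for m in (intensity, freq_c, temp_c)]
    return sum(maps) / 3.0
```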


Proceedings of SPIE | 2011

Method for recognition and pose estimation of multiple occurrences of multiple objects in visual images

David J. Huber; Deepak Khosla

This paper describes a system for multiple-object recognition and segmentation that (1) correctly identifies objects in a natural scene and provides a boundary for each object, and (2) can identify multiple occurrences of the same object (e.g., two identical objects, side-by-side) in the scene from different training views. The algorithm is novel in that it employs statistical modeling to efficiently prune an identified object's features from the scene without disturbing similar features elsewhere in the scene. The originality of the approach allows one to analyze complex scenes that occur in nature and contain multiple instances of the same object.
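The detect-then-prune-then-repeat loop can be illustrated as follows: match scene features to the object model, record the detected instance, remove only the scene features inside that instance's extent, and search again. The greedy nearest-neighbour matching and bounding-box pruning are simplified stand-ins for the paper's statistical model and pose estimation.

```python
# Sketch of repeated detection of multiple occurrences of one object,
# pruning matched features between passes; matching/pruning are simplified.
import numpy as np

def match(scene_desc, model_desc, max_dist=0.4):
    """Greedy nearest-neighbour matching between descriptor arrays."""
    pairs = []
    for i, d in enumerate(scene_desc):
        dists = np.linalg.norm(model_desc - d, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            pairs.append((i, j))
    return pairs

def detect_all_instances(scene_pts, scene_desc, model_desc, min_matches=8):
    """Detect an instance, prune only the scene features inside its
    bounding box, then search the remaining features for more instances."""
    pts = np.asarray(scene_pts, float)
    desc = np.asarray(scene_desc, float)
    instances = []
    while True:
        pairs = match(desc, model_desc)
        if len(pairs) < min_matches:
            break
        matched = pts[[i for i, _ in pairs]]
        lo, hi = matched.min(axis=0), matched.max(axis=0)
        instances.append((lo, hi))                 # one detected instance
        keep = ~np.all((pts >= lo) & (pts <= hi), axis=1)
        pts, desc = pts[keep], desc[keep]          # prune inside the box only
    return instances
```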


Proceedings of SPIE | 2011

Optimal detection of objects in images and videos using electroencephalography (EEG)

Deepak Khosla; Rajan Bhattacharyya; Penn Tasinga; David J. Huber

The Rapid Serial Visual Presentation (RSVP) protocol for EEG has recently been discovered as a useful tool for high-throughput filtering of images into simple target and nontarget categories [1]. This concept can be extended to the detection of objects and anomalies in images and videos that are of interest to the user (observer) in an application-specific context. For example, an image analyst looking for a moving vehicle in wide-area imagery will consider such an object to be a target or Item Of Interest (IOI). The ordering of images in the RSVP sequence is expected to have an impact on detection accuracy. In this paper, we describe an algorithm for learning the RSVP ordering that employs a user interaction step to maximize detection accuracy while simultaneously minimizing false alarms. With user feedback, the algorithm learns the optimal balance of image distance metrics in order to closely emulate the human's own preference for image order. It then employs the fusion of various perceptual and bio-inspired image metrics to emulate the human's sequencing ability for groups of image chips, which are subsequently used in RSVP trials. Such a method can be employed in human-assisted threat assessment, in which the system must scan a wide field of view and report any detections or anomalies in the landscape. In these instances, automated classification methods might fail. We describe the algorithm and present preliminary results on real-world imagery.
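One way to realize "learn the balance of image distance metrics from user feedback, then order the chips" is sketched below: a few simple per-pair metrics, a margin-style weight update from a user preference, and a greedy nearest-neighbour ordering under the learned distance. The metrics and the update rule are assumptions, not the published algorithm.

```python
# Illustrative sketch of learning metric weights from user feedback and
# ordering RSVP image chips with the learned distance.
import numpy as np

def metric_vector(a, b):
    """A few simple per-pair distances (stand-ins for the paper's metrics)."""
    ha = np.histogram(a, bins=16, range=(0, 255))[0]
    hb = np.histogram(b, bins=16, range=(0, 255))[0]
    return np.array([
        np.abs(a.mean() - b.mean()),            # brightness difference
        np.abs(a.std() - b.std()),              # contrast difference
        np.abs(ha - hb).sum() / a.size,         # histogram difference
    ])

def update_weights(w, a, b, c, lr=0.1):
    """User preferred that chip b follow a rather than c: nudge weights so
    that the combined distance d(a, b) < d(a, c)."""
    margin = w @ metric_vector(a, c) - w @ metric_vector(a, b)
    if margin < 0:
        w = w + lr * (metric_vector(a, c) - metric_vector(a, b))
    return np.clip(w, 0, None)

def order_chips(chips, w):
    """Greedy nearest-neighbour ordering under the learned distance."""
    order, rest = [0], list(range(1, len(chips)))
    while rest:
        last = order[-1]
        nxt = min(rest, key=lambda i: w @ metric_vector(chips[last], chips[i]))
        order.append(nxt)
        rest.remove(nxt)
    return order
```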


Proceedings of SPIE | 2011

Bio-inspired 'surprise' for real-time change detection in visual imagery

David J. Huber; Deepak Khosla

This paper describes a fast and robust bio-inspired method for change detection in high-resolution visual imagery. It is based on the computation of surprise, a dynamic analogue of visual saliency or attention that requires very little processing beyond the initial computation of saliency. This is different from prior surprise algorithms, which employ complex statistical models to describe the scene and detect anomalies. The algorithm can detect changes in a busy scene (e.g., a person crawling in bushes or a vehicle moving in a desert) in real time at typical video frame rates and can be used as a front-end to a larger system that includes object recognition and scene understanding modules operating on the detected surprising regions.
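The "cheap temporal extension of saliency" idea can be sketched as keeping running per-pixel statistics of the saliency map and flagging pixels whose current saliency deviates strongly from that history. The exponential update and z-score test below are assumed stand-ins for the paper's formulation of surprise.

```python
# Minimal "surprise" sketch: running per-pixel model of the saliency map,
# with deviations flagged as surprising regions. Thresholds are assumed.
import numpy as np

class SurpriseDetector:
    def __init__(self, alpha=0.05, z_thresh=4.0):
        self.alpha, self.z_thresh = alpha, z_thresh
        self.mean = None
        self.var = None

    def update(self, saliency_map):
        """Return a binary surprise mask for the current frame's saliency map."""
        s = saliency_map.astype(float)
        if self.mean is None:                       # first frame: just initialize
            self.mean, self.var = s.copy(), np.ones_like(s)
            return np.zeros_like(s, dtype=bool)
        z = np.abs(s - self.mean) / np.sqrt(self.var + 1e-9)
        # Exponential running update of the per-pixel mean and variance.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * s
        self.var = (1 - self.alpha) * self.var + self.alpha * (s - self.mean) ** 2
        return z > self.z_thresh
```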

Collaboration


Dive into David J. Huber's collaboration.
