James M. Coughlan
Smith-Kettlewell Institute
Publications
Featured research published by James M. Coughlan.
International Conference on Computer Vision | 1999
James M. Coughlan; Alan L. Yuille
When designing computer vision systems for the blind and visually impaired it is important to determine the orientation of the user relative to the scene. We observe that most indoor and outdoor (city) scenes are designed on a Manhattan three-dimensional grid. This Manhattan grid structure puts strong constraints on the intensity gradients in the image. We demonstrate an algorithm for detecting the orientation of the user in such scenes based on Bayesian inference using statistics which we have learnt in this domain. Our algorithm requires a single input image and does not involve pre-processing stages such as edge detection and Hough grouping. We demonstrate strong experimental results on a range of indoor and outdoor images. We also show that estimating the grid structure makes it significantly easier to detect target objects which are not aligned with the grid.
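A minimal sketch of the core idea, assuming only that gradient directions in a Manhattan scene cluster around two perpendicular axes; the scoring function below is an illustrative stand-in for the learned statistics in the paper:

```python
import numpy as np

def estimate_manhattan_angle(gray):
    """Pick the grid rotation (degrees) whose two horizontal axes best
    align with the image's strong gradient directions."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)[mag > np.percentile(mag, 90)]
    # cos(4*(theta - a)) peaks whenever a gradient aligns with either grid
    # axis (a or a + 90 degrees), in either sign, so its sum scores how
    # Manhattan-consistent each candidate rotation a is
    scores = [np.sum(np.cos(4 * (theta - np.deg2rad(a)))) for a in range(90)]
    return int(np.argmax(scores))
```

Like the paper's method, this needs a single image and no edge detection or Hough grouping; unlike it, the score is a fixed heuristic rather than statistics learned from data.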
Neural Computation | 2003
James M. Coughlan; Alan L. Yuille
This letter argues that many visual scenes are based on a Manhattan three-dimensional grid that imposes regularities on the image statistics. We construct a Bayesian model that implements this assumption and estimates the viewer orientation relative to the Manhattan grid. For many images, these estimates are good approximations to the viewer orientation (as estimated manually by the authors). These estimates also make it easy to detect outlier structures that are unaligned to the grid. To determine the applicability of the Manhattan world model, we implement a null hypothesis model that assumes that the image statistics are independent of any three-dimensional scene structure. We then use the log-likelihood ratio test to determine whether an image satisfies the Manhattan world assumption. Our results show that if an image is estimated to be Manhattan, then the Bayesian model's estimates of viewer direction are almost always accurate (according to our manual estimates), and vice versa.
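The model-selection step reduces to a generic log-likelihood ratio test, sketched below; the per-measurement log-likelihood arrays and the zero threshold are placeholders, not the paper's learned distributions:

```python
import numpy as np

def is_manhattan(loglik_manhattan, loglik_null, threshold=0.0):
    """Accept the Manhattan-world model when the summed log-likelihood
    ratio across all measurements exceeds the threshold."""
    llr = np.sum(loglik_manhattan) - np.sum(loglik_null)
    return llr > threshold, llr
```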
Computer Vision and Image Understanding | 2000
James M. Coughlan; Alan L. Yuille; Camper English; Daniel Snow
A novel deformable template is presented which detects the boundary of an open hand in a grayscale image without initialization by the user. A dynamic programming algorithm enhanced by pruning techniques finds the hand contour in the image in as little as 19 s on a 150 MHz Pentium. The template is translation- and rotation-invariant and accommodates shape deformation, significant occlusion and background clutter, and the presence of multiple hands.
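The search itself is an instance of dynamic programming on a chain of contour points with pruning of unlikely partial hypotheses. A generic sketch, with beam pruning standing in for the paper's pruning techniques and placeholder cost functions:

```python
import heapq

def viterbi_pruned(unary, pair_cost, beam=50):
    """unary: list of {state: cost} dicts, one per contour point.
    pair_cost(s_prev, s): transition cost between consecutive states.
    Keeps only the `beam` cheapest partial hypotheses at each step."""
    frontier = dict(heapq.nsmallest(beam, unary[0].items(), key=lambda kv: kv[1]))
    backptrs = []
    for t in range(1, len(unary)):
        costs, ptr = {}, {}
        for s, u in unary[t].items():
            # best surviving predecessor for state s at step t
            prev, c = min(((p, cp + pair_cost(p, s)) for p, cp in frontier.items()),
                          key=lambda kv: kv[1])
            costs[s], ptr[s] = c + u, prev
        frontier = dict(heapq.nsmallest(beam, costs.items(), key=lambda kv: kv[1]))
        backptrs.append(ptr)
    # trace back the cheapest complete contour
    s = min(frontier, key=frontier.get)
    path = [s]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```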
Computer Vision and Pattern Recognition | 1999
Scott Konishi; Alan L. Yuille; James M. Coughlan; Song-Chun Zhu
We treat the problem of edge detection as one of statistical inference. Local edge cues, implemented by filters, provide information about the likely positions of edges which can be used as input to higher-level models. Different edge cues can be evaluated by the statistical effectiveness of their corresponding filters evaluated on a dataset of 100 presegmented images. We use information theoretic measures to determine the effectiveness of a variety of different edge detectors working at multiple scales on black and white and color images. Our results give quantitative measures for the advantages of multi-level processing, for the use of chromaticity in addition to greyscale, and for the relative effectiveness of different detectors.
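The statistical framing can be made concrete: histogram a filter's responses on and off ground-truth edges, then label new pixels by log-likelihood ratio. Bin edges, responses, and the threshold below are illustrative, not the paper's learned models:

```python
import numpy as np

def learn_edge_log_ratio(responses_on, responses_off, bins):
    """Log-likelihood ratio per histogram bin, learned from filter
    responses sampled on and off true edges."""
    p_on, _ = np.histogram(responses_on, bins=bins, density=True)
    p_off, _ = np.histogram(responses_off, bins=bins, density=True)
    eps = 1e-6                      # avoid log(0) in empty bins
    return np.log(p_on + eps) - np.log(p_off + eps)

def classify_edge(response, bins, log_ratio, threshold=0.0):
    """Label a pixel 'edge' when log P(resp|edge) - log P(resp|off) > T."""
    idx = np.clip(np.digitize(response, bins) - 1, 0, len(log_ratio) - 1)
    return log_ratio[idx] > threshold
```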
Communications of the ACM | 2012
Roberto Manduchi; James M. Coughlan
Computer vision holds the key for the blind or visually impaired to explore the visual world.
Computer Vision and Pattern Recognition | 2008
Volodymyr Ivanchenko; James M. Coughlan; Huiying Shen
Urban intersections are the most dangerous parts of a blind or visually impaired person's travel. To address this problem, this paper describes the novel "Crosswatch" system, which uses computer vision to provide information about the location and orientation of crosswalks to a blind or visually impaired pedestrian holding a camera cell phone. A prototype of the system runs in real time on an off-the-shelf Nokia N95 camera phone, automatically taking a few images per second, analyzing each image in a fraction of a second, and sounding an audio tone when it detects a crosswalk. Real-time performance on the cell phone, whose computational resources are limited compared to the type of desktop platform usually used in computer vision, is made possible by coding in Symbian C++. Tests with blind subjects demonstrate the feasibility of the system.
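The abstract does not spell out the detector, so the following is only a toy illustration of one way to flag a zebra-style crosswalk, via an alternating bright/dark band pattern in row-averaged intensity; the projection axis and all thresholds are assumptions:

```python
import itertools
import numpy as np

def looks_like_crosswalk(gray, min_stripes=4, min_band_rows=5):
    """Toy test: enough alternating bright/dark horizontal bands."""
    profile = gray.astype(float).mean(axis=1)   # average intensity per row
    signs = np.sign(profile - profile.mean())   # bright (+) vs dark (-) rows
    runs = [sum(1 for _ in g) for _, g in itertools.groupby(signs)]
    bands = [r for r in runs if r >= min_band_rows]  # drop noise-width runs
    return len(bands) >= 2 * min_stripes
```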
Image and Vision Computing | 2003
Scott Konishi; Alan L. Yuille; James M. Coughlan
We propose a statistical approach to combining edge cues at multiple scales using data-driven probability distributions. These distributions are learnt on the Sowerby and South Florida datasets, which include the ground-truth positions of edges. We evaluate our results using Chernoff information and conditional entropy. Our results demonstrate the effectiveness of multi-scale processing and validate previous heuristics such as coarse-to-fine edge tracking.
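Chernoff information itself is straightforward to compute for discrete distributions, which may help make the evaluation measure concrete; p and q would be on-edge and off-edge response histograms (the inputs here are placeholders):

```python
import numpy as np

def chernoff_information(p, q, grid=np.linspace(0.01, 0.99, 99)):
    """C(p, q) = max over lambda in (0, 1) of
    -log sum_i p_i**lambda * q_i**(1 - lambda), on a coarse lambda grid.
    p and q must be normalized distributions over the same bins."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return max(-np.log(np.sum(p**lam * q**(1 - lam))) for lam in grid)
```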
International Journal on Artificial Intelligence Tools | 2009
James M. Coughlan; Roberto Manduchi
We describe a wayfinding system for blind and visually impaired persons that uses a camera phone to determine the user's location with respect to color markers, posted at locations of interest (such as offices), which are automatically detected by the phone. The color marker signs are specially designed to be detected in real time in cluttered environments using computer vision software running on the phone; a novel segmentation algorithm quickly locates the borders of the color marker in each image, which allows the system to calculate how far the marker is from the phone. We present a model of how the user's scanning strategy (i.e. how he/she pans the phone left and right to find color markers) affects the system's ability to detect color markers given the limitations imposed by motion blur, which is always a possibility whenever a camera is in motion. Finally, we describe experiments with our system tested by blind and visually impaired volunteers, demonstrating their ability to reliably use the system to find locations designated by color markers in a variety of indoor and outdoor environments, and elucidating which search strategies were most effective for users.
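The range calculation from a detected marker follows from the pinhole camera model: a sketch assuming a known physical marker size and calibrated focal length (the numbers below are illustrative, not the system's calibration):

```python
def marker_distance(focal_px, marker_width_m, width_in_image_px):
    """distance = f * real_width / apparent_width (similar triangles)."""
    return focal_px * marker_width_m / width_in_image_px

# e.g. a 0.10 m marker spanning 40 px with an 800 px focal length:
# marker_distance(800, 0.10, 40) -> 2.0 meters from the camera
```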
International Conference on Computers Helping People with Special Needs | 2008
Roberto Manduchi; James M. Coughlan; Volodymyr Ivanchenko
We report new experiments conducted using a camera phone wayfinding system, which is designed to guide a visually impaired user to machine-readable signs (such as barcodes) labeled with special color markers. These experiments specifically investigate search strategies of such users detecting, localizing and touching color markers that have been mounted in various ways in different environments: in a corridor (either flush with the wall or mounted perpendicular to it) or in a large room with obstacles between the user and the markers. The results show that visually impaired users are able to reliably find color markers in all the conditions that we tested, using search strategies that vary depending on the environment in which they are placed.
Computer Vision and Pattern Recognition | 2004
James M. Coughlan; Huiying Shen
Graphical models provide an attractive framework for shape matching because they are well-suited to formulating Bayesian models of deformable templates. In addition, the advent of powerful inference techniques such as belief propagation (BP) has recently made these models tractable. However, the enormous size of the state spaces involved in these applications (about the size of the pixel lattice) has restricted their use to models drawing on sparse feature maps (e.g. edges), which are typically unable to cope with missing or occluded features since the locations of missing features are not represented in the state space. We propose a novel method for allowing BP to handle partial occlusions in the presence of clutter, which we call dynamic quantization (DQ). DQ is an extension of standard pruning techniques which allows BP to adaptively add as well as subtract states as needed. Since DQ allows BP to focus on more probable regions of the image, the state space can be adaptively enlarged to include locations where features are occluded, without the computational burden of representing all possible pixel locations. The combination of BP and DQ yields deformable templates that are both fast and robust to significant occlusions, without requiring any user initialization. Experimental results are shown on deformable templates of planar shapes.
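A loose sketch of the prune-and-grow idea on a chain-shaped template; the paper's actual potentials, message schedule, and dynamic-quantization rules are richer than this, and all functions and parameters below are placeholders:

```python
def bp_dynamic_states(unary, pair, init_states, keep=20):
    """Forward max-product pass on a chain of template nodes.
    unary(t, s), pair(s, s2): log-potentials; states s are (x, y) pixels.
    After each node, the state set is pruned to the `keep` best and then
    dilated by one pixel, so states are subtracted *and* added back."""
    T = len(init_states)
    active = [set(s) for s in init_states]      # current state set per node
    msg = {s: 0.0 for s in active[0]}           # incoming message at node 0
    for t in range(1, T):
        new_msg = {}
        for s2 in active[t]:
            new_msg[s2] = max(msg[s] + unary(t - 1, s) + pair(s, s2)
                              for s in active[t - 1])
        # dynamic-quantization step: keep the best states, re-add their
        # 3x3 pixel neighbours so occluded locations can re-enter the
        # state space (bounds-checking of grown states is elided)
        best = sorted(active[t], key=new_msg.get, reverse=True)[:keep]
        grown = {(x + dx, y + dy) for (x, y) in best
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
        floor = min(new_msg[b] for b in best)   # pessimistic init for newcomers
        active[t] = grown
        msg = {s: new_msg.get(s, floor) for s in grown}
    # belief readout at the final node (backtracking the full contour elided)
    return max(active[-1], key=lambda s: msg[s] + unary(T - 1, s))
```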