Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Huiying Shen is active.

Publication


Featured research published by Huiying Shen.


Computer Vision and Pattern Recognition | 2008

Detecting and locating crosswalks using a camera phone

Volodymyr Ivanchenko; James M. Coughlan; Huiying Shen

Urban intersections are among the most dangerous parts of a blind or visually impaired person's travel. To address this problem, this paper describes the novel "Crosswatch" system, which uses computer vision to provide information about the location and orientation of crosswalks to a blind or visually impaired pedestrian holding a camera cell phone. A prototype of the system runs in real time on an off-the-shelf Nokia N95 camera phone, which automatically takes a few images per second, analyzes each image in a fraction of a second, and sounds an audio tone when it detects a crosswalk. Real-time performance on the cell phone, whose computational resources are limited compared to the desktop platforms usually used in computer vision, is made possible by coding in Symbian C++. Tests with blind subjects demonstrate the feasibility of the system.
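The abstract does not spell out the detection algorithm. Purely as an illustration, one classic cue for a zebra crosswalk is a run of roughly evenly spaced stripe edges; a minimal sketch of that spacing test, with a hypothetical `stripe_ys` list standing in for the output of an edge-detection stage:

```python
def looks_like_crosswalk(stripe_ys, min_stripes=3, tol=0.25):
    """Heuristic sketch: a zebra crosswalk appears as several roughly
    evenly spaced horizontal stripes. `stripe_ys` holds the image rows
    of detected stripe centers (a hypothetical upstream vision step)."""
    if len(stripe_ys) < min_stripes:
        return False
    ys = sorted(stripe_ys)
    gaps = [b - a for a, b in zip(ys, ys[1:])]
    mean_gap = sum(gaps) / len(gaps)
    # All gaps must lie within `tol` relative deviation of the mean gap.
    return all(abs(g - mean_gap) <= tol * mean_gap for g in gaps)
```

This is only a caricature of the geometric regularity the system exploits, not the paper's method.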


Computer Vision and Pattern Recognition | 2004

Shape Matching with Belief Propagation: Using Dynamic Quantization to Accommodate Occlusion and Clutter

James M. Coughlan; Huiying Shen

Graphical models provide an attractive framework for shape matching because they are well-suited to formulating Bayesian models of deformable templates. In addition, the advent of powerful inference techniques such as belief propagation (BP) has recently made these models tractable. However, the enormous size of the state spaces involved in these applications (about the size of the pixel lattice) has restricted their use to models drawing on sparse feature maps (e.g. edges), which are typically unable to cope with missing or occluded features since the locations of missing features are not represented in the state space. We propose a novel method for allowing BP to handle partial occlusions in the presence of clutter, which we call dynamic quantization (DQ). DQ is an extension of standard pruning techniques which allows BP to adaptively add as well as subtract states as needed. Since DQ allows BP to focus on more probable regions of the image, the state space can be adaptively enlarged to include locations where features are occluded, without the computational burden of representing all possible pixel locations. The combination of BP and DQ yields deformable templates that are both fast and robust to significant occlusions, without requiring any user initialization. Experimental results are shown on deformable templates of planar shapes.


International Conference on Computers Helping People with Special Needs | 2010

Real-time walk light detection with a mobile phone

Volodymyr Ivanchenko; James M. Coughlan; Huiying Shen

Crossing an urban traffic intersection is one of the most dangerous activities in a blind or visually impaired person's travel. Building on the authors' past work on proper alignment with the crosswalk, this paper addresses the complementary issue of knowing when it is time to cross. We describe a prototype portable system that alerts the user in real time once the Walk light is illuminated. The system runs as a software application on an off-the-shelf Nokia N95 mobile phone, using computer vision algorithms to analyze video acquired by the built-in camera to determine in real time if a Walk light is currently visible. Once a Walk light is detected, an audio tone is sounded to alert the user. Experiments with a blind volunteer subject at urban traffic intersections demonstrate proof of concept of the system, which successfully alerted the subject when the Walk light appeared.
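The paper's actual detection algorithm is not given in the abstract. Purely as an illustration, a lit walk signal tends to show up as a patch of bright, low-saturation pixels; a toy color test under that assumption (the candidate region and its RGB pixel list are hypothetical inputs from an upstream stage):

```python
def is_walk_light(pixels, min_fraction=0.3):
    """Toy color test: a lit 'Walk' symbol shows as a compact cluster
    of bright, white-ish pixels. `pixels` is a list of (r, g, b)
    tuples from a candidate region (upstream detection assumed)."""
    if not pixels:
        return False

    def bright_white(p):
        r, g, b = p
        # Bright in every channel, and channels close together
        # (i.e. low saturation, so the pixel looks white-ish).
        return min(r, g, b) > 180 and max(r, g, b) - min(r, g, b) < 40

    lit = sum(1 for p in pixels if bright_white(p))
    return lit / len(pixels) >= min_fraction
```

A real detector would also need to handle the shape of the symbol and changing exposure; this sketch only shows the kind of per-pixel cue involved.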


Computer Vision and Image Understanding | 2007

Dynamic quantization for belief propagation in sparse spaces

James M. Coughlan; Huiying Shen

Graphical models provide an attractive framework for modeling a variety of problems in computer vision. The advent of powerful inference techniques such as belief propagation (BP) has recently made inference with many of these models tractable. Even so, the enormous size of the state spaces required for some applications can create a heavy computational burden. Pruning is a standard technique for reducing this burden, but since pruning is irreversible it carries the risk of greedily deleting important states, which can subsequently result in gross errors in BP. To address this problem, we propose a novel extension of pruning, which we call dynamic quantization (DQ), that allows BP to adaptively add as well as subtract states as needed. We examine DQ in the context of graphical-model based deformable template matching, in which the state space size is on the order of the number of pixels in an image. The combination of BP and DQ yields deformable templates that are both fast and robust to significant occlusions, without requiring any user initialization. Experimental results are shown on deformable templates of planar shapes. Finally, we argue that DQ is applicable to a variety of graphical models in which the state spaces are sparsely populated.
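The core idea of dynamic quantization, as summarized above, is that pruning becomes reversible: low-belief states are dropped, but neighbors of surviving high-belief states can be re-admitted, so occluded locations can re-enter the state space. A schematic sketch of one DQ round (not the paper's implementation; the `neighbors` map is a placeholder for problem-specific state adjacency):

```python
def dynamic_quantize(beliefs, prune_below=0.01, neighbors=None):
    """One round of the dynamic-quantization idea: drop low-belief
    states (standard pruning), but also re-admit states adjacent to
    surviving ones, so the state space can adaptively grow back.
    `beliefs` maps state -> belief; `neighbors` maps state -> iterable
    of adjacent states (e.g. nearby pixel locations)."""
    neighbors = neighbors or {}
    # Standard pruning: keep only states with sufficient belief.
    kept = {s for s, b in beliefs.items() if b >= prune_below}
    # The DQ extension: adaptively ADD states adjacent to survivors.
    added = set()
    for s in kept:
        added.update(neighbors.get(s, ()))
    return kept | added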


Computer Vision and Pattern Recognition | 2006

Reading LCD/LED Displays with a Camera Cell Phone

Huiying Shen; James M. Coughlan

Being able to read LCD/LED displays would be a very important step towards greater independence for persons who are blind or have low vision. A fast graphical model based algorithm is proposed for reading 7-segment digits in LCD/LED displays. The algorithm is implemented for Symbian camera cell phones in Symbian C++. The software reads one display in about 2 seconds by a push of a button on the cell phone (Nokia 6681, 220 MHz ARM CPU).
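The mapping from lit segments to digits is the standard 7-segment convention and reduces to a lookup table; the vision step that decides which segments are actually lit in the image is the hard part and is omitted here. The segment ordering below is an assumption chosen for illustration:

```python
# Segment order assumed here: (top, top-left, top-right, middle,
# bottom-left, bottom-right, bottom), with 1 = segment lit.
# Detecting which segments are lit is the (omitted) vision step.
SEGMENTS_TO_DIGIT = {
    (1, 1, 1, 0, 1, 1, 1): 0,
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2,
    (1, 0, 1, 1, 0, 1, 1): 3,
    (0, 1, 1, 1, 0, 1, 0): 4,
    (1, 1, 0, 1, 0, 1, 1): 5,
    (1, 1, 0, 1, 1, 1, 1): 6,
    (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

def read_digit(segments):
    """Map a tuple of lit/unlit segment flags to a digit, or None
    if the pattern is not a valid 7-segment digit."""
    return SEGMENTS_TO_DIGIT.get(tuple(segments))
```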


Conference on Computers and Accessibility | 2008

Computer vision-based clear path guidance for blind wheelchair users

Volodymyr Ivanchenko; James M. Coughlan; William Gerrey; Huiying Shen

We describe a system for guiding blind and visually impaired wheelchair users along a clear path that uses computer vision to sense the presence of obstacles or other terrain features and warn the user accordingly. Since multiple terrain features can be distributed anywhere on the ground, and their locations relative to a moving wheelchair are continually changing, it is challenging to communicate this wealth of spatial information in a way that is rapidly comprehensible to the user. The main contribution of our system is the development of a novel user interface that allows the user to interrogate the environment by sweeping a standard (unmodified) white cane back and forth: the system continuously tracks the cane location and sounds an alert if a terrain feature is detected in the direction the cane is pointing. Experiments are described demonstrating the feasibility of the approach.
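The alerting logic described above can be caricatured in a few lines; this is an illustrative sketch only, with the cane direction and feature bearings assumed to come from the (omitted) tracking and terrain-detection stages:

```python
def cane_alert(cane_angle_deg, feature_angles_deg, tol_deg=10.0):
    """Sketch of the interface logic: sound an alert when a detected
    terrain feature lies within `tol_deg` of the direction the tracked
    cane is pointing. Angles are hypothetical outputs of the (omitted)
    vision stage, measured in degrees in the ground plane."""
    def ang_diff(a, b):
        # Smallest absolute difference between two angles, wrap-safe.
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return any(ang_diff(cane_angle_deg, f) <= tol_deg
               for f in feature_angles_deg)
```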


Workshop on Applications of Computer Vision | 2011

Localizing blurry and low-resolution text in natural images

Pannag R. Sanketi; Huiying Shen; James M. Coughlan

There is a growing body of work addressing the problem of localizing printed text regions occurring in natural scenes, all of it focused on images in which the text to be localized is resolved clearly enough to be read by OCR. This paper introduces an alternative approach to text localization based on the fact that it is often useful to localize text that is identifiable as text but too blurry or small to be read, for two reasons. First, an image can be decimated and processed at a coarser resolution than usual, resulting in faster localization before OCR is performed (at full resolution, if needed). Second, in real-time applications such as a cell phone app to find and read text, text may initially be acquired from a lower-resolution video image in which it appears too small to be read; once the text's presence and location have been established, a higher-resolution image can be taken in order to resolve the text clearly enough to read it. We demonstrate proof of concept of this approach by describing a novel algorithm for binarizing the image and extracting candidate text features, called “blobs,” and grouping and classifying the blobs into text and non-text categories. Experimental results are shown on a variety of images in which the text is resolved too poorly to be clearly read, but is still identifiable by our algorithm as text.
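The grouping step can be illustrated with a toy version: connected components that line up horizontally with similar heights are plausibly a text line, while isolated or mismatched blobs are not. This sketch is not the paper's classifier; the `(x, y, h)` blob format (center x, center y, height) is an assumption standing in for the output of a binarization stage:

```python
def group_text_blobs(blobs, y_tol=5, h_tol=0.3, min_run=3):
    """Toy grouping: scan blobs left to right and chain together
    those whose vertical position and height roughly match the
    previous blob. Chains of at least `min_run` blobs are kept as
    candidate text lines; everything else is treated as non-text."""
    blobs = sorted(blobs)  # left to right by x
    groups, current = [], []
    for b in blobs:
        if current and (abs(b[1] - current[-1][1]) <= y_tol and
                        abs(b[2] - current[-1][2]) <= h_tol * current[-1][2]):
            current.append(b)
        else:
            if len(current) >= min_run:
                groups.append(current)
            current = [b]
    if len(current) >= min_run:
        groups.append(current)
    return groups
```

The point of the sketch is that such alignment cues survive blur: a blob can be grouped as text long before it is sharp enough for OCR.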


GbRPR'07 Proceedings of the 6th IAPR-TC-15 International Conference on Graph-Based Representations in Pattern Recognition | 2007

Grouping using factor graphs: an approach for finding text with a camera phone

Huiying Shen; James M. Coughlan

We introduce a new framework for feature grouping based on factor graphs, which are graphical models that encode interactions among arbitrary numbers of random variables. The ability of factor graphs to express interactions higher than pairwise order (the highest order encountered in most graphical models used in computer vision) is useful for modeling a variety of pattern recognition problems. In particular, we show how this property makes factor graphs a natural framework for performing grouping and segmentation, which we apply to the problem of finding text in natural scenes. We demonstrate an implementation of our factor graph-based algorithm for finding text on a Nokia camera phone, which is intended for eventual use in a camera phone system that finds and reads text (such as street signs) in natural environments for blind users.
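To make the higher-than-pairwise point concrete, here is a tiny factor-graph MAP solver using brute-force enumeration instead of belief propagation (fine for a handful of binary variables; the paper's models are far larger). Everything here is illustrative, not the paper's code:

```python
import itertools

def map_assignment(variables, factors):
    """Brute-force MAP inference on a tiny factor graph over binary
    variables. `factors` is a list of (scope, fn) pairs, where `scope`
    names a subset of variables -- possibly more than two, which is
    exactly the higher-order interaction factor graphs allow -- and
    `fn` scores their joint values."""
    best, best_score = None, float("-inf")
    for values in itertools.product([0, 1], repeat=len(variables)):
        assign = dict(zip(variables, values))
        score = sum(fn(*(assign[v] for v in scope))
                    for scope, fn in factors)
        if score > best_score:
            best, best_score = assign, score
    return best
```

A factor whose scope lists three variables, e.g. `(("a", "b", "c"), lambda a, b, c: 2.0 if a and b and c else 0.0)`, rewards grouping all three features together at once, which a pairwise MRF cannot express directly.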


Workshop on Applications of Computer Vision | 2011

Real-time detection and reading of LED/LCD displays for visually impaired persons

Ender Tekin; James M. Coughlan; Huiying Shen

Modern household appliances, such as microwave ovens and DVD players, increasingly require users to read an LED or LCD display to operate them, posing a severe obstacle for persons with blindness or visual impairment. While OCR-enabled devices are emerging to address the related problem of reading text in printed documents, they are not designed to tackle the challenge of finding and reading characters in appliance displays. Any system for reading these characters must address the challenge of first locating the characters among substantial amounts of background clutter; moreover, poor contrast and the abundance of specular highlights on the display surface — which degrade the image in an unpredictable way as the camera is moved — motivate the need for a system that processes images at a few frames per second, rather than forcing the user to take several photos, each of which can take seconds to acquire and process, until one is readable. We describe a novel system that acquires video, detects and reads LED/LCD characters in real time, reading them aloud to the user with synthesized speech. The system has been implemented on both a desktop and a cell phone. Experimental results are reported on videos of display images, demonstrating the feasibility of the system.


Image and Vision Computing | 2009

Figure-ground segmentation using factor graphs

Huiying Shen; James M. Coughlan; Volodymyr Ivanchenko

Foreground-background segmentation has recently been applied [26,12] to the detection and segmentation of specific objects or structures of interest from the background as an efficient alternative to techniques such as deformable templates [27]. We introduce a graphical model (i.e. Markov random field)-based formulation of structure-specific figure-ground segmentation based on simple geometric features extracted from an image, such as local configurations of linear features, that are characteristic of the desired figure structure. Our formulation is novel in that it is based on factor graphs, which are graphical models that encode interactions among arbitrary numbers of random variables. The ability of factor graphs to express interactions higher than pairwise order (the highest order encountered in most graphical models used in computer vision) is useful for modeling a variety of pattern recognition problems. In particular, we show how this property makes factor graphs a natural framework for performing grouping and segmentation, and demonstrate that the factor graph framework emerges naturally from a simple maximum entropy model of figure-ground segmentation. We cast our approach in a learning framework, in which the contributions of multiple grouping cues are learned from training data, and apply our framework to the problem of finding printed text in natural scenes. Experimental results are described, including a performance analysis that demonstrates the feasibility of the approach.

Collaboration


Dive into Huiying Shen's collaborations.

Top Co-Authors

James M. Coughlan, Smith-Kettlewell Institute
Giovanni Fusco, Smith-Kettlewell Institute
Ender Tekin, Smith-Kettlewell Institute
John A. Brabyn, Smith-Kettlewell Institute
Joshua Miele, Smith-Kettlewell Institute
Kee-Yip Chan, University of California
Owen Edwards, Smith-Kettlewell Institute
Pannag R. Sanketi, Smith-Kettlewell Institute