Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tom Yeh is active.

Publication


Featured research published by Tom Yeh.


user interface software and technology | 2010

VizWiz: nearly real-time answers to visual questions

Jeffrey P. Bigham; Chandrika Jayant; Hanjie Ji; Greg Little; Andrew Miller; Robert C. Miller; Robin Miller; Aubrey Tatarowicz; Brandyn White; Samuel White; Tom Yeh

The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative to answering visual questions in nearly real-time - asking multiple people on the web. To support answering questions quickly, we introduce a general approach for intelligently recruiting human workers in advance called quikTurkit so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.
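
The core engineering idea behind quikTurkit is to recruit crowd workers before questions arrive, so that a new question finds someone already waiting. The sketch below illustrates that pattern with an in-process queue and simulated workers; it is not VizWiz's actual recruiting code, and every name in it is made up for illustration.

```python
import queue
import threading
import time

# Illustrative stand-in for the quikTurkit idea: workers are recruited
# *before* questions arrive, so a new question is picked up within seconds.
question_q = queue.Queue()   # questions waiting for a worker
answer_q = queue.Queue()     # (question, answer) pairs coming back

def worker(worker_id):
    """Simulated crowd worker that is already recruited and waiting for work."""
    while True:
        question = question_q.get()
        if question is None:            # shutdown signal
            break
        time.sleep(0.5)                 # pretend the worker inspects the photo
        answer_q.put((question, "answer from worker %d" % worker_id))
        question_q.task_done()

# Keep a standing pool of three workers warm in advance.
pool = [threading.Thread(target=worker, args=(i,), daemon=True) for i in range(3)]
for t in pool:
    t.start()

# A new visual question arrives and is answered almost immediately,
# because workers were recruited ahead of demand.
question_q.put("What denomination is this bill?")
print(answer_q.get(timeout=5))
```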


user interface software and technology | 2009

Sikuli: using GUI screenshots for search and automation

Tom Yeh; Tsung-Hsiang Chang; Robert C. Miller

We present Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element's name. Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. We report a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords. We also demonstrate several automation tasks suitable for visual scripting, such as map navigation and bus tracking, and show how visual scripting can improve interactive help systems previously proposed in the literature.
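
Sikuli's scripting layer uses Python syntax (Jython inside the Sikuli IDE), with captured screenshots standing in for widget names. A minimal script in that style might look like the sketch below; the .png files are assumed to be screenshots the user has captured, and the script is meant to run in the Sikuli runtime rather than plain CPython.

```python
# Sikuli Script sketch: screenshots identify GUI elements instead of names.
# Assumes search_box.png, no_results.png, and first_result.png were captured
# by the user; run inside the Sikuli IDE / runtime.

wait("search_box.png", 10)           # wait until the search box appears on screen
click("search_box.png")              # click the widget that looks like this image
type("bus tracker downtown\n")       # type a query and press Enter

if exists("no_results.png"):         # branch on what is currently visible
    popup("No results found")
else:
    click("first_result.png")
```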


computer vision and pattern recognition | 2004

Searching the Web with mobile images for location recognition

Tom Yeh; Konrad Tollmar; Trevor Darrell

We describe an approach to recognizing location from mobile devices using image-based Web search. We demonstrate the usefulness of common image search metrics applied on images captured with a camera-equipped mobile device to find matching images on the World Wide Web or other general-purpose databases. Searching the entire Web can be computationally overwhelming, so we devise a hybrid image-and-keyword searching technique. First, image-search is performed over images and links to their source Web pages in a database that indexes only a small fraction of the Web. Then, relevant keywords on these Web pages are automatically identified and submitted to an existing text-based search engine (e.g. Google) that indexes a much larger portion of the Web. Finally, the resulting image set is filtered to retain images close to the original query. It is thus possible to efficiently search hundreds of millions of images that are not only textually related but also visually relevant. We demonstrate our approach on an application allowing users to browse Web pages matching the image of a nearby location.
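
The hybrid image-and-keyword search reduces to a four-stage pipeline: match against a small image index, harvest keywords from the matched pages, query a large text engine with those keywords, and re-filter the results visually. The sketch below only shows that structure; every helper in it is a hypothetical stub with a dummy body, not an existing API.

```python
# Structural sketch of the hybrid image-and-keyword pipeline.
# All helpers are hypothetical placeholders; the dummy bodies only keep
# the sketch runnable end to end.

def match_in_local_index(query_image):
    # Stage 1: image search over a small, locally indexed slice of the web.
    return [("img_001.jpg", "http://example.org/clock-tower-page")]   # dummy

def extract_keywords(pages):
    # Stage 2: pull salient keywords from the matched images' source pages.
    return ["campus", "clock tower"]                                  # dummy

def text_search(keywords):
    # Stage 3: send the keywords to a large text search engine (e.g. Google),
    # which indexes far more of the web than the local image index does.
    return ["img_101.jpg", "img_102.jpg"]                             # dummy

def filter_by_visual_similarity(query_image, candidates):
    # Stage 4: keep only candidates that are visually close to the query photo.
    return candidates[:1]                                             # dummy

def locate_from_photo(query_image):
    pages = match_in_local_index(query_image)
    keywords = extract_keywords(page for _, page in pages)
    candidates = text_search(keywords)
    return filter_by_visual_similarity(query_image, candidates)

print(locate_from_photo("query_photo.jpg"))
```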


human factors in computing systems | 2010

GUI testing using computer vision

Tsung-Hsiang Chang; Tom Yeh; Robert C. Miller

Testing a GUI's visual behavior typically requires human testers to interact with the GUI and to observe whether the expected results of interaction are presented. This paper presents a new approach to GUI testing using computer vision for testers to automate their tasks. Testers can write a visual test script that uses images to specify which GUI components to interact with and what visual feedback should be observed. Testers can also generate visual test scripts by demonstration. By recording both input events and screen images, it is possible to extract the images of components interacted with and the visual feedback seen by the demonstrator, and generate a visual test script automatically. We show that a variety of GUI behavior can be tested using this approach. Also, we show how this approach can facilitate good testing practices such as unit testing, regression testing, and test-driven development.
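
The visual assertions in such a test script (verify that an expected image is on screen after an interaction) can be approximated with ordinary template matching. The OpenCV sketch below is a generic stand-in rather than the paper's implementation; the filenames and the 0.95 threshold are arbitrary assumptions.

```python
import cv2

def assert_visible(screen_path, template_path, threshold=0.95):
    """Raise AssertionError if the template image does not appear in the screenshot.

    A minimal stand-in for a visual assertion: normalized cross-correlation
    template matching with an arbitrary similarity threshold.
    """
    screen = cv2.imread(screen_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        raise AssertionError("%s not found (best score %.2f)" % (template_path, max_val))
    return max_loc  # top-left corner of the best match

# Example visual test step: after triggering "Save", the confirmation icon
# should be visible somewhere in the captured screen image.
assert_visible("screenshot_after_save.png", "saved_icon.png")
```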


human factors in computing systems | 2013

Predicting users' first impressions of website aesthetics with a quantification of perceived visual complexity and colorfulness

Katharina Reinecke; Tom Yeh; Luke Miratrix; Rahmatri Mardiko; Yuechen Zhao; Jenny Jiaqi Liu; Krzysztof Z. Gajos

Users make lasting judgments about a website's appeal within a split second of seeing it for the first time. This first impression is influential enough to later affect their opinions of a site's usability and trustworthiness. In this paper, we demonstrate a means to predict the initial impression of aesthetics based on perceptual models of a website's colorfulness and visual complexity. In an online study, we collected ratings of colorfulness, visual complexity, and visual appeal of a set of 450 websites from 548 volunteers. Based on these data, we developed computational models that accurately measure the perceived visual complexity and colorfulness of website screenshots. In combination with demographic variables such as a user's education level and age, these models explain approximately half of the variance in the ratings of aesthetic appeal given after viewing a website for 500 ms only.
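
The fitted perceptual models themselves are not reproduced here, but a widely used colorfulness statistic (Hasler and Suesstrunk's metric over opponent color channels) gives a concrete sense of what quantifying the colorfulness of a screenshot looks like. Treat the sketch below as a stand-in under that assumption, not as the study's model.

```python
import numpy as np
from PIL import Image

def colorfulness(path):
    """Hasler & Suesstrunk colorfulness metric for an image file.

    A stand-in for a quantified colorfulness score, not the model
    fitted to the 450-website rating data in the paper.
    """
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                      # red-green opponent channel
    yb = 0.5 * (r + g) - b          # yellow-blue opponent channel
    std_term = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_term = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std_term + 0.3 * mean_term

print(colorfulness("website_screenshot.png"))
```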


Proceedings of the Tenth International Workshop on Multimedia Data Mining | 2010

Web-scale computer vision using MapReduce for multimedia data mining

Brandyn White; Tom Yeh; Jimmy J. Lin; Larry S. Davis

This work explores computer vision applications of the MapReduce framework that are relevant to the data mining community. An overview of MapReduce and common design patterns is provided for those with limited MapReduce background. We discuss both the high-level theory and the low-level implementation of several computer vision algorithms: classifier training, sliding windows, clustering, bag-of-features, background subtraction, and image registration. Experiments with the k-means clustering and single-Gaussian background subtraction algorithms are performed on a 410-node Hadoop cluster.
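
The k-means experiment follows the canonical MapReduce pattern: mappers assign each point to its nearest current centroid, and reducers average the points collected under each centroid key. The sketch below mimics that mapper/reducer contract in plain Python (no Hadoop) just to show the shape of one iteration.

```python
from collections import defaultdict
import numpy as np

# Plain-Python sketch of one MapReduce iteration of k-means clustering:
# map emits (nearest_centroid_id, point); reduce averages points per key.
# No Hadoop involved; this only illustrates the mapper/reducer contract.

def mapper(point, centroids):
    dists = np.linalg.norm(centroids - point, axis=1)
    yield int(np.argmin(dists)), point              # key = nearest centroid id

def reducer(key, points):
    return key, np.mean(points, axis=0)             # new centroid = mean of its points

def kmeans_iteration(points, centroids):
    groups = defaultdict(list)                      # the "shuffle" phase: group by key
    for p in points:
        for key, value in mapper(p, centroids):
            groups[key].append(value)
    new_centroids = centroids.copy()
    for key, values in groups.items():
        _, new_centroids[key] = reducer(key, np.array(values))
    return new_centroids

points = np.random.rand(1000, 2)
centroids = points[:3].copy()
for _ in range(10):
    centroids = kmeans_iteration(points, centroids)
print(centroids)
```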


computer vision and pattern recognition | 2010

VizWiz::LocateIt - enabling blind people to locate objects in their environment

Jeffrey P. Bigham; Chandrika Jayant; Andrew Miller; Brandyn White; Tom Yeh

Blind people face a number of challenges when interacting with their environments because so much information is encoded visually. Text is pervasively used to label objects, colors carry special significance, and items can easily become lost in surroundings that cannot be quickly scanned. Many tools seek to help blind people solve these problems by enabling them to query for additional information, such as color or text shown on the object. In this paper we argue that many useful problems may be better solved by directly modeling them as search problems, and present a solution called VizWiz::LocateIt that directly supports this type of interaction. VizWiz::LocateIt enables blind people to take a picture and ask for assistance in finding a specific object. The request is first forwarded to remote workers who outline the object, enabling efficient and accurate automatic computer vision to guide users interactively from their existing cellphones. A two-stage algorithm is presented that uses this information to guide users to the appropriate object interactively from their phone.
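
The automatic second stage (steering the user toward the object that remote workers outlined) can be approximated by matching the worker-provided crop against camera frames and turning the match position into coarse directional feedback. The OpenCV sketch below is illustrative only, not the deployed system; the filenames, thresholds, and feedback rule are assumptions.

```python
import cv2

def guidance_hint(frame_path, target_crop_path):
    """Locate the worker-outlined object in a camera frame and suggest a direction.

    Illustrative stand-in for LocateIt's automatic guidance stage, not the
    deployed system; thresholds are arbitrary.
    """
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    target = cv2.imread(target_crop_path, cv2.IMREAD_GRAYSCALE)
    scores = cv2.matchTemplate(frame, target, cv2.TM_CCOEFF_NORMED)
    _, score, _, (x, y) = cv2.minMaxLoc(scores)
    if score < 0.6:                                  # arbitrary confidence cutoff
        return "object not in view"
    offset = (x + target.shape[1] / 2) - frame.shape[1] / 2
    if abs(offset) < frame.shape[1] * 0.1:
        return "centered; move closer"
    return "turn right" if offset > 0 else "turn left"

print(guidance_hint("camera_frame.png", "worker_outlined_crop.png"))
```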


computer vision and pattern recognition | 2009

Fast concurrent object localization and recognition

Tom Yeh; John J. Lee; Trevor Darrell

Object localization and recognition are important problems in computer vision. However, in many applications, exhaustive search over all object models and image locations is computationally prohibitive. While several methods have been proposed to make either recognition or localization more efficient, few have dealt with both tasks simultaneously. This paper proposes an efficient method for concurrent object localization and recognition based on a data-dependent multi-class branch-and-bound formalism. Existing bag-of-features recognition techniques which can be expressed as weighted combinations of feature counts can be readily adapted to our method. We present experimental results that demonstrate the merit of our algorithm in terms of recognition accuracy, localization accuracy, and speed, compared to baseline approaches including exhaustive search, the implicit shape model (ISM), and efficient sub-window search (ESS). Moreover, we develop two extensions to consider non-rectangular bounding regions (composite boxes and polygons) and demonstrate their ability to achieve higher recognition scores compared to traditional rectangular bounding boxes.
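
The branch-and-bound principle behind this line of work prunes whole sets of candidate rectangles at once using an upper bound on the best score any rectangle in the set could reach. The sketch below is a compact single-class version over a per-pixel score map (in the spirit of the efficient sub-window search baseline), written under those simplifying assumptions; it is not the paper's multi-class, non-rectangular method.

```python
import heapq
import numpy as np

def best_window(score_map):
    """Single-class branch-and-bound search over all axis-aligned rectangles.

    score_map: 2D array of per-pixel score contributions (positive and negative).
    Returns (score, (top, bottom, left, right)) for the highest-scoring rectangle.
    Didactic sketch only; degenerate (empty) rectangles count as score 0.
    """
    H, W = score_map.shape
    pos = np.pad(np.maximum(score_map, 0).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    neg = np.pad(np.minimum(score_map, 0).cumsum(0).cumsum(1), ((1, 0), (1, 0)))

    def box_sum(ii, t, b, l, r):           # inclusive rectangle sum via integral image
        if t > b or l > r:
            return 0.0
        return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]

    def upper_bound(t, b, l, r):           # each argument is a (lo, hi) range
        # The largest possible rectangle collects all positive mass; the smallest
        # guaranteed rectangle cannot avoid its negative mass.
        return (box_sum(pos, t[0], b[1], l[0], r[1]) +
                box_sum(neg, t[1], b[0], l[1], r[0]))

    start = ((0, H - 1), (0, H - 1), (0, W - 1), (0, W - 1))   # (top, bottom, left, right) ranges
    heap = [(-upper_bound(*start), start)]
    while heap:
        bound, ranges = heapq.heappop(heap)
        spans = [hi - lo for lo, hi in ranges]
        if max(spans) == 0:                # all ranges collapsed: the bound is exact
            t, b, l, r = (lo for lo, _ in ranges)
            return -bound, (t, b, l, r)
        i = int(np.argmax(spans))          # split the widest range in half
        lo, hi = ranges[i]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            child = ranges[:i] + (half,) + ranges[i + 1:]
            t, b, l, r = child
            if t[0] <= b[1] and l[0] <= r[1]:      # the set still contains a valid rectangle
                heapq.heappush(heap, (-upper_bound(*child), child))

scores = np.random.randn(40, 60) - 0.1     # per-pixel scores with positive and negative values
print(best_window(scores))
```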


human factors in computing systems | 2005

A picture is worth a thousand keywords: image-based object search on a mobile platform

Tom Yeh; Kristen Grauman; Konrad Tollmar; Trevor Darrell

Finding information based on an object's visual appearance is useful when specific keywords for the object are not known. We have developed a mobile image-based search system that takes images of objects as queries and finds relevant web pages by matching them to similar images on the web. Image-based search works well when matching full scenes, such as images of buildings or landmarks, and for matching objects when the boundary of the object in the image is available. We demonstrate the effectiveness of a simple interactive paradigm for obtaining a segmented object boundary, and show how a shape-based image matching algorithm can use the object outline to find similar images on the web.
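
Given the interactively traced boundary, shape-based matching amounts to comparing the traced outline against contours extracted from candidate images. The sketch below uses OpenCV's Hu-moment contour comparison as a generic stand-in for the paper's shape matcher; the outline, filenames, and thresholding choice are assumptions.

```python
import cv2
import numpy as np

def shape_distance(query_outline, candidate_image_path):
    """Distance between a user-traced outline and the closest contour in an image.

    query_outline: (N, 2) array of boundary points from the interactive
    segmentation step. Hu-moment matching is a generic stand-in for the
    paper's shape-based matcher, not its actual algorithm.
    """
    query_contour = query_outline.reshape(-1, 1, 2).astype(np.int32)

    img = cv2.imread(candidate_image_path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] picks the contour list under both OpenCV 3 and OpenCV 4 return signatures.
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        return float("inf")

    # Lower matchShapes value means more similar silhouettes.
    return min(cv2.matchShapes(query_contour, c, cv2.CONTOURS_MATCH_I1, 0.0)
               for c in contours)

# Rank candidate web images by how closely any of their contours matches
# the traced outline (smaller distance = better match).
outline = np.array([[10, 10], [120, 12], [118, 90], [12, 88]])   # made-up boundary
print(shape_distance(outline, "web_candidate.jpg"))
```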


acm multimedia | 2008

Photo-based question answering

Tom Yeh; John J. Lee; Trevor Darrell

Photo-based question answering is a useful way of finding information about physical objects. Current question answering (QA) systems are text-based and can be difficult to use when a question involves an object with distinct visual features. A photo-based QA system allows direct use of a photo to refer to the object. We develop a three-layer system architecture for photo-based QA that brings together recent technical achievements in question answering and image matching. The first, template-based QA layer matches a query photo to online images and extracts structured data from multimedia databases to answer questions about the photo. To simplify image matching, it exploits the question text to filter images based on categories and keywords. The second, information retrieval QA layer searches an internal repository of resolved photo-based questions to retrieve relevant answers. The third, human-computation QA layer leverages community experts to handle the most difficult cases. A series of experiments performed on a pilot dataset of 30,000 images of books, movie DVD covers, grocery items, and landmarks demonstrates the technical feasibility of this architecture. We present three prototypes to show how photo-based QA can be built into an online album, a text-based QA system, and a mobile application.
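
The three-layer architecture is essentially a fallback chain: try the cheap automatic layer first, then the repository of previously answered questions, then people. The sketch below shows only that dispatch logic; each layer function is a hypothetical stub standing in for the components the paper describes.

```python
# Sketch of the three-layer fallback chain for photo-based QA.
# Each layer function is a hypothetical stub; only the dispatch logic is shown.

def template_qa(photo, question):
    # Layer 1: match the photo to online images, then answer from structured
    # data in multimedia databases. Returns None when it cannot answer.
    return None                                            # stub

def retrieval_qa(photo, question):
    # Layer 2: search an internal repository of previously resolved
    # photo-based questions for a reusable answer.
    return None                                            # stub

def human_qa(photo, question):
    # Layer 3: route the hardest cases to community experts.
    return "answer provided by a human expert"             # stub

def answer(photo, question):
    for layer in (template_qa, retrieval_qa, human_qa):
        result = layer(photo, question)
        if result is not None:
            return result
    return "no answer available"

print(answer("dvd_cover.jpg", "Who directed this movie?"))
```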

Collaboration


Dive into Tom Yeh's collaborations.

Top Co-Authors

Trevor Darrell (University of California)
Jeeeun Kim (University of Colorado Boulder)
Abigale Stangl (University of Colorado Boulder)
Brandyn White (University of Central Florida)
Michael Skirpan (University of Colorado Boulder)
Konrad Tollmar (Royal Institute of Technology)
Mark D. Gross (University of Colorado Boulder)
Robert C. Miller (Massachusetts Institute of Technology)
Jacqueline Cameron (University of Colorado Boulder)