Michael Shilman | Researchain

Publication

Featured research published by Michael Shilman.

sketch based interfaces and modeling | 2004

Spatial recognition and grouping of text and graphics

Michael Shilman; Paul A. Viola

We present a framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams. The approach is completely spatial; that is, it does not require any ordering on the strokes. It also does not place any constraint on the relative placement of the shapes or symbols. Initially each of the strokes on the page is linked in a proximity graph. A discriminative classifier is used to classify connected subgraphs as either making up one of the known symbols or perhaps as an invalid combination of strokes (e.g. including strokes from two different symbols). This classifier combines the rendered image of the strokes with stroke features such as curvature and endpoints. A small subset of very efficient features is selected, yielding an extremely fast classifier. An A-star search algorithm over connected subsets of the proximity graph is used to simultaneously find the optimal segmentation and recognition of all the strokes on the page. Experiments demonstrate that the system can achieve 97% segmentation/recognition accuracy on a cross-validated shape dataset from 19 different writers.
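
As a rough illustration of the first two steps in this abstract, the sketch below links strokes in a proximity graph and enumerates the connected subsets that a classifier would then score. It is a minimal sketch under stated assumptions, not the authors' implementation: the point-list stroke representation, the distance threshold, the size cap, and the function names `proximity_graph` and `connected_subsets` are all hypothetical, and the classifier and A-star search are not shown.

```python
from itertools import combinations
from math import hypot


def min_distance(a, b):
    """Smallest point-to-point distance between two strokes (lists of (x, y))."""
    return min(hypot(ax - bx, ay - by) for ax, ay in a for bx, by in b)


def proximity_graph(strokes, threshold):
    """Adjacency sets: strokes i and j are linked when they come within threshold.

    The threshold rule is an assumption; the abstract does not say how
    proximity is measured.
    """
    graph = {i: set() for i in range(len(strokes))}
    for i, j in combinations(range(len(strokes)), 2):
        if min_distance(strokes[i], strokes[j]) <= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_subsets(graph, max_size):
    """All connected vertex subsets containing at most max_size strokes."""
    found = set()
    frontier = [frozenset([v]) for v in graph]
    while frontier:
        subset = frontier.pop()
        if subset in found:
            continue
        found.add(subset)
        if len(subset) < max_size:
            # Grow the subset by one neighbouring stroke at a time.
            neighbours = set().union(*(graph[v] for v in subset)) - subset
            frontier.extend(subset | {n} for n in neighbours)
    return found


# Two nearby strokes and one far away (made-up coordinates).
strokes = [[(0, 0), (1, 0)], [(1.5, 0), (2, 0)], [(10, 10), (11, 10)]]
graph = proximity_graph(strokes, threshold=1.0)
candidates = connected_subsets(graph, max_size=2)
```

Capping the subset size keeps the number of candidates manageable; each candidate would be passed to the classifier, which either names a symbol or rejects the combination.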

international conference on document analysis and recognition | 2003

Discerning structure from freeform handwritten notes

Michael Shilman; Zile Wei; Sashi Raghupathy; Patrice Y. Simard; David Jones

This paper presents an integrated approach to parsing textual structure in freeform handwritten notes. Text-graphics classification and text layout analysis are classical problems in printed document analysis, but the irregularity in handwriting and content in freeform notes reveals limitations in existing approaches. We advocate an integrated technique that solves the layout analysis and classification problems simultaneously: the problems are so tightly coupled that it is not possible to solve one without the other for real user notes. We tune and evaluate our approach on a large corpus of unscripted user files and reflect on the difficult recognition scenarios that we have encountered in practice.

human factors in computing systems | 2007

InkSeine: In Situ search for active note taking

Ken Hinckley; Shengdong Zhao; Raman K. Sarin; Patrick Baudisch; Edward Cutrell; Michael Shilman; Desney S. Tan

Using a notebook to sketch designs, reflect on a topic, or capture and extend creative ideas are examples of active note taking tasks. Optimal experience for such tasks demands concentration without interruption. Yet active note taking may also require reference documents or emails from team members. InkSeine is a Tablet PC application that supports active note taking by coupling a pen-and-ink interface with an in situ search facility that flows directly from a user's ink notes (Fig. 1). InkSeine integrates four key concepts: it leverages preexisting ink to initiate a search; it provides tight coupling of search queries with application content; it persists search queries as first-class objects that can be commingled with ink notes; and it enables a quick and flexible workflow where the user may freely interleave inking, searching, and gathering content. InkSeine offers these capabilities in an interface that is tailored to the unique demands of pen input, and that maintains the primacy of inking above all other tasks.

user interface software and technology | 2007

SketchWizard: Wizard of Oz prototyping of pen-based user interfaces

Richard C. Davis; T. Scott Saponas; Michael Shilman; James A. Landay

SketchWizard allows designers to create Wizard of Oz prototypes of pen-based user interfaces in the early stages of design. In the past, designers have been inhibited from participating in the design of pen-based interfaces because of the inadequacy of paper prototypes and the difficulty of developing functional prototypes. In SketchWizard, designers and end users share a drawing canvas between two computers, allowing the designer to simulate the behavior of recognition or other technologies. Special editing features are provided to help designers respond quickly to end-user input. This paper describes the SketchWizard system and presents two evaluations of our approach. The first is an early feasibility study in which Wizard of Oz was used to prototype a pen-based user interface. The second is a laboratory study in which designers used SketchWizard to simulate existing pen-based interfaces. Both showed that end users gave valuable feedback in spite of delays between end-user actions and wizard updates.

user interface software and technology | 2006

CueTIP: a mixed-initiative interface for correcting handwriting errors

Michael Shilman; Desney S. Tan; Patrice Y. Simard

With advances in pen-based computing devices, handwriting has become an increasingly popular input modality. Researchers have put considerable effort into building intelligent recognition systems that can translate handwriting to text with increasing accuracy. However, handwritten input is inherently ambiguous, and these systems will always make errors. Unfortunately, work on error recovery mechanisms has mainly focused on interface innovations that allow users to manually transform the erroneous recognition result into the intended one. In our work, we propose a mixed-initiative approach to error correction. We describe CueTIP, a novel correction interface that takes advantage of the recognizer to continually evolve its results using the additional information from user corrections. This significantly reduces the number of actions required to reach the intended result. We present a user study showing that CueTIP is more efficient and better preferred for correcting handwriting recognition errors. Grounded in the discussion of CueTIP, we also present design principles that may be applied to mixed-initiative correction interfaces in other domains.

intelligent user interfaces | 2004

Robust sketched symbol fragmentation using templates

Heloise Hwawen Hse; Michael Shilman; A. Richard Newton

Analysis of sketched digital ink is often aided by the division of stroke points into perceptually-salient fragments based on geometric features. Fragmentation has many applications in intelligent interfaces for digital ink capture and manipulation, as well as higher-level symbolic and structural analyses. It is our intuitive belief that the most robust fragmentations closely match a user's natural perception of the ink, thus leading to more effective recognition and useful user feedback. We present two optimal fragmentation algorithms that fragment common geometries into a basis set of line segments and elliptical arcs. The first algorithm uses an explicit template in which the order and types of bases are specified. The other only requires the number of fragments of each basis type. For the set of symbols under test, both algorithms achieved a 100% fragmentation accuracy rate for symbols with line bases, >99% accuracy for symbols with elliptical bases, and >90% accuracy for symbols with mixed line and elliptical bases.

international conference on frontiers in handwriting recognition | 2004

Recognition and grouping of handwritten text in diagrams and equations

Michael Shilman; Paul A. Viola; Kumar Chellapilla

We present a framework for grouping and recognition of characters and symbols in online free-form ink expressions. The approach is completely spatial; it does not require any ordering on the strokes. It also does not place any constraints on the layout of the symbols. Initially each of the strokes on the page is linked in a proximity graph. A discriminative recognizer is used to classify connected subgraphs as either making up one of the known symbols or perhaps as an invalid combination of strokes (e.g. including strokes from two different symbols). This recognizer operates on the rendered image of the strokes plus stroke features such as curvature and endpoints. A small subset of very efficient image features is selected, yielding an extremely fast recognizer. Dynamic programming over connected subsets of the proximity graph is used to simultaneously find the optimal grouping and recognition of all the strokes on the page. Experiments demonstrate that the system can achieve 94% grouping/recognition accuracy on a test dataset containing symbols from 25 writers held out from the training process.
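
The idea of choosing the best grouping by dynamic programming can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's formulation: the function name `best_grouping` and the cost table are hypothetical, the costs stand in for recognizer scores (lower is better), and candidate groups are assumed to be precomputed connected subsets of the proximity graph.

```python
from functools import lru_cache


def best_grouping(n_strokes, group_costs):
    """Partition strokes 0..n_strokes-1 into candidate groups at minimum total cost.

    group_costs maps frozenset(stroke indices) -> cost. Returns (cost, groups);
    cost is float("inf") if no partition exists.
    """
    # Index candidates by their smallest stroke so each partition is built once.
    groups_by_first = {}
    for group in group_costs:
        groups_by_first.setdefault(min(group), []).append(group)

    @lru_cache(maxsize=None)
    def solve(remaining):
        if not remaining:
            return 0.0, ()
        first = min(remaining)
        best = (float("inf"), ())
        # The lowest remaining stroke must belong to exactly one chosen group.
        for group in groups_by_first.get(first, []):
            if group <= remaining:
                rest_cost, rest_groups = solve(remaining - group)
                total = group_costs[group] + rest_cost
                if total < best[0]:
                    best = (total, (group,) + rest_groups)
        return best

    cost, groups = solve(frozenset(range(n_strokes)))
    return cost, list(groups)


# Made-up costs: strokes 0 and 1 are cheaper together than apart.
costs = {
    frozenset([0]): 1.0,
    frozenset([1]): 1.0,
    frozenset([2]): 0.5,
    frozenset([0, 1]): 0.8,
}
total_cost, grouping = best_grouping(3, costs)
```

Memoizing on the set of strokes still to be covered means each sub-problem is solved once, which is what makes the search over groupings a dynamic program rather than a brute-force enumeration.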

international conference on computer vision | 2005

Learning nongenerative grammatical models for document analysis

Michael Shilman; Percy Liang; Paul A. Viola

We present a general approach for the hierarchical segmentation and labeling of document layout structures. This approach models document layout as a grammar and performs a global search for the optimal parse based on a grammatical cost function. Our contribution is to utilize machine learning to discriminatively select features and set all parameters in the parsing process. Therefore, and unlike many other approaches for layout analysis, ours can easily adapt itself to a variety of document analysis problems. One need only specify the page grammar and provide a set of correctly labeled pages. We apply this technique to two document image analysis tasks: page layout structure extraction and mathematical expression interpretation. Experiments demonstrate that the learned grammars can be used to extract the document structure in 57 files from the UWIII document image database. We also show that the same framework can be used to automatically interpret printed mathematical expressions so as to recreate the original LaTeX

document analysis systems | 2006

Combining multiple classifiers for faster optical character recognition

Kumar Chellapilla; Michael Shilman; Patrice Y. Simard

Traditional approaches to combining classifiers attempt to improve classification accuracy at the cost of increased processing. They may be viewed as providing an accuracy-speed trade-off: higher accuracy for lower speed. In this paper we present a novel approach to combining multiple classifiers to solve the inverse problem of significantly improving classification speeds at the cost of slightly reduced classification accuracy. We propose a cascade architecture for combining classifiers and cast the process of building such a cascade as a search and optimization problem. We present two algorithms based on steepest-descent and dynamic programming for producing approximate solutions fast. We also present a simulated annealing algorithm and a depth-first-search algorithm for finding optimal solutions. Results on handwritten optical character recognition indicate that a) a speedup of 4-9 times is possible with no increase in error and b) speedups of up to 15 times are possible when twice as many errors can be tolerated.
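
A cascade of the kind described here can be pictured as a list of classifiers ordered from cheapest to most expensive, where an input stops at the first stage that is confident enough. The sketch below is illustrative only: `cascade_predict`, the `(label, confidence)` return convention, the thresholds, and the toy classifiers are assumptions, and the search that builds the cascade (steepest descent, dynamic programming, simulated annealing, or depth-first search in the paper) is not shown.

```python
def cascade_predict(stages, x):
    """Run x through a classifier cascade.

    stages is a list of (classify, threshold) pairs, cheapest first, where
    classify(x) returns (label, confidence). The first stage whose confidence
    reaches its threshold answers; the last stage always answers.
    Returns (label, index of the stage that answered).
    """
    for index, (classify, threshold) in enumerate(stages):
        label, confidence = classify(x)
        is_last = index == len(stages) - 1
        if confidence >= threshold or is_last:
            return label, index
    raise ValueError("cascade needs at least one stage")


# Toy stand-ins for a fast, less reliable model and a slow, reliable one.
def cheap(x):
    label = "even" if x % 2 == 0 else "odd"
    return label, (0.9 if x < 10 else 0.4)


def expensive(x):
    return ("even" if x % 2 == 0 else "odd"), 1.0


stages = [(cheap, 0.8), (expensive, 0.0)]
easy = cascade_predict(stages, 4)
hard = cascade_predict(stages, 13)
```

The speedup comes from most inputs exiting at an early, cheap stage; raising a threshold sends more inputs to later stages, trading speed for accuracy.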

international conference on document analysis and recognition | 2005

Efficient geometric algorithms for parsing in two dimensions

Percy Liang; Mukund Narasimhan; Michael Shilman; Paul A. Viola

Grammars are a powerful technique for modeling and extracting the structure of documents. One large challenge, however, is computational complexity. The computational cost of grammatical parsing is related to both the complexity of the input and the ambiguity of the grammar. For programming languages, where the terminals appear in a linear sequence and the grammar is unambiguous, parsing is O(N). For natural languages, which are linear yet have an ambiguous grammar, parsing is O(N^3). For documents, where the terminals are arranged in two dimensions and the grammar is ambiguous, parsing time can be exponential in the number of terminals. In this paper we introduce (and unify) several types of geometrical data structures which can be used to significantly accelerate parsing time. Each data structure embodies a different geometrical constraint on the set of possible valid parses. These data structures are very general, in that they can be used by any type of grammatical model, and a wide variety of document understanding tasks, to limit the set of hypotheses examined and tested. Assuming a clean design for the parsing software, the same parsing framework can be tested with various geometric constraints to determine the most effective combination.
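
To make the cubic cost for linear, ambiguous input concrete, the sketch below implements the standard CYK recognizer for a grammar in Chomsky normal form. CYK is used here only as a familiar example of cubic-time parsing; it is not the method of the paper, and the function name, grammar encoding, and toy grammar are assumptions chosen for illustration.

```python
def cyk_recognize(tokens, unary, binary, start="S"):
    """Return True if the grammar derives tokens from the start symbol.

    unary maps terminal -> set of nonterminals A with rule A -> terminal.
    binary maps (B, C) -> set of nonterminals A with rule A -> B C.
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][i] = set(unary.get(token, ()))
    # Three nested loops over span length, start, and split point give O(N^3).
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for b in table[i][k]:
                    for c in table[k + 1][j]:
                        table[i][j] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Toy grammar for a^n b^n: S -> A B | A T, T -> S B, A -> 'a', B -> 'b'.
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
accepted = cyk_recognize(list("aabb"), unary, binary)
rejected = cyk_recognize(list("aab"), unary, binary)
```

The table is indexed by contiguous spans, which is possible only because the input is a sequence. In two dimensions there is no such ordering, and the number of candidate terminal subsets can grow exponentially, which is the cost the geometric constraints in the abstract are meant to limit.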
