Matthew Michelson | Researchain

Publication

Featured research published by Matthew Michelson.

analytics for noisy unstructured text data | 2007

Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web

Matthew Michelson; Craig A. Knoblock

Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. Previous work has exploited reference sets to aid such extraction, but it did so using supervised machine learning. In this paper, we present an unsupervised approach that both selects the relevant reference set(s) automatically and then uses it for unsupervised extraction. We validate our approach with experimental results that show our unsupervised extraction is competitive with supervised machine learning approaches, including the previous supervised approach that exploits reference sets.
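The idea of using a reference set to guide extraction can be illustrated with a minimal sketch. The abstract does not specify the matching procedure, so everything here is an assumption made for illustration: the token-overlap (Jaccard) similarity, the helper names `best_reference_record` and `extract_attributes`, and the toy car reference set. It is not the authors' algorithm.

```python
def tokens(text):
    """Lowercase a string and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_reference_record(post, reference_set):
    """Return the reference record whose values overlap most with the post.

    Assumption: plain token overlap stands in for whatever similarity
    measure the paper actually uses.
    """
    post_tokens = tokens(post)
    return max(
        reference_set,
        key=lambda rec: jaccard(post_tokens, tokens(" ".join(rec.values()))),
    )


def extract_attributes(post, reference_set):
    """Keep the attributes of the best record that are mentioned in the post."""
    record = best_reference_record(post, reference_set)
    post_tokens = tokens(post)
    return {
        attr: value
        for attr, value in record.items()
        if tokens(value) & post_tokens
    }


# Hypothetical reference set and classified listing, for illustration only.
cars = [
    {"make": "Honda", "model": "Civic"},
    {"make": "Honda", "model": "Accord"},
    {"make": "Toyota", "model": "Camry"},
]
listing = "93 honda civic 4dr runs great $2000 obo"
extracted = extract_attributes(listing, cars)
```

No labeled training data is needed in this sketch: the reference set supplies the clean attribute values, and the ungrammatical listing is only matched against them.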

geographic information science | 2008

Identifying Maps on the World Wide Web

Matthew Michelson; Aman Goel; Craig A. Knoblock

This paper presents an automatic approach to mining collections of maps from the Web. Our method harvests images from the Web and then classifies them as maps or non-maps by comparing them to previously classified map and non-map images using methods from Content-Based Image Retrieval (CBIR). Our approach outperforms the previous approach by 20% in F1-measure. Further, our method is more scalable and less costly than previous approaches that rely on more traditional machine learning techniques.
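Classifying an image by comparing it to previously classified images can be sketched as a nearest-neighbor vote. This is a minimal illustration under stated assumptions, not the paper's system: the three-dimensional feature vectors, the cosine similarity measure, the value of `k`, and the function name `classify_by_neighbors` are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_neighbors(query, labeled, k=3):
    """Label a query vector by majority vote among its k most similar examples.

    `labeled` is a list of (feature_vector, label) pairs that were
    classified earlier; the query is compared against each of them.
    """
    ranked = sorted(
        labeled,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors (assumed for illustration; real CBIR features
# would be computed from image content such as color or texture).
labeled_images = [
    ([0.9, 0.1, 0.0], "map"),
    ([0.8, 0.2, 0.1], "map"),
    ([0.7, 0.3, 0.0], "map"),
    ([0.1, 0.2, 0.9], "non-map"),
    ([0.0, 0.3, 0.8], "non-map"),
    ([0.2, 0.1, 0.7], "non-map"),
]
prediction = classify_by_neighbors([0.85, 0.15, 0.05], labeled_images)
```

In a sketch like this, adding a new labeled image only extends the list and requires no retraining step, which is one plausible reason a retrieval-style classifier can scale well.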

AI Magazine | 2008

Beyond the Elves: Making Intelligent Agents Intelligent

Craig A. Knoblock; José Luis Ambite; Mark James Carman; Matthew Michelson; Pedro A. Szekely; Rattapoom Tuchinda

The goal of the Electric Elves project was to develop software agent technology to support human organizations. We developed a variety of applications of the Elves, including scheduling visitors, managing a research group (the Office Elves), and monitoring travel (the Travel Elves). The Travel Elves were eventually deployed at DARPA, where things did not go exactly as planned. In this article, we describe some of the things that went wrong and then present some of the lessons learned and new research that arose from our experience in building the Travel Elves.

document recognition and retrieval | 2010

A general approach to discovering, registering, and extracting features from raster maps

Craig A. Knoblock; Ching-Chien Chen; Yao-Yi Chiang; Aman Goel; Matthew Michelson; Cyrus Shahabi

Maps can be a great source of information for a given geographic region, but they can be difficult to find and even harder to process. A significant problem is that many interesting and useful maps are only available in raster format. Even worse, many maps have been poorly scanned and are often compressed with lossy compression algorithms. Furthermore, for many of these maps there is no metadata providing the geographic coordinates, scale, or projection. Previous research on map processing has developed techniques that typically work on maps from a single map source. In contrast, we have developed a general approach to finding and processing street maps. This includes techniques for discovering maps online, extracting geographic and textual features from maps, using the extracted features to determine the geographic coordinates of the maps, and aligning the maps with imagery. The resulting system can find, register, and extract a variety of features from raster maps, which can then be used for various applications, such as annotating satellite imagery, creating and updating maps, or constructing detailed gazetteers.

International Journal on Document Analysis and Recognition | 2011

Harvesting maps on the web

Aman Goel; Matthew Michelson; Craig A. Knoblock

Maps are among the most valuable documents for gathering geospatial information about a region. Yet finding a collection of diverse, high-quality maps is a significant challenge because there is a dearth of content-specific metadata available to identify them from among other images on the Web. For this reason, it is desirable to analyze the content of each image. The problem is further complicated by the variations between different types of maps, such as street maps and contour maps, and also by the fact that many high-quality maps are embedded within other documents such as PDF reports. In this paper, we present an automatic method to find high-quality maps for a given geographic region. Our method finds not only documents that are maps, but also maps that are embedded within other documents. We have developed a Content-Based Image Retrieval (CBIR) approach that uses a new set of features for classification in order to capture the defining characteristics of a map. This approach is able to identify all types of maps irrespective of their subject, scale, and color in a highly scalable and accurate way. Our classifier achieves an F1-measure of 74%, which is an 18% improvement over the previous work in the area.

international conference on machine learning and applications | 2011

Improving Classifier Performance by Autonomously Collecting Background Knowledge from the Web

Steven Minton; Matthew Michelson; Kane See; Sofus A. Macskassy; Bora Gazen; Lise Getoor

Many websites allow users to tag data items to make them easier to find. In this paper, we consider the problem of classifying tagged data according to user-specified interests. We present an approach for aggregating background knowledge from the Web to improve the performance of a classifier. In previous work, researchers have developed technology for extracting knowledge, in the form of relational tables, from semi-structured websites. In this paper, we integrate this extraction technology with generic machine learning algorithms, showing that knowledge extracted from the Web can significantly benefit the learning process. Specifically, the knowledge can lead to better generalizations, reduce the number of samples required for supervised learning, and eliminate the need to retrain the system when the environment changes. We validate the approach with an application that classifies tagged Flickr data.

analytics for noisy unstructured text data | 2010