Craig A. Knoblock
University of Southern California
Publications
Featured research published by Craig A. Knoblock.
International Journal of Cooperative Information Systems | 1993
Yigal Arens; Chin Y. Chee; Chun-Nan Hsu; Craig A. Knoblock
With the current explosion of data, retrieving and integrating information from various sources is a critical problem. Work in multidatabase systems has begun to address this problem, but it has primarily focused on methods for communicating between databases and requires significant effort for each new database added to the system. This paper describes a more general approach that exploits a semantic model of a problem domain to integrate the information from various information sources. The information sources handled include both databases and knowledge bases, and other information sources (e.g. programs) could potentially be incorporated into the system. This paper describes how both the domain and the information sources are modeled, shows how a query at the domain level is mapped into a set of queries to individual information sources, and presents algorithms for automatically improving the efficiency of queries using knowledge about both the domain and the information sources. This work is implemented in a system called SIMS and has been tested in a transportation planning domain using nine Oracle databases and a Loom knowledge base.
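The mapping from a domain-level query to source-level queries can be pictured with a small sketch. The Python fragment below is purely illustrative and assumes a toy domain model (the attribute and source names are invented, not taken from SIMS); it shows only the core bookkeeping of grouping requested domain attributes by the source that can supply them.

    # Hypothetical sketch of SIMS-style source selection: domain attributes
    # are mapped onto concrete sources, and one domain-level query is split
    # into one sub-query per source. All names are invented for illustration.
    DOMAIN_MODEL = {
        "port.name":  ("geo_db",    "ports.name"),
        "port.depth": ("geo_db",    "ports.channel_depth"),
        "ship.draft": ("assets_kb", "vessels.draft"),
    }

    def plan_query(attributes):
        """Group requested domain attributes by the source that answers them."""
        plan = {}
        for attr in attributes:
            source, column = DOMAIN_MODEL[attr]
            plan.setdefault(source, []).append(column)
        return plan

    print(plan_query(["port.name", "port.depth", "ship.draft"]))
    # {'geo_db': ['ports.name', 'ports.channel_depth'], 'assets_kb': ['vessels.draft']}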
Adaptive Agents and Multi-Agent Systems | 1999
Ion Muslea; Steven Minton; Craig A. Knoblock
With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Web-based information agent is a set of wrappers that can extract the relevant data from semistructured information sources. Our novel approach to wrapper induction is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of easier extraction tasks. We introduce an inductive algorithm, STALKER, that generates high-accuracy extraction rules based on user-labeled training examples. Labeling the training data represents the major bottleneck in using wrapper induction techniques, and our experimental results show that STALKER does significantly better than other approaches: on the one hand, STALKER requires up to two orders of magnitude fewer examples than other algorithms; on the other hand, it can handle information sources that could not be wrapped by existing techniques.
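To make the flavor of hierarchical, landmark-based extraction concrete, here is a toy Python sketch. The rule shown is hand-written for illustration; in STALKER such rules are induced from user-labeled examples, and the page, landmarks, and field below are all invented.

    # Toy landmark-based extraction: a rule is a sequence of SkipTo(landmark)
    # steps that position the extractor inside the document.
    def skip_to(text, pos, landmark):
        """Advance past the next occurrence of landmark, or return -1."""
        i = text.find(landmark, pos)
        return -1 if i < 0 else i + len(landmark)

    def extract(text, start_rule, end_landmark):
        pos = 0
        for landmark in start_rule:          # e.g. SkipTo("<b>Phone:</b>"), SkipTo("(")
            pos = skip_to(text, pos, landmark)
            if pos < 0:
                return None                  # rule does not match this page
        end = text.find(end_landmark, pos)
        return text[pos:end] if end >= 0 else None

    page = "<b>Phone:</b> (310) 555-1234"
    print(extract(page, ["<b>Phone:</b>", "("], ")"))   # -> '310'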
Intelligent Information Systems | 1996
Yigal Arens; Craig A. Knoblock; Wei-Min Shen
The standard approach to integrating heterogeneous information sources is to build a global schema that relates all of the information in the different sources, and to pose queries directly against it. The problem is that schema integration is usually difficult, and as soon as any of the information sources change or a new source is added, the process may have to be repeated. The SIMS system uses an alternative approach. A domain model of the application domain is created, establishing a fixed vocabulary for describing data sets in the domain. Using this language, each available information source is described. Queries to SIMS against the collection of available information sources are posed using terms from the domain model, and reformulation operators are employed to dynamically select an appropriate set of information sources and to determine how to integrate the available information to satisfy a query. This approach results in a system that is more flexible than existing ones, more easily scalable, and able to respond dynamically to newly available or unexpectedly missing information sources. This paper describes the query reformulation process in SIMS and the operators used in it. We provide precise definitions of the reformulation operators and explain the rationale behind choosing the specific ones SIMS uses. We have demonstrated the feasibility and effectiveness of this approach by applying SIMS in the domains of transportation planning and medical trauma care.
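As a rough illustration of reformulation, the sketch below rewrites domain-level terms step by step until only terms answerable by concrete sources remain. The rewrite table and term names are invented for this example; the paper defines the actual operators (such as substituting a concept by its subconcepts) precisely.

    # Invented rewrite table: each domain term maps to equivalent terms that
    # are closer to what the available sources can answer directly.
    REWRITES = {
        "seaport":  ["port"],
        "port":     ["geo_db.ports"],
        "vehicle":  ["ship", "aircraft"],    # concept covered by its subconcepts
        "ship":     ["assets_db.ships"],
        "aircraft": ["assets_db.aircraft"],
    }
    CONCRETE = {"geo_db.ports", "assets_db.ships", "assets_db.aircraft"}

    def reformulate(term):
        """Apply rewrites until only source-level terms remain."""
        if term in CONCRETE:
            return [term]
        result = []
        for t in REWRITES[term]:
            result.extend(reformulate(t))
        return result

    print(reformulate("seaport"))   # ['geo_db.ports']
    print(reformulate("vehicle"))   # ['assets_db.ships', 'assets_db.aircraft']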
Autonomous Agents and Multi-Agent Systems | 2001
Ion Muslea; Steven Minton; Craig A. Knoblock
With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Web-based information agent is a set of wrappers that can extract the relevant data from semistructured information sources. Our novel approach to wrapper induction is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of simpler extraction tasks. We introduce an inductive algorithm, STALKER, that generates high-accuracy extraction rules based on user-labeled training examples. Labeling the training data represents the major bottleneck in using wrapper induction techniques, and our experimental results show that STALKER requires up to two orders of magnitude fewer examples than other algorithms. Furthermore, STALKER can wrap information sources that could not be wrapped by existing inductive techniques.
International Conference on Management of Data | 1997
Naveen Ashish; Craig A. Knoblock
With the current explosion of information on the World Wide Web (WWW), a wealth of information on many different subjects has become available on-line. Numerous sources contain information that can be classified as semi-structured. At present, however, the only way to access the information is by browsing individual pages. We cannot query web documents in a database-like fashion based on their underlying structure. However, we can provide database-like querying for semi-structured WWW sources by building wrappers around these sources. We present an approach for semi-automatically generating such wrappers. The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page. From this structure the system generates a wrapper that facilitates querying of a source and possibly integrating it with other sources. We demonstrate the ease with which we are able to build wrappers for a number of Internet sources in different domains using our implemented wrapper generation toolkit.
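The hypothesis step can be caricatured in a few lines of Python. The regex-based parser below is a stand-in, not the toolkit's actual method: it assumes pages where each field is formatted as a bold label followed by a value, and all page content is invented.

    import re

    def hypothesize_structure(html):
        """Treat each '<b>Label:</b> value' run as a (field, value) pair."""
        pattern = re.compile(r"<b>([^<:]+):</b>\s*([^<]+)")
        return {label.strip(): value.strip() for label, value in pattern.findall(html)}

    page = "<b>Country:</b> France <b>Capital:</b> Paris <b>Population:</b> 68M"
    record = hypothesize_structure(page)
    print(record["Capital"])   # 'Paris' -- the page now answers field queries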
Artificial Intelligence | 1994
Craig A. Knoblock
This article presents a completely automated approach to generating abstractions for planning. The abstractions are generated using a tractable, domain-independent algorithm whose only input is the definition of a problem to be solved and whose output is an abstraction hierarchy that is tailored to the particular problem. The algorithm generates abstraction hierarchies by dropping literals from the original problem definition. It forms abstractions that satisfy the ordered monotonicity property, which guarantees that the structure of an abstract solution is not changed in the process of refining it. The algorithm for generating abstractions is implemented in a system called ALPINE, which generates abstractions for a hierarchical version of the PRODIGY problem solver. The abstractions generated by ALPINE are tested in multiple domains on large problem sets and are shown to produce shorter solutions with significantly less search than planning without using abstraction.
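The "dropping literals" idea can be sketched in miniature. In the toy Python below, each predicate carries a criticality value and an abstraction level simply keeps the literals at or above a threshold; the predicates and values are made up, whereas ALPINE derives the ordering automatically from the problem definition.

    # Invented criticalities; higher numbers survive to more abstract levels.
    CRITICALITY = {"at-location": 3, "on": 2, "holding": 1, "clear": 1}

    def abstract_state(state, level):
        """Keep only literals whose predicate criticality is >= level."""
        return {lit for lit in state if CRITICALITY[lit[0]] >= level}

    state = {("at-location", "truck", "depot"), ("on", "a", "b"), ("clear", "a")}
    for level in (3, 2, 1):        # refine from most abstract to most concrete
        print(level, sorted(abstract_state(state, level)))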
ACM Computing Surveys | 2014
Yao-Yi Chiang; Stefan Leyk; Craig A. Knoblock
Maps depict natural and human-induced changes on earth at a fine resolution for large areas and over long periods of time. In addition, maps—especially historical maps—are often the only information source about the earth as surveyed using geodetic techniques. In order to preserve these unique documents, increasing numbers of digital map archives have been established, driven by advances in software and hardware technologies. Since the early 1980s, researchers from a variety of disciplines, including computer science and geography, have been working on computational methods for the extraction and recognition of geographic features from archived images of maps (digital map processing). The typical result from map processing is geographic information that can be used in spatial and spatiotemporal analyses in a Geographic Information System environment, which benefits numerous research fields in the spatial, social, environmental, and health sciences. However, map processing literature is spread across a broad range of disciplines in which maps are included as a special type of image. This article presents an overview of existing map processing techniques, with the goal of bringing together the past and current research efforts in this interdisciplinary field, to characterize the advances that have been made, and to identify future research directions and opportunities.
Artificial Intelligence | 1989
Steven Minton; Jaime G. Carbonell; Craig A. Knoblock; Daniel R. Kuokka; Oren Etzioni; Yolanda Gil
This article outlines explanation-based learning (EBL) and its role in improving problem solving performance through experience. Unlike inductive systems, which learn by abstracting common properties from multiple examples, EBL systems explain why a particular example is an instance of a concept. The explanations are then converted into operational recognition rules. In essence, the EBL approach is analytical and knowledge-intensive, whereas inductive methods are empirical and knowledge-poor. This article focuses on extensions of the basic EBL method and their integration with the PRODIGY problem solving system. PRODIGY's EBL method is specifically designed to acquire search control rules that are effective in reducing total search time for complex task domains. Domain-specific search control rules are learned from successful problem solving decisions, costly failures, and unforeseen goal interactions. The ability to specify multiple learning strategies in a declarative manner enables EBL to serve as a general technique for performance improvement. PRODIGY's EBL method is analyzed, illustrated with several examples and performance results, and compared with other methods for integrating EBL and problem solving.
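A learned control rule, once compiled into operational form, is essentially a cheap test applied at a decision point. The Python fragment below is an invented miniature of such a rule (the predicate and operator names come from a generic blocks-world example, not from the article):

    # A compiled "reject" rule: prune unstack(x, y) whenever x is not clear,
    # since the explanation of an earlier failure shows the choice cannot succeed.
    def reject_unstack_when_not_clear(state, op, args):
        return op == "unstack" and ("clear", args[0]) not in state

    state = {("on", "a", "b"), ("on", "c", "a")}   # c sits on a, so a is not clear
    print(reject_unstack_when_not_clear(state, "unstack", ("a", "b")))   # True: prune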
ACM SIGART Bulletin | 1991
Jaime G. Carbonell; Oren Etzioni; Yolanda Gil; Robert Joseph; Craig A. Knoblock; Steven Minton; Manuela M. Veloso
Artificial intelligence has progressed to the point where multiple cognitive capabilities are being integrated into computational architectures, such as SOAR, PRODIGY, THEO, and ICARUS. This paper reports on the PRODIGY architecture, describing its planning and problem solving capabilities and touching upon its multiple learning methods. Learning in PRODIGY occurs at all decision points and integration in PRODIGY is at the knowledge level; the learning and reasoning modules produce mutually interpretable knowledge structures. Issues in architectural design are discussed, providing a context to examine the underlying tenets of the PRODIGY architecture.
Cooperative Information Systems | 1997
Naveen Ashish; Craig A. Knoblock
To simplify the task of obtaining information from the vast number of information sources that are available on the World Wide Web (WWW), the authors are building information mediators for extracting and integrating data from multiple Web sources. In a mediator-based approach, wrappers are built around individual information sources to translate between the mediator query language and the individual sources. They present an approach for semi-automatically generating wrappers for structured Internet sources. The key idea is to exploit formatting information in Web pages to hypothesize the underlying structure of a page. From this structure the system generates a wrapper that facilitates querying of a source and possibly integrating it with other sources. They demonstrate the ease with which they are able to build wrappers for a number of Web sources using their implemented wrapper generation toolkit.
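The division of labor between mediator and wrapper can be suggested with a small sketch. The class below is hypothetical: a real wrapper would fetch and parse a live Web page, whereas here the parsed records are supplied directly.

    class Wrapper:
        """Exposes one Web source through a uniform query interface."""
        def __init__(self, records):
            self.records = records       # stand-in for parsed page content

        def query(self, field, value):
            return [r for r in self.records if r.get(field) == value]

    weather = Wrapper([{"city": "Rome", "temp_c": 31}, {"city": "Oslo", "temp_c": 14}])
    print(weather.query("city", "Rome"))   # the mediator never touches raw HTML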