Daniel S. Weld | Researchain

Publication

Featured research published by Daniel S. Weld.

Communications of the ACM | 2008

Open information extraction from the web

Oren Etzioni; Michele Banko; Stephen Soderland; Daniel S. Weld

Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER’s 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.

International World Wide Web Conference | 2004

Web-scale information extraction in KnowItAll: (preliminary results)

Oren Etzioni; Michael J. Cafarella; Doug Downey; Stanley Kok; Ana-Maria Popescu; Tal Shaked; Stephen Soderland; Daniel S. Weld; Alexander Yates

Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, was able to automatically extract 54,753 facts. KnowItAll associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KnowItAll's architecture and reports on lessons learned for the design of large-scale information extraction systems.

AI Magazine | 1994

An Introduction to Least Commitment Planning

Daniel S. Weld

Recent developments have clarified the process of generating partially ordered, partially specified sequences of actions whose execution will achieve an agent's goal. This article summarizes a progression of least commitment planners, starting with one that handles the simple STRIPS representation and ending with UCPOP, a planner that manages actions with disjunctive preconditions, conditional effects, and universal quantification over dynamic universes. Along the way, I explain how Chapman's formulation of the modal truth criterion is misleading and why his NP-completeness result for reasoning about plans with conditional effects does not apply to UCPOP.

AI Magazine | 1999

Recent Advances in AI Planning

Daniel S. Weld

The past five years have seen dramatic advances in planning algorithms, with an emphasis on propositional methods such as GRAPHPLAN and compilers that convert planning problems into propositional conjunctive normal form formulas for solution using systematic or stochastic SAT methods. Related work, in the context of spacecraft control, advances our understanding of interleaved planning and execution. In this survey, I explain the latest techniques and suggest areas for future research.

International Conference on Management of Data | 1999

An adaptive query execution system for data integration

Zachary G. Ives; Daniela Florescu; Marc Friedman; Alon Y. Levy; Daniel S. Weld

Query processing in data integration occurs over network-bound, autonomous data sources. This requires extensions to traditional optimization and execution techniques for three reasons: there is an absence of quality statistics about the data, data transfer rates are unpredictable and bursty, and slow or unavailable data sources can often be replaced by overlapping or mirrored sources. This paper presents the Tukwila data integration system, designed to support adaptivity at its core using a two-pronged approach. Interleaved planning and execution with partial optimization allows Tukwila to quickly recover from decisions based on inaccurate estimates. During execution, Tukwila uses adaptive query operators such as the double pipelined hash join, which produces answers quickly, and the dynamic collector, which robustly and efficiently computes unions across overlapping data sources. We demonstrate that the Tukwila architecture extends previous innovations in adaptive execution (such as query scrambling, mid-execution re-optimization, and choose nodes), and we present experimental evidence that our techniques result in behavior desirable for a data integration system.
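
The double pipelined hash join mentioned in this abstract can be illustrated with a minimal sketch. This is not Tukwila's implementation: the function name, the key-extractor parameters, and the round-robin interleaving (used here to stand in for rows arriving from two network sources) are all assumptions made for illustration. The idea shown is that each input builds its own hash table and probes the other's, so matches are emitted as soon as both matching rows have arrived rather than after one input has been fully read.

```python
from collections import defaultdict
from itertools import zip_longest

_DONE = object()  # sentinel marking an exhausted input


def double_pipelined_hash_join(left, right, left_key, right_key):
    """Yield (left_row, right_row) pairs with equal keys, incrementally.

    Hypothetical sketch: rows are consumed alternately from both inputs
    to imitate interleaved arrival; a real system would react to
    whichever source delivers data first.
    """
    left_table = defaultdict(list)   # key -> left rows seen so far
    right_table = defaultdict(list)  # key -> right rows seen so far

    for l_row, r_row in zip_longest(left, right, fillvalue=_DONE):
        if l_row is not _DONE:
            k = left_key(l_row)
            left_table[k].append(l_row)
            # Probe the opposite table for rows that arrived earlier.
            for match in right_table.get(k, ()):
                yield (l_row, match)
        if r_row is not _DONE:
            k = right_key(r_row)
            right_table[k].append(r_row)
            for match in left_table.get(k, ()):
                yield (match, r_row)


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (3, "c")]
    customers = [(2, "X"), (1, "Y"), (4, "Z")]
    for pair in double_pipelined_hash_join(
        orders, customers, lambda r: r[0], lambda r: r[0]
    ):
        print(pair)
```

Because a row is inserted into its own table before the other side probes, each matching pair is produced exactly once, at the moment the later of the two rows arrives. The cost of this design is that both hash tables are held in memory.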

International Conference on Management of Data | 2001

Updating XML

Igor Tatarinov; Zachary G. Ives; Alon Y. Halevy; Daniel S. Weld

As XML has developed over the past few years, its role has expanded beyond its original domain as a semantics-preserving markup language for online documents, and it is now also the de facto format for interchanging data between heterogeneous systems. Data sources export XML “views” over their data, and other systems can directly import or query these views. As a result, there has been great interest in languages and systems for expressing queries over XML data, whether the XML is stored in a repository or generated as a view over some other data storage format. Clearly, in order to fully evolve XML into a universal data representation and sharing format, we must allow users to specify updates to XML documents and must develop techniques to process them efficiently. Update capabilities are important not only for modifying XML documents, but also for propagating changes through XML views and for expressing and transmitting changes to documents. This paper begins by proposing a set of basic update operations for both ordered and unordered XML data. We next describe extensions to the proposed standard XML query language, XQuery, to incorporate the update operations. We then consider alternative methods for implementing update operations when the XML data is mapped into a relational database. Finally, we describe an experimental evaluation of the alternative techniques for implementing our extensions.

Artificial Intelligence | 1995

An algorithm for probabilistic planning

Nicholas Kushmerick; Steve Hanks; Daniel S. Weld

We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that provably achieves the goal, we seek plans whose probability of success exceeds the threshold. In this paper, we present Buridan, an implemented least-commitment planner that solves problems of this form. We prove that the algorithm is both sound and complete. We then explore Buridan's efficiency by contrasting four algorithms for plan evaluation, using a combination of analytic methods and empirical experiments. We also describe the interplay between generating plans and evaluating them, and discuss the role of search control in probabilistic planning.
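
The success criterion described in this abstract can be made concrete with a small sketch. It is not the Buridan planner and not necessarily one of the four evaluation algorithms the paper compares: it only pushes a distribution over states forward through a fixed, totally ordered action sequence and checks the result against a threshold. The state encoding (frozensets of true propositions), the two example actions, and all probabilities are hypothetical.

```python
from collections import defaultdict


def step(distribution, action):
    """Apply one probabilistic action to a distribution over states."""
    nxt = defaultdict(float)
    for state, p in distribution.items():
        # An action maps a state to a list of (probability, next_state).
        for q, new_state in action(state):
            nxt[new_state] += p * q
    return dict(nxt)


def success_probability(initial, plan, goal):
    """Probability that the goal holds after executing the plan in order."""
    dist = dict(initial)
    for action in plan:
        dist = step(dist, action)
    return sum(p for state, p in dist.items() if goal(state))


# Hypothetical actions; the outcome probabilities depend on the state.
def pickup(state):
    if "holding" in state:
        return [(1.0, state)]
    p = 0.95 if "dry" in state else 0.5
    return [(p, state | {"holding"}), (1 - p, state)]


def dry(state):
    if "dry" in state:
        return [(1.0, state)]
    return [(0.8, state | {"dry"}), (0.2, state)]


if __name__ == "__main__":
    initial = {frozenset({"dry"}): 0.7, frozenset(): 0.3}
    goal = lambda s: "holding" in s
    threshold = 0.9
    for name, plan in [("pickup", [pickup]), ("dry, pickup", [dry, pickup])]:
        p = success_probability(initial, plan, goal)
        print(f"{name}: {p:.3f} meets threshold: {p >= threshold}")
```

Under these assumed numbers, the single-action plan succeeds with probability 0.7 × 0.95 + 0.3 × 0.5 = 0.815 and falls short of the 0.9 threshold, while adding the drying step first raises it to 0.94 × 0.95 + 0.06 × 0.5 = 0.923.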

Intelligent User Interfaces | 2004

SUPPLE: automatically generating user interfaces

Krzysztof Z. Gajos; Daniel S. Weld

In order to give people ubiquitous access to software applications, device controllers, and Internet services, it will be necessary to automatically adapt user interfaces to the computational devices at hand (e.g., cell phones, PDAs, touch panels). While previous researchers have proposed solutions to this problem, each has limitations. This paper proposes a novel solution based on treating interface adaptation as an optimization problem. When asked to render an interface on a specific device, our SUPPLE system searches for the rendition that meets the device's constraints and minimizes the estimated effort for the user's expected interface actions. We make several contributions: 1) precisely defining the interface rendition problem, 2) demonstrating how user traces can be used to customize interface rendering to a particular user's usage pattern, 3) presenting an efficient interface rendering algorithm, and 4) performing experiments that demonstrate the utility of our approach.

International World Wide Web Conference | 2008

Automatically refining the Wikipedia infobox ontology

Fei Wu; Daniel S. Weld

The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia infoboxes. Machine learning systems, such as Kylin, use these infoboxes as training data, accurately extracting even more semantic knowledge from natural language text. But in order to realize the full power of this information, it must be situated in a cleanly-structured ontology. This paper introduces KOG, an autonomous system for refining Wikipedia's infobox-class ontology towards this end. We cast the problem of ontology refinement as a machine learning problem and solve it using both SVMs and a more powerful joint-inference approach expressed in Markov Logic Networks. We present experiments demonstrating the superiority of the joint-inference approach and evaluating other aspects of our system. Using these techniques, we build a rich ontology, integrating Wikipedia's infobox-class schemata with WordNet. We demonstrate how the resulting ontology may be used to enhance Wikipedia with improved query processing and other features.

ACM Transactions on Information Systems | 2001

Scaling question answering to the web

Cody C. T. Kwok; Oren Etzioni; Daniel S. Weld

The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as “who was the first American in space?” or “what is the second tallest mountain in the world?” Yet today's most advanced web search services (e.g., Google and AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend question-answering techniques, first studied in the information retrieval literature, to the web and experimentally evaluate their performance. First we introduce Mulder, which we believe to be the first general-purpose, fully-automated question-answering system available on the web. Second, we describe Mulder's architecture, which relies on multiple search-engine queries, natural-language parsing, and a novel voting procedure to yield reliable answers coupled with high recall. Finally, we compare Mulder's performance to that of Google and AskJeeves on questions drawn from the TREC-8 question answering track. We find that Mulder's recall is more than a factor of three higher than that of AskJeeves. In addition, we find that Google requires 6.6 times as much user effort to achieve the same level of recall as Mulder.
