Dayne Freitag | Researchain

Publication

Featured research published by Dayne Freitag.

Artificial Intelligence | 2000

Learning to construct knowledge bases from the World Wide Web

Mark Craven; Dan DiPasquo; Dayne Freitag; Andrew McCallum; Tom M. Mitchell; Kamal Nigam; Seán Slattery

The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer-understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system that has created a knowledge base describing university people, courses, and research projects.
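
The two inputs named in this abstract can be pictured with a minimal Python sketch. It is not the system from the paper: the types `Ontology`, `Relation`, and `LabeledRegion`, the helper `usable_examples`, the sample URLs, and the domain and range chosen for each relation are all hypothetical, introduced only to show the shape of an ontology plus labeled hypertext regions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Relation:
    name: str
    domain: str  # class of the first argument (assumed for illustration)
    range: str   # class of the second argument (assumed for illustration)


@dataclass
class Ontology:
    classes: Set[str] = field(default_factory=set)
    relations: Dict[str, Relation] = field(default_factory=dict)

    def add_relation(self, name: str, domain: str, range_: str) -> None:
        # A relation may only connect classes the ontology already defines.
        if domain not in self.classes or range_ not in self.classes:
            raise ValueError(f"unknown class in relation {name!r}")
        self.relations[name] = Relation(name, domain, range_)


@dataclass(frozen=True)
class LabeledRegion:
    url: str    # page the region was taken from (hypothetical example URLs)
    text: str   # the labeled span of hypertext
    label: str  # a class or relation name from the ontology


def usable_examples(ontology: Ontology,
                    regions: List[LabeledRegion]) -> List[LabeledRegion]:
    """Keep only the regions whose label is defined in the ontology."""
    known = ontology.classes | set(ontology.relations)
    return [r for r in regions if r.label in known]


# Classes and relation names come from the abstract; the pairing of
# domain and range is an assumption.
ontology = Ontology(classes={"company", "person", "employee", "product"})
ontology.add_relation("employed_by", "employee", "company")
ontology.add_relation("produced_by", "product", "company")

regions = [
    LabeledRegion("http://example.org/a", "Acme Corp.", "company"),
    LabeledRegion("http://example.org/b", "Widget 2000", "product"),
    LabeledRegion("http://example.org/c", "Pittsburgh", "city"),  # not in ontology
]
training_set = usable_examples(ontology, regions)
```

A learner would then be trained on `training_set`; that step is outside the scope of this sketch.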

Communications of The ACM | 1994

Experience with a learning personal assistant

Tom M. Mitchell; Rich Caruana; Dayne Freitag; John P. McDermott; David Zabowski

Personal software assistants that help users with tasks like finding information, scheduling calendars, or managing work-flow will require significant customization to each individual user. For example, an assistant that helps schedule a particular user’s calendar will have to know that user’s scheduling preferences. This paper explores the potential of machine learning methods to automatically create and maintain such customized knowledge for personal software assistants. We describe the design of one particular learning assistant: a calendar manager, called CAP (Calendar APprentice), that learns user scheduling preferences from experience. Results are summarized from approximately five user-years of experience, during which CAP has learned an evolving set of several thousand rules that characterize the scheduling preferences of its users. Based on this experience, we suggest that machine learning methods may play an important role in future personal software assistants.

Machine Learning | 2000

Machine Learning for Information Extraction in Informal Domains

Dayne Freitag

We consider the problem of learning to perform information extraction in domains where linguistic processing is problematic, such as Usenet posts, email, and finger plan files. In place of syntactic and semantic information, other sources of information can be used, such as term frequency, typography, formatting, and mark-up. We describe four learning approaches to this problem, each drawn from a different paradigm: a rote learner, a term-space learner based on Naive Bayes, an approach using grammatical induction, and a relational rule learner. Experiments on 14 information extraction problems defined over four diverse document collections demonstrate the effectiveness of these approaches. Finally, we describe a multistrategy approach which combines these learners and yields performance competitive with or better than the best of them. This technique is modular and flexible, and could find application in other machine learning problems.
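
As a rough illustration of what a term-space learner based on Naive Bayes might look like, the sketch below scores a candidate text fragment by the log-odds that it fills a target field, using bag-of-words counts with add-one smoothing. This is a generic textbook formulation, not the learner from the paper; the class `TermNaiveBayes`, its methods, and the toy training fragments are hypothetical.

```python
import math
from collections import Counter


class TermNaiveBayes:
    """Bag-of-words Naive Bayes over fragments labeled field / not-field."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing constant
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, fragments, labels):
        for tokens, y in zip(fragments, labels):
            self.counts[y].update(tokens)
            self.docs[y] += 1
            self.vocab.update(tokens)
        return self

    def log_odds(self, tokens):
        """log P(field | tokens) - log P(not field | tokens), up to a constant."""
        total_docs = self.docs[True] + self.docs[False]
        score = 0.0
        for y, sign in ((True, 1.0), (False, -1.0)):
            prior = (self.docs[y] + self.alpha) / (total_docs + 2 * self.alpha)
            denom = sum(self.counts[y].values()) + self.alpha * len(self.vocab)
            log_p = math.log(prior)
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    log_p += math.log((self.counts[y][t] + self.alpha) / denom)
            score += sign * log_p
        return score


# Toy data: fragments that are (True) or are not (False) a "location" field.
fragments = [["room", "5409"], ["wean", "hall"], ["the", "talk"], ["will", "be"]]
labels = [True, True, False, False]
model = TermNaiveBayes().fit(fragments, labels)
```

A positive `model.log_odds(tokens)` means the fragment looks more like a field instance than not under this toy model.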

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000

Bridging the lexical chasm: statistical approaches to answer-finding

Adam L. Berger; Rich Caruana; David A. Cohn; Dayne Freitag; Vibhu O. Mittal

This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions. The learning process consists of inspecting a collection of answered questions and characterizing the relation between question and answer with a statistical model. For the purpose of learning this relation, we propose two sources of data: Usenet FAQ documents and customer service call-center dialogues from a large retail company. We will show that the task of “answer-finding” differs from both document retrieval and traditional question-answering, presenting challenges different from those found in these problems. The central aim of this work is to discover, through theoretical and empirical investigation, those statistical techniques best suited to the answer-finding problem.

Meeting of the Association for Computational Linguistics | 1998

Toward General-Purpose Learning for Information Extraction

Dayne Freitag

Two trends are evident in the recent evolution of the field of information extraction: a preference for simple, often corpus-driven techniques over linguistically sophisticated ones; and a broadening of the central problem definition to include many non-traditional text domains. This development calls for information extraction systems which are as retargetable and general as possible. Here, we describe SRV, a learning architecture for information extraction which is designed for maximum generality and flexibility. SRV can exploit domain-specific information, including linguistic syntax and lexical information, in the form of features provided to the system explicitly as input for training. This process is illustrated using a domain created from Reuters corporate acquisitions articles. Features are derived from two general-purpose NLP systems, Sleator and Temperley's link grammar parser and WordNet. Experiments compare the learner's performance with and without such linguistic information. Surprisingly, in many cases, the system performs as well without this information as with it.

Meeting of the Association for Computational Linguistics | 2016

Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses

Dayne Freitag; John Niekrasz

We consider the use of distant supervision for biological information extraction, and introduce two understudied corpora of this form, the Biological Expression Language (BEL) Large Corpus and the Pathway Logic (PL) Datum Corpus. Each resource eschews annotation at the sentence constituent level, and the PL corpus requires synthesis of information across multiple sentences to construct composite knowledge frames. Decomposing this problem into feature induction for slot-level attributes, followed by event assembly over this space of features, we introduce a novel, general-purpose pattern induction procedure, evaluating it against these two corpora, demonstrating its ability to induce effective detection against dependency parses.

International Conference on Machine Learning | 2000