Benjamin Rosenfeld | Researchain

Publication

Featured research published by Benjamin Rosenfeld.

Knowledge and Information Systems | 2006

TEG—a hybrid approach to information extraction

Ronen Feldman; Benjamin Rosenfeld; Moshe Fresko

This paper describes a hybrid statistical and knowledge-based information extraction model, able to extract entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labour by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (trainable extraction grammar), can be adapted to any IE domain by writing a suitable set of rules in an SCFG (stochastic context-free grammar)-based extraction language and training them using an annotated corpus. The system does not contain any purely linguistic components, such as a PoS tagger or shallow parser, but allows external linguistic components to be used if necessary. We demonstrate the performance of the system on several named entity extraction and relation extraction tasks. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and smaller amounts of training data. We also demonstrate the robustness of our system under conditions of poor training-data quality.

meeting of the association for computational linguistics | 2006

URES: an Unsupervised Web Relation Extraction System

Benjamin Rosenfeld; Ronen Feldman

Most information extraction systems either use handwritten extraction patterns or use a machine learning algorithm that is trained on a manually annotated corpus. Both of these approaches require massive human effort and hence prevent information extraction from becoming more widely applicable. In this paper we present URES (Unsupervised Relation Extraction System), which extracts relations from the Web in a totally unsupervised way. It takes as input the descriptions of the target relations, which include the names of the predicates, the types of their attributes, and several seed instances of the relations. Then the system downloads from the Web a large collection of pages that are likely to contain instances of the target relations. From those pages, utilizing the known seed instances, the system learns the relation patterns, which are then used for extraction. We present several experiments in which we learn patterns and extract instances of a set of several common IE relations, comparing several pattern learning and filtering setups. We demonstrate that using a simple noun phrase tagger is sufficient as a base for accurate patterns. However, having a named entity recognizer that is able to recognize the types of the relation attributes significantly enhances the extraction performance. We also compare our approach with KnowItAll's fixed generic patterns.
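The seed-driven loop described in this abstract (find sentences that contain known seed pairs, generalize the connecting text into patterns, then apply those patterns to find new pairs) can be illustrated with a toy sketch. This is not the URES implementation: the function names, the sample sentences, the `min_support` threshold, and the capitalized-word regex standing in for a noun phrase tagger or NER are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for a noun phrase tagger: one or more capitalized words.
ENTITY = r"([A-Z]\w+(?: [A-Z]\w+)*)"


def learn_patterns(sentences, seeds, min_support=2):
    """Collect the text between the two seed arguments as a candidate pattern.

    A pattern is kept only if it is supported by at least `min_support`
    seed occurrences (an assumed, very crude filtering step).
    """
    counts = Counter()
    for sentence in sentences:
        for left, right in seeds:
            i = sentence.find(left)
            j = sentence.find(right)
            # Require both arguments, with the left one ending before the right one starts.
            if i != -1 and j != -1 and i + len(left) <= j:
                middle = sentence[i + len(left):j].strip()
                if middle:
                    counts[middle] += 1
    return [pattern for pattern, count in counts.items() if count >= min_support]


def extract(sentences, patterns):
    """Apply each learned pattern between two entity slots and collect the pairs."""
    found = set()
    for pattern in patterns:
        regex = re.compile(ENTITY + " " + re.escape(pattern) + " " + ENTITY)
        for sentence in sentences:
            for match in regex.finditer(sentence):
                found.add((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    corpus = [
        "Oracle acquired PeopleSoft in 2005.",
        "Adobe acquired Macromedia last year.",
        "Google acquired YouTube for a large sum.",
    ]
    seed_pairs = [("Oracle", "PeopleSoft"), ("Adobe", "Macromedia")]
    learned = learn_patterns(corpus, seed_pairs)
    print(learned)
    print(sorted(extract(corpus, learned)))
```

In this sketch the pair from the third sentence is recovered even though it was never a seed, which is the basic effect the abstract describes. A real system would need far more robust pattern generalization and filtering than exact string matching.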

conference on information and knowledge management | 2004

TEG: a hybrid approach to information extraction

Benjamin Rosenfeld; Ronen Feldman; Moshe Fresko; Jonathan Schler; Yonatan Aumann

This paper describes a hybrid statistical and knowledge-based information extraction model, able to extract entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labor by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (Trainable Extraction Grammar), can be adapted to any IE domain by writing a suitable set of rules in an SCFG (Stochastic Context Free Grammar) based extraction language, and training them using an annotated corpus. The system does not contain any purely linguistic components, such as a PoS tagger or parser. We demonstrate the performance of the system on several named entity extraction and relation extraction tasks. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and smaller amounts of training data. The improvement in accuracy is slight for the named entity extraction task and more pronounced for relation extraction.

Knowledge and Information Systems | 2006

Visual information extraction

Yonatan Aumann; Ronen Feldman; Yair Liberzon; Benjamin Rosenfeld; Jonathan Schler

Typographic and visual information is an integral part of textual documents. Most information extraction (IE) systems ignore most of this visual information, processing the text as a linear sequence of words. Thus, much valuable information is lost. In this paper, we show how to make use of this visual information for IE. We present an algorithm that automatically extracts specific fields of the document (such as the title, author, etc.) based exclusively on the visual formatting of the document, without any reference to the semantic content. The algorithm employs a machine learning approach, whereby the system is first provided with a set of training documents in which the target fields are manually tagged and automatically learns how to extract these fields in future documents. We implemented the algorithm in a system for automatic analysis of documents in PDF format. We present experimental results of applying the system to a set of financial documents, extracting nine different target fields. Overall, the system achieved 90% accuracy.

conference on information and knowledge management | 2001

A domain independent environment for creating information extraction modules

Ronen Feldman; Yonatan Aumann; Yair Liberzon; Kfir Ankori; Jonathan Schler; Benjamin Rosenfeld

Text mining is a growing area of interest within the field of Data Mining and Knowledge Discovery. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations either on external tags associated with each document, or on the set of all words within each document. Both approaches suffer from limitations. This paper focuses on an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on focused, relevant terms, phrases and facts, as extracted from the documents.

empirical methods in natural language processing | 2006

Boosting Unsupervised Relation Extraction by Using NER

Ronen Feldman; Benjamin Rosenfeld

Web extraction systems attempt to use the immense amount of unlabeled text on the Web in order to create large lists of entities and relations. Unlike traditional IE methods, the Web extraction systems do not label every mention of the target entity or relation, instead focusing on extracting as many different instances as possible while keeping the precision of the resulting list reasonably high. URES is a Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes. The performance of URES is further enhanced by classifying its output instances using the properties of the extracted patterns. The features we use for classification and the trained classification model are independent of the target relation, which we demonstrate in a series of experiments. In this paper we show how the introduction of a simple rule-based NER can boost the performance of URES on a variety of relations. We also compare the performance of URES to the performance of the state-of-the-art KnowItAll system, and to the performance of its pattern learning component, which uses a simpler and less powerful pattern language than URES.

international world wide web conferences | 2005

Hybrid semantic tagging for information extraction

Ronen Feldman; Benjamin Rosenfeld; Moshe Fresko; Brian D. Davison

The semantic web is expected to have an impact at least as big as that of the existing HTML-based web, if not greater. However, the challenge lies in creating this semantic web and in converting existing web information into the semantic paradigm. One of the core technologies that can help in the migration process is automatic markup, the semantic markup of content, providing the semantic tags to describe the raw content. This paper describes a hybrid statistical and knowledge-based information extraction model, able to extract entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labor by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (Trainable Extraction Grammar), can be adapted to any IE domain by writing a suitable set of rules in an SCFG (Stochastic Context Free Grammar) based extraction language, and training them using an annotated corpus. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and smaller amounts of training data. We also demonstrate the robustness of our system under conditions of poor training data quality. This makes the system very suitable for converting legacy web pages to semantic web pages.

european conference on machine learning | 2005

A systematic comparison of feature-rich probabilistic classifiers for NER tasks

Benjamin Rosenfeld; Moshe Fresko; Ronen Feldman

In the CoNLL 2003 NER shared task, more than two thirds of the submitted systems used the feature-rich representation of the task. Most of them used maximum entropy to combine the features together. Others used linear classifiers, such as SVM and RRM. Among all systems presented there, one of the MEMM-based classifiers took second place, losing only to a committee of four different classifiers, one of which was ME-based and another RRM-based. The lone RRM was fourth, and CRF came in the middle of the pack. In this paper we shall demonstrate, by running the three algorithms upon the same tasks under exactly the same conditions, that this ranking is due to feature selection and other causes and not due to the inherent qualities of the algorithms, which should be ranked otherwise.

international conference on data mining | 2015

Exploiting the Focus of the Document for Enhanced Entities' Sentiment Relevance Detection

Zvi Ben-Ami; Ronen Feldman; Benjamin Rosenfeld

A key question in sentiment analysis is whether sentiment expressions in a given text are related to particular entities. This is an imperative question, since people are typically interested in sentiments on specific entities and not in the overall sentiment articulated in an article or a document. Sentiment relevance is aimed at addressing this precise problem. In this paper, we argue that exploiting information about the focus of the document on the entity of interest can significantly improve the task of detecting sentiment relevance and, hence, the final sentiment scores assigned for the entities. In order to assess the value of such information, we look at various methods for detecting sentiment relevance for entities. We consider both rule-based algorithms that rely on the entity's physical or syntactic proximity to the sentiment expressions as well as more sophisticated machine learning classification algorithms. We demonstrate that the focus of the document on the entities within it is, indeed, an important piece of information, which can be accurately learned with supervised classification means. We further found that, overall, classification-based algorithms perform better than the deterministic ones in identifying sentiment relevance, with sequence-classification performing significantly better than direct classification.
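As a rough illustration of the simplest kind of rule-based baseline mentioned in this abstract, one relying on physical proximity, the sketch below attaches each sentiment word to the nearest entity mention within a fixed token window. It is not the authors' algorithm: the function name, the `max_distance` window, the whitespace tokenization, and the tiny word lists are assumptions chosen only to make the idea concrete.

```python
def assign_sentiment(tokens, entities, sentiment_words, max_distance=4):
    """Attach each sentiment word to the closest entity within `max_distance` tokens.

    Returns a list of (sentiment_token, entity_name) pairs. Sentiment words
    with no entity inside the window are treated as not relevant to any entity.
    """
    entity_positions = [(i, tok) for i, tok in enumerate(tokens) if tok in entities]
    assignments = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in sentiment_words:
            continue
        # Candidate entities inside the window, keyed by token distance.
        candidates = [
            (abs(i - j), name)
            for j, name in entity_positions
            if abs(i - j) <= max_distance
        ]
        if candidates:
            # min() picks the smallest distance; ties fall back to name order.
            assignments.append((tok, min(candidates)[1]))
    return assignments


if __name__ == "__main__":
    text = "Acme posted strong results while rival Globex issued a weak forecast"
    pairs = assign_sentiment(
        text.split(),
        entities={"Acme", "Globex"},
        sentiment_words={"strong", "weak"},
    )
    print(pairs)
```

A window-based rule like this ignores syntax and the overall focus of the document, which is the kind of limitation the abstract's classification-based methods are meant to address.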

conference on information and knowledge management | 2007