Binyamin Rosenfeld
Bar-Ilan University
Publication
Featured research published by Binyamin Rosenfeld.
conference on information and knowledge management | 2002
Binyamin Rosenfeld; Ronen Feldman; Yonatan Aumann
Most information extraction systems focus on the textual content of the documents. They treat documents as sequences of words, disregarding the physical and typographical layout of the information. While this strategy helps in focusing the extraction process on the key semantic content of the document, much valuable information can also be derived from the document's physical appearance. Often, fonts, physical positioning and other graphical characteristics are used to provide additional context to the information. This information is lost with pure-text analysis. In this paper we describe a general procedure for structural extraction, which allows for automatic extraction of entities from the document based on their visual characteristics and relative position in the document layout. Our structural extraction procedure is a learning algorithm, which automatically generalizes from examples. The procedure is a general one, applicable to any document format with visual and typographical information. We then describe a specific implementation of the procedure for PDF documents, called PES (PDF Extraction System). PES works with PDF documents and is able to extract fields such as Author(s), Title, Date, etc. with very high accuracy.
knowledge discovery and data mining | 2000
Ronen Feldman; Yair Liberzon; Binyamin Rosenfeld; Jonathan Schler; Jonathan Stoppi
Information extraction is one of the most important techniques used in Text Mining. One of the main problems in building information extraction (IE) systems is that the knowledge elicited from domain experts tends to be only approximately correct. In addition, the knowledge acquisition phase for building IE rules usually takes a tremendous amount of time on the part of the expert and of the linguist creating the rules. We therefore need an effective means of revising our IE rules whenever we discover such an inaccuracy. The IE revision problem is how best to go about revising deficient IE rules using information contained in examples that expose inaccuracies. The revision process is very sensitive to implicit and explicit biases encoded in the specific revision algorithm employed. In a sense, each revision algorithm must provide two forms of bias: bias as to the place of the revision and bias as to the type of the revision that should be performed. In this paper we present a framework for writing approximate IE rules that are provided with explicit bias. The proposed framework can be used by many existing revision algorithms. The purpose of the revision bias framework is to allow the user to declare his own bias in a simple and structured way, i.e. to express the conditions placed on the domain knowledge for a given revision operator to be applied. This language extends and generalizes the work reported in [Feldman et al. 1993]. It attacks the problem of writing IE rules from a novel perspective, one which enables a much faster development of IE systems.
knowledge discovery and data mining | 2015
Ronen Feldman; Oded Netzer; Aviv Peretz; Binyamin Rosenfeld
We present an end-to-end text mining methodology for relation extraction of adverse drug reactions (ADRs) from medical forums on the Web. Our methodology is novel in that it combines three major characteristics: (i) an underlying concept of using a head-driven phrase structure grammar (HPSG) based parser; (ii) domain-specific relation patterns, the acquisition of which is done primarily using unsupervised methods applied to a large, unlabeled text corpus; and (iii) automated post-processing algorithms for enhancing the set of extracted relations. We empirically demonstrate the ability of our proposed approach to predict ADRs prior to their reporting by the Food and Drug Administration (FDA). Put differently, we put our approach to a predictive test by demonstrating that our methodology can credibly point to ADRs that were not uncovered in clinical trials for evaluating new drugs that come to market but were only reported later on by the FDA as a label change.
conference on information and knowledge management | 2005
Moshe Fresko; Binyamin Rosenfeld; Ronen Feldman
This paper describes a framework for defining domain-specific Feature Functions in a user-friendly form to be used in a Maximum Entropy Markov Model (MEMM) for the Named Entity Recognition (NER) task. Our system, called MERGE, allows defining general Feature Function Templates, as well as Linguistic Rules incorporated into the classifier. We show a simple way of translating these rules into specific feature functions. We also show that, with a small amount of expert interaction in the form of rule tuning, MERGE can outperform both purely machine-learning-based systems and purely knowledge-based approaches.
meeting of the association for computational linguistics | 2014
Zvi Ben-Ami; Ronen Feldman; Binyamin Rosenfeld
Sentiment relevance detection problems occur when there is a sentiment expression in a text, and there is the question of whether or not the expression is related to a given entity or, more generally, to a given situation. The paper discusses variants of the problem, and shows that it is distinct from other somewhat similar problems occurring in the field of sentiment analysis and opinion mining. We experimentally demonstrate that using the information about relevancy significantly affects the final sentiment evaluation of the entities. We then compare a set of different algorithms for solving the relevance detection problem. The most accurate results are achieved by algorithms that use certain document-level information about the target entities. We show that this information can be accurately extracted using supervised classification methods.
conference on information and knowledge management | 2008
Binyamin Rosenfeld; Ronen Feldman; Lyle H. Ungar
Web pages often contain text that is irrelevant to their main content, such as advertisements, generic format elements, and references to other pages on the same site. When used by automatic content-processing systems, e.g., for Web indexing, text classification, or information extraction, this irrelevant text often produces a substantial amount of noise. This paper describes a trainable filtering system based on a feature-rich sequence classifier that removes irrelevant parts from pages, while keeping the content intact. Most of the features the system uses are purely form-related: HTML tags and their positions, sizes of elements, etc. This keeps the system general and domain-independent. We also experiment with content words and show that while they perform very poorly alone, they can slightly improve the performance of pure-form features, without jeopardizing the domain-independence. Our system achieves very high accuracy (95% and above) on several collections of Web pages. We also run a series of tests with different features and different classifiers, comparing the contribution of different components to the system performance, and comparing two known sequence classifiers, Robust Risk Minimization (RRM) and Conditional Random Fields (CRF), in a novel setting.
siam international conference on data mining | 2006
Binyamin Rosenfeld; Ronen Feldman; Moshe Fresko
Computación Y Sistemas | 2014
Zvi Ben-Ami; Ronen Feldman; Binyamin Rosenfeld
intelligent data analysis | 2006
Ronen Feldman; Binyamin Rosenfeld; Ronen Lazar; Joshua Livnat; Benjamin Segal
ISAIM | 2006
Moshe Fresko; Binyamin Rosenfeld; Ronen Feldman