Moshe Fresko
Bar-Ilan University
Publication
Featured research published by Moshe Fresko.
european conference on principles of data mining and knowledge discovery | 1998
Ronen Feldman; Moshe Fresko; Yakkov Kinar; Yehuda Lindell; Orly Liphstat; Martin Rajman; Yonatan Schler; Oren Zamir
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Previous work in text mining focused on the word or tag level. This paper presents an approach to performing text mining at the term level. The mining process starts by preprocessing the document collection and extracting terms from the documents. Each document is then represented by a set of terms and annotations characterizing the document. Terms and additional higher-level entities are then organized in a hierarchical taxonomy. In this paper we describe the Term Extraction module of the Document Explorer system and provide an experimental evaluation performed on a set of 52,000 documents published by Reuters in the years 1995–1996.
Knowledge and Information Systems | 2006
Ronen Feldman; Benjamin Rosenfeld; Moshe Fresko
This paper describes a hybrid statistical and knowledge-based information extraction model, able to extract entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labour by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (trainable extraction grammar), can be adapted to any IE domain by writing a suitable set of rules in a SCFG (stochastic context-free grammar)-based extraction language and training them using an annotated corpus. The system does not contain any purely linguistic components, such as a PoS tagger or shallow parser, but allows external linguistic components to be used if necessary. We demonstrate the performance of the system on several named entity extraction and relation extraction tasks. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and smaller amounts of training data. We also demonstrate the robustness of our system under conditions of poor training-data quality.
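As a rough illustration of what "training" hand-written SCFG rules on an annotated corpus can mean, the sketch below estimates rule probabilities by relative frequency. This is an assumption made for illustration: the abstract does not say how TEG trains its rules, and the nonterminal names (`Person`, `FirstName`, `Title`, `LastName`) and the function `estimate_rule_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# A rule is (left-hand side, right-hand side); all names are hypothetical.
Rule = Tuple[str, Tuple[str, ...]]


def estimate_rule_probabilities(observed_rules: Iterable[Rule]) -> Dict[Rule, float]:
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).

    Assumed training scheme for illustration only, not the TEG procedure.
    """
    counts = Counter(observed_rules)
    lhs_totals: Dict[str, int] = defaultdict(int)
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Rule applications as they might be read off an annotated corpus (made-up data).
observed = [
    ("Person", ("FirstName", "LastName")),
    ("Person", ("FirstName", "LastName")),
    ("Person", ("FirstName", "LastName")),
    ("Person", ("Title", "LastName")),
]
probabilities = estimate_rule_probabilities(observed)
```

The probabilities of all rules sharing a left-hand side sum to one, which is the defining constraint of a stochastic context-free grammar.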
international conference on data mining | 2007
Ronen Feldman; Moshe Fresko; Jacob Goldenberg; Oded Netzer; Lyle H. Ungar
In recent years, product discussion forums have become a rich environment in which consumers and potential adopters exchange views and information. Researchers and practitioners are starting to extract user sentiment about products from user product reviews. Users often compare different products, stating which they like better and why. Extracting information about product comparisons offers a number of challenges; recognizing and normalizing entities (products) in the informal language of blogs and discussion groups require different techniques than those used for entity extraction in the more formal text of newspapers and scientific articles. We present a case study in extracting information about comparisons between running shoes and between cars, describe an effective methodology, and show how it produces insight into how consumers view the running shoe and car markets.
intelligent information systems | 2005
Amihood Amir; Yonatan Aumann; Ronen Feldman; Moshe Fresko
We describe a new tool for mining association rules, which is of special value in text mining. The new tool, called maximal associations, is geared toward discovering associations that are frequently lost when using regular association rules. Intuitively, a maximal association rule $X \stackrel{\max}{\Longrightarrow} Y$ says that whenever X is the only item of its type in a transaction, then Y also appears, with some confidence. Maximal associations allow the discovery of associations pertaining to items that most often do not appear alone, but rather together with closely related items, and hence associations relevant only to these items tend to obtain low confidence. We provide a formal description of maximal association rules and efficient algorithms for discovering all such associations. We present the results of applying maximal association rules to two text corpora.
conference on information and knowledge management | 2004
Benjamin Rosenfeld; Ronen Feldman; Moshe Fresko; Jonathan Schler; Yonatan Aumann
european conference on principles of data mining and knowledge discovery | 1998
David Landau; Ronen Feldman; Yonatan Aumann; Moshe Fresko; Yehuda Lindell; Orly Liphstat; Oren Zamir
international conference on service systems and service management | 2008
Ronen Feldman; Moshe Fresko; Jacob Goldenberg; Oded Netzer; Lyle H. Ungar
conference on information and knowledge management | 2005
Moshe Fresko; Binyamin Rosenfeld; Ronen Feldman
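The intuition behind maximal association rules in the abstract by Amir, Aumann, Feldman and Fresko can be sketched in a few lines. The code follows only the informal statement in the abstract (X is the only item of its type in a transaction), not the paper's formal definition, which is not reproduced here. The item categories, the toy transactions, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def regular_confidence(transactions: List[Set[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Ordinary confidence: share of transactions containing X that also contain Y."""
    x, y = frozenset(x), frozenset(y)
    with_x = [t for t in transactions if x <= t]
    return sum(1 for t in with_x if y <= t) / len(with_x) if with_x else 0.0


def maximal_confidence(transactions: List[Set[str]], category_of: Dict[str, str],
                       x: Iterable[str], y: Iterable[str]) -> float:
    """Simplified maximal confidence: among transactions where X is exactly the
    set of items of its type(s), the share that also contain Y."""
    x, y = frozenset(x), frozenset(y)
    x_types = {category_of[item] for item in x}

    def x_is_alone(t: Set[str]) -> bool:
        same_type = {item for item in t if category_of.get(item) in x_types}
        return same_type == x

    alone = [t for t in transactions if x_is_alone(t)]
    return sum(1 for t in alone if y <= t) / len(alone) if alone else 0.0


# Made-up data: each document is tagged with countries and topics.
category_of = {"canada": "country", "usa": "country", "france": "country",
               "fish": "topic", "wine": "topic"}
transactions = [
    {"canada", "fish"},
    {"canada", "fish"},
    {"canada", "usa", "wine"},
    {"canada", "usa", "wine"},
    {"canada", "france", "wine"},
    {"france", "wine"},
]

regular = regular_confidence(transactions, {"canada"}, {"fish"})
maximal = maximal_confidence(transactions, category_of, {"canada"}, {"fish"})
```

In this toy data "canada" usually appears next to another country, so the regular rule canada ⇒ fish is diluted, while the maximal variant only counts the documents in which "canada" is the sole country.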
international world wide web conferences | 2005
Ronen Feldman; Benjamin Rosenfeld; Moshe Fresko; Brian D. Davison
This paper describes a hybrid statistical and knowledge-based information extraction model, able to extract entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labor by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (Trainable Extraction Grammar), can be adapted to any IE domain by writing a suitable set of rules in a SCFG (Stochastic Context Free Grammar) based extraction language, and training them using an annotated corpus. The system does not contain any purely linguistic components, such as a PoS tagger or parser. We demonstrate the performance of the system on several named entity extraction and relation extraction tasks. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and smaller amounts of training data. The improvement in accuracy is slight for the named entity extraction task and more pronounced for relation extraction.
european conference on machine learning | 2005
Benjamin Rosenfeld; Moshe Fresko; Ronen Feldman
TextVis is a visual data mining system for document collections. Such a collection represents an application domain, and the primary goal of the system is to derive patterns that provide knowledge about this domain. Additionally, the derived patterns can be used to browse the collection. TextVis takes a multi-strategy approach to text mining, and enables defining complex analysis schemas from basic components provided by the system. An analysis schema is constructed by dragging functional icons from a tool palette onto the workspace and connecting them according to the desired flow of information. The system provides a large collection of basic analysis tools, including frequent sets, associations, concept distributions, and concept correlations. The discovered patterns are presented in a visual interface allowing the user to operate on the results and to access the associated documents. TextVis is a complete text mining system which uses agent technology to access various online information sources, text preprocessing tools to extract relevant information from the documents, a variety of data mining algorithms, and a set of visual browsers to view the results. This paper provides an overview of the TextVis system. We describe the system’s architecture and its various tools, and discuss the advantages of our visual environment for mining large document collections.