Shaul Markovitch
Technion – Israel Institute of Technology
Publications
Featured research published by Shaul Markovitch.
Journal of Artificial Intelligence Research | 2009
Evgeniy Gabrilovich; Shaul Markovitch
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
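A minimal sketch of the ESA idea described above, under simplifying assumptions: a text is interpreted as a weighted vector over Wikipedia concepts by summing, for each of its terms, that term's weights in the concept (article) index, and semantic relatedness is the cosine of two such vectors. The tiny term-to-concept table is a toy stand-in for the real Wikipedia-derived inverted index.

```python
# Toy illustration of Explicit Semantic Analysis (ESA): texts are mapped to
# weighted vectors of Wikipedia concepts; relatedness is cosine similarity.
from collections import defaultdict
from math import sqrt

# term -> {concept: tf-idf weight}  (toy values, purely illustrative)
term_concept_weights = {
    "bank":    {"Bank_(finance)": 0.9, "River": 0.3},
    "money":   {"Bank_(finance)": 0.8, "Currency": 0.7},
    "river":   {"River": 0.9, "Nile": 0.5},
    "deposit": {"Bank_(finance)": 0.6, "Sediment": 0.4},
}

def esa_vector(text):
    """Interpret a text as a weighted vector of Wikipedia concepts."""
    vec = defaultdict(float)
    for term in text.lower().split():
        for concept, w in term_concept_weights.get(term, {}).items():
            vec[concept] += w
    return vec

def relatedness(text_a, text_b):
    """Cosine similarity between the concept vectors of two texts."""
    a, b = esa_vector(text_a), esa_vector(text_b)
    dot = sum(a[c] * b.get(c, 0.0) for c in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(relatedness("bank deposit money", "river bank"))
```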
International Conference on Machine Learning | 2004
Evgeniy Gabrilovich; Shaul Markovitch
Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge numbers of features. Most previous studies found that the majority of these features are relevant for classification, and that the performance of text categorization with support vector machines peaks when no feature selection is performed. We describe a class of text categorization problems that are characterized by many redundant features. Even though most of these features are relevant, the underlying concepts can be concisely captured using only a few features, while keeping all of them has a substantially detrimental effect on categorization accuracy. We develop a novel measure that captures feature redundancy, and use it to analyze a large collection of datasets. We show that for problems plagued with numerous redundant features the performance of C4.5 is significantly superior to that of SVM, while aggressive feature selection allows SVM to beat C4.5 by a narrow margin.
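The paper's redundancy measure is not reproduced here; the following is only a generic sketch of the kind of aggressive feature selection discussed above, with a chi-squared ranking standing in: keep a small fraction of bag-of-words features before training a linear SVM. Dataset and parameter choices are illustrative.

```python
# Aggressive feature selection before a linear SVM (illustrative pipeline).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
test = fetch_20newsgroups(subset="test", categories=["sci.med", "sci.space"])

pipeline = make_pipeline(
    TfidfVectorizer(),
    SelectKBest(chi2, k=100),   # aggressive: keep only 100 of thousands of features
    LinearSVC(),
)
pipeline.fit(train.data, train.target)
print("accuracy:", pipeline.score(test.data, test.target))
```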
ACM Transactions on Information Systems | 2011
Ofer Egozi; Shaul Markovitch; Evgeniy Gabrilovich
Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term cooccurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.
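A rough sketch, under many simplifying assumptions, of the self-generated-labels idea described above: use an initial keyword retrieval run to produce pseudo-labels (top-ranked documents as "relevant", bottom-ranked as "non-relevant"), then keep only the query's concept features that best separate the two groups. The `esa_vector` function is assumed to map a text to a `{concept: weight}` dict, as in the ESA sketch earlier on this page; the scoring rule is illustrative, not the paper's.

```python
# Pseudo-labeled feature selection for concept-based retrieval (sketch).
def select_query_concepts(query, keyword_ranking, esa_vector,
                          top_k=20, bottom_k=100, keep=10):
    positives = keyword_ranking[:top_k]        # pseudo-relevant documents
    negatives = keyword_ranking[-bottom_k:]    # pseudo-non-relevant documents

    def mean_weight(docs, concept):
        return sum(esa_vector(d).get(concept, 0.0) for d in docs) / len(docs)

    query_concepts = esa_vector(query)
    # Score each candidate query concept by how much more strongly it appears
    # in pseudo-relevant documents than in pseudo-non-relevant ones.
    scored = {
        c: mean_weight(positives, c) - mean_weight(negatives, c)
        for c in query_concepts
    }
    return sorted(scored, key=scored.get, reverse=True)[:keep]
```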
National Conference on Artificial Intelligence | 1999
Michael Lindenbaum; Shaul Markovitch; Dmitry Rusakov
Most existing inductive learning algorithms work under the assumption that their training examples are already tagged. There are domains, however, where the tagging procedure requires significant computational resources or manual labor. In such cases, it may be beneficial for the learner to be active, intelligently selecting the examples for labeling with the goal of reducing the labeling cost. In this paper we present LSS—a lookahead algorithm for selective sampling of examples for nearest neighbor classifiers. The algorithm looks for the example with the highest utility, taking into account its effect on the resulting classifier. Computing the expected utility of an example requires estimating the probability of its possible labels. We propose to use the random field model for this estimation. The LSS algorithm was evaluated empirically on seven real and artificial data sets, and its performance was compared to other selective sampling algorithms. The experiments show that the proposed algorithm outperforms other methods in terms of average error rate and stability.
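A simplified sketch of lookahead selective sampling for a 1-NN classifier with binary labels 0/1. The paper estimates label probabilities with a random field model; here a distance-weighted neighbour vote stands in for that estimate, and the utility of a hypothetical classifier is measured as its average confidence on the remaining unlabeled pool. Purely illustrative.

```python
import numpy as np

def knn_predict(X_lab, y_lab, X):
    """1-NN prediction for the rows of X given labeled data."""
    d = np.linalg.norm(X[:, None, :] - X_lab[None, :, :], axis=2)
    return y_lab[d.argmin(axis=1)]

def label_probability(x, X_lab, y_lab, bandwidth=1.0):
    """Stand-in for the random-field estimate: distance-weighted vote."""
    w = np.exp(-np.linalg.norm(X_lab - x, axis=1) / bandwidth)
    p1 = w[y_lab == 1].sum() / w.sum()
    return np.array([1.0 - p1, p1])

def select_example(X_lab, y_lab, X_pool):
    """Pick the pool example whose labeling has the highest expected utility."""
    best, best_u = None, -np.inf
    for i, x in enumerate(X_pool):
        p = label_probability(x, X_lab, y_lab)
        u = 0.0
        for label in (0, 1):                     # look ahead over possible labels
            X_new = np.vstack([X_lab, x])
            y_new = np.append(y_lab, label)
            preds = knn_predict(X_new, y_new, X_pool)
            # Utility of the hypothetical classifier: its average agreement
            # with the probability estimates over the pool.
            conf = np.mean([label_probability(z, X_new, y_new)[preds[j]]
                            for j, z in enumerate(X_pool)])
            u += p[label] * conf
        if u > best_u:
            best, best_u = i, u
    return best
```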
Computer Speech & Language | 1995
Ido Dagan; Shaul Marcus; Shaul Markovitch
In recent years there has been much interest in word co-occurrence relations, such as n-grams, verb–object combinations, or co-occurrence within a limited context. This paper discusses how to estimate the likelihood of co-occurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved co-occurrence and other co-occurrences that contain similar words. These analogies are based on the assumption that similar word co-occurrences have similar values of mutual information. Accordingly, the word similarity metric captures similarities between vectors of mutual information values. Our evaluation suggests that this method performs better than existing, frequency-based, smoothing methods, and may provide an alternative to class-based models. A background survey is included, covering issues of lexical co-occurrence, data sparseness and smoothing, word similarity and clustering, and mutual information.
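A minimal sketch of the similarity-based estimation idea: the probability of an unseen co-occurrence (w1, w2) is estimated from observed co-occurrences (w1', w2) of words w1' similar to w1, where word similarity is the cosine between vectors of (positive) mutual information values. The toy counts and the exact weighting scheme are illustrative, not the paper's.

```python
from math import log, sqrt
from collections import defaultdict

pair_counts = {                      # toy verb-object co-occurrence counts
    ("drink", "water"): 4, ("drink", "espresso"): 3,
    ("sip", "espresso"): 3, ("sip", "wine"): 2,
    ("pour", "water"): 4,
}
total = sum(pair_counts.values())
head_counts, ctx_counts = defaultdict(int), defaultdict(int)
for (w1, w2), c in pair_counts.items():
    head_counts[w1] += c
    ctx_counts[w2] += c

def pmi(w1, w2):
    """Positive pointwise mutual information of an observed pair."""
    c = pair_counts.get((w1, w2), 0)
    return max(0.0, log(c * total / (head_counts[w1] * ctx_counts[w2]))) if c else 0.0

def mi_vector(w):
    """Vector of mutual information values of head word w with its contexts."""
    return {w2: pmi(w, w2) for (w1, w2) in pair_counts if w1 == w}

def similarity(u, v):
    """Cosine similarity between the mutual information vectors of two heads."""
    a, b = mi_vector(u), mi_vector(v)
    dot = sum(a[x] * b.get(x, 0.0) for x in a)
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def smoothed_prob(w1, w2):
    """Estimate P(w2 | w1) for an unseen pair by analogy to similar heads."""
    neighbours = [(similarity(w1, v), v) for v in head_counts if v != w1]
    neighbours = [(s, v) for s, v in neighbours if s > 0]
    if not neighbours:
        return 0.0
    norm = sum(s for s, _ in neighbours)
    return sum(s * pair_counts.get((v, w2), 0) / head_counts[v]
               for s, v in neighbours) / norm

print(smoothed_prob("sip", "water"))   # unseen pair, estimated via "drink"
```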
International Conference on Machine Learning | 1988
Shaul Markovitch; Paul D. Scott
This paper is a discussion of the relationship between learning and forgetting. An analysis of the economics of learning is carried out, and it is argued that knowledge can sometimes have a negative value. A series of experiments involving a program which learns to traverse state spaces is described. It is shown that most of the knowledge acquired is of negative value, even though it is correct and was acquired while solving similar problems. It is shown that the value of the knowledge depends on what else is known, and that random forgetting can sometimes lead to substantial improvements in performance. It is concluded that research into knowledge acquisition should take seriously the possibility that knowledge may sometimes be harmful. The view is taken that learning and forgetting are complementary processes which construct and maintain useful representations of experience.

Research on machine learning is concerned with the problem of how a system may acquire knowledge that it does not possess. It is therefore not surprising that relatively little attention has been paid to the converse problem: how may a system dispose of knowledge it already possesses? This is the phenomenon that is termed forgetting when it occurs in humans, and it is usually regarded as an unfortunate failure of the memory system. It is our contention that this negative view of forgetting is misplaced and that, far from being a shortcoming, it is a very useful process which facilitates effective knowledge acquisition. Learning is a process in which an organized representation of experience is constructed (Scott 1983). Forgetting is a process in which parts of that organized representation are rearranged or dismantled. The two processes are thus complementary, and the resulting representation is the joint product of both. Mechanisms of forgetting therefore merit study alongside those of acquisition, since it is the two together which constitute learning.

Our notion of forgetting is fairly broad. In addition to the obvious mechanism of deleting items of knowledge, it also includes changes in the knowledge structure which render particular items relatively or completely inaccessible. It thus includes processes which weaken memory traces or isolate fragments of a knowledge base. Such changes can be viewed as partial removals, with deletion as a limiting case which produces complete removal. In this paper we attempt to explore the role of forgetting in machine learning systems. We begin by discussing the circumstances in which it is better to dispose of an item of knowledge than to retain it. We then describe some experimental work we have done in order to demonstrate that even correct knowledge, acquired in the course of solving similar problems, can be a disadvantage to a system.
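A tiny illustration, not taken from the paper, of why correct knowledge can have negative value: each learned item (say, a macro or control rule) saves some search time when it applies, but costs matching time on every problem. If the matching cost outweighs the expected savings, the system runs faster after "forgetting" the item. All numbers below are hypothetical.

```python
def knowledge_utility(match_cost, apply_prob, saving_when_applied):
    """Expected net benefit per problem of retaining one learned item."""
    return apply_prob * saving_when_applied - match_cost

learned_items = {
    "macro_A": knowledge_utility(match_cost=0.02, apply_prob=0.30, saving_when_applied=0.50),
    "macro_B": knowledge_utility(match_cost=0.05, apply_prob=0.01, saving_when_applied=0.40),
}
retained = {k: u for k, u in learned_items.items() if u > 0}   # forget harmful items
print(retained)   # macro_B is correct knowledge, yet its expected utility is negative
```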
Meeting of the Association for Computational Linguistics | 1993
Ido Dagan; Shaul Marcus; Shaul Markovitch
In recent years there has been much interest in word co-occurrence relations, such as n-grams, verb-object combinations, or co-occurrence within a limited context. This paper discusses how to estimate the probability of co-occurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved co-occurrence and other co-occurrences that contain similar words, as determined by an appropriate word similarity metric. Our evaluation suggests that this method performs better than existing smoothing methods, and may provide an alternative to class-based models.
International Conference on Computer Vision | 2012
Tamar Avraham; Ilya Gurvich; Michael Lindenbaum; Shaul Markovitch
This paper proposes a novel approach to pedestrian re-identification. Previous re-identification methods rely on one of three approaches: extracting invariant features; designing metrics that bring instances of the same identity close together and instances of different identities far apart; or learning a transformation from the appearance in one camera domain to the other. Our implicit approach models camera transfer by a binary relation R = {(x, y) | x and y describe the same person seen from cameras A and B, respectively}. This formulation implies that the camera transfer function is a multi-valued mapping rather than a single-valued transformation, and does not assume the existence of a metric with desirable properties. We present an algorithm that follows this approach and achieves new state-of-the-art performance.
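A rough sketch of the "camera transfer as a binary relation" idea: instead of learning a metric or a single-valued mapping between cameras, train a binary classifier on pairs (x, y) of appearance descriptors, where the positive class means "x from camera A and y from camera B show the same person". The synthetic descriptors, pair encoding, and classifier below are placeholders, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_people, dim = 50, 32

# Synthetic appearance descriptors of the same people seen from two cameras.
identities = rng.normal(size=(n_people, dim))
cam_a = identities + 0.1 * rng.normal(size=(n_people, dim))
cam_b = 0.8 * identities + 0.1 * rng.normal(size=(n_people, dim))  # camera-specific distortion

def encode_pair(x, y):
    """Simple pair encoding; the paper's representation may differ."""
    return np.concatenate([x, y, np.abs(x - y)])

pairs, labels = [], []
for i in range(n_people):
    pairs.append(encode_pair(cam_a[i], cam_b[i])); labels.append(1)   # same person
    j = (i + 1) % n_people
    pairs.append(encode_pair(cam_a[i], cam_b[j])); labels.append(0)   # different people

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(pairs, labels)

# Re-identification: rank camera-B candidates for a camera-A query by P(same).
query = cam_a[0]
scores = clf.predict_proba([encode_pair(query, y) for y in cam_b])[:, 1]
print("best match for person 0:", int(scores.argmax()))
```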
Machine Learning | 2002
Shaul Markovitch; Dan Rosenstein
Most classification algorithms receive as input a set of attributes of the classified objects. In many cases, however, the supplied set of attributes is not sufficient for creating an accurate, succinct and comprehensible representation of the target concept. To overcome this problem, researchers have proposed algorithms for automatic construction of features. The majority of these algorithms use a limited predefined set of operators for building new features. In this paper we propose a generalized and flexible framework that is capable of generating features from any given set of constructor functions. These can be domain-independent functions such as arithmetic and logic operators, or domain-dependent operators that rely on partial knowledge supplied by the user. The paper describes an algorithm which receives as input a set of classified objects, a set of attributes, and a specification for a set of constructor functions that contains their domains, ranges and properties. The algorithm produces as output a set of generated features that can be used by standard concept learners to create improved classifiers. The algorithm maintains a set of its best generated features and improves this set iteratively. During each iteration, the algorithm performs a beam search over its defined feature space and constructs new features by applying constructor functions to the members of its current feature set. The search is guided by general heuristic measures that are not confined to a specific feature representation. The algorithm was applied to a variety of classification problems and was able to generate features that were strongly related to the underlying target concepts. These features also significantly improved the accuracy achieved by standard concept learners.
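A condensed sketch, in the spirit of the framework described above: start from the original attributes, repeatedly apply constructor functions (here, generic arithmetic operators) to members of the current beam, and keep the candidates that score best under a representation-independent heuristic (here, a crude class-separation score). The constructors, heuristic, and toy target concept are illustrative choices, not the paper's.

```python
import numpy as np

def separation_score(values, y):
    """Heuristic feature quality: distance between class means in std units."""
    v0, v1 = values[y == 0], values[y == 1]
    return abs(v0.mean() - v1.mean()) / (values.std() + 1e-9)

def construct_features(X, y, constructors, beam_width=5, iterations=2):
    # Beam entries are (name, value vector) pairs; start from the raw attributes.
    beam = [(f"x{i}", X[:, i]) for i in range(X.shape[1])]
    for _ in range(iterations):
        candidates = list(beam)
        for name_a, a in beam:
            for name_b, b in beam:
                for op_name, op in constructors.items():
                    candidates.append((f"{op_name}({name_a},{name_b})", op(a, b)))
        candidates.sort(key=lambda f: separation_score(f[1], y), reverse=True)
        beam = candidates[:beam_width]
    return beam

constructors = {"add": np.add, "sub": np.subtract, "mul": np.multiply}

# Toy target concept: the class is 1 exactly when x0 * x1 is positive.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

for name, _ in construct_features(X, y, constructors):
    print(name)   # the constructed product feature should rank near the top
```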
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004
Dmitry Davidov; Evgeniy Gabrilovich; Shaul Markovitch
Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (named ACCIO) for automatically acquiring labeled datasets for text categorization from the World Wide Web, by capitalizing on the body of knowledge encoded in the structure of existing hierarchical directories such as the Open Directory. We define parameters of categories that make it possible to acquire numerous datasets with desired properties, which in turn allow better control over categorization experiments. In particular, we develop metrics that estimate the difficulty of a dataset by examining the host directory structure. These metrics are shown to be good predictors of the categorization accuracy that can be achieved on a dataset, and serve as efficient heuristics for generating datasets subject to users' requirements. A large collection of automatically generated datasets is made available for other researchers to use.
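A skeletal sketch of the dataset-acquisition idea: treat each chosen directory category as a class label and collect the documents listed under it as labeled examples. The `directory` structure, the example URLs, and the `fetch_text` helper are hypothetical stand-ins for a real crawl of a hierarchical directory; the system's category parameters and difficulty metrics are not modeled here.

```python
from urllib.request import urlopen

def fetch_text(url, timeout=10):
    """Hypothetical helper: download a page and return its raw text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="ignore")

# category -> list of external URLs listed under that node (placeholder data)
directory = {
    "Top/Science/Astronomy": ["https://example.org/astro1", "https://example.org/astro2"],
    "Top/Arts/Music":        ["https://example.org/music1"],
}

def acquire_dataset(directory, min_docs_per_category=1):
    """Build (text, label) pairs, skipping categories that are too small."""
    dataset = []
    for category, urls in directory.items():
        if len(urls) < min_docs_per_category:
            continue
        for url in urls:
            try:
                dataset.append((fetch_text(url), category))
            except OSError:
                pass   # skip dead links
    return dataset
```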