Einat Minkov
University of Haifa
Publications
Featured research published by Einat Minkov.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006
Einat Minkov; William W. Cohen; Andrew Y. Ng
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closely connected to other documents, as well as other non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.
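The lazy graph walk named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes uniform transitions over outgoing edges, a fixed probability of staying at the current node, and a hypothetical toy email graph (`msg1`, `alice`, `budget`, and so on are made-up names).

```python
from collections import defaultdict


def lazy_graph_walk(edges, start, steps=3, stay_prob=0.5):
    """Return a probability distribution over nodes after `steps` lazy walk steps.

    edges: dict mapping node -> list of neighbour nodes (uniform transitions assumed).
    stay_prob: assumed probability of remaining at the current node at each step.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            neighbours = edges.get(node, [])
            if not neighbours:
                nxt[node] += mass  # dead end: all mass stays in place
                continue
            nxt[node] += stay_prob * mass
            share = (1.0 - stay_prob) * mass / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        dist = dict(nxt)
    return dist


# Hypothetical toy graph: messages linked to persons and terms.
toy_edges = {
    "msg1": ["alice", "bob", "budget"],
    "msg2": ["alice", "budget"],
    "alice": ["msg1", "msg2"],
    "bob": ["msg1"],
    "budget": ["msg1", "msg2"],
}
scores = lazy_graph_walk(toy_edges, "msg1", steps=2)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes that receive more probability mass are treated as more similar to the start node, so `msg2` gets a non-zero score because it shares a person and a term with `msg1`.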
Knowledge Discovery and Data Mining | 2007
Einat Minkov; William W. Cohen
We consider the setting of lazy random graph walks over directed graphs, where entities are represented as nodes and typed edges represent the relations between them. This framework has been used in a variety of problems to derive an extended measure of entity similarity. In this paper we contrast two different approaches for applying supervised learning in this framework to improve graph walk performance: a gradient descent algorithm that tunes the transition probabilities of the graph, and a reranking approach that uses features describing global properties of the traversed paths. An empirical evaluation on a set of tasks from the domain of personal information management, using multiple corpora, shows that reranking performance is usually superior to that of the local gradient descent algorithm, and that the methods often yield the best results when combined.
ACM Transactions on Information Systems | 2010
Einat Minkov; William W. Cohen
Relational or semistructured data is naturally represented by a graph, where nodes denote entities and directed typed edges represent the relations between them. Such graphs are heterogeneous, describing different types of objects and links. We represent personal information as a graph that includes messages, terms, persons, dates, and other object types, and relations like sent-to and has-term. Given the graph, we apply finite random graph walks to induce a measure of entity similarity, which can be viewed as a tool for performing search in the graph. Experiments conducted using personal email collections derived from the Enron corpus and other corpora show how the different tasks of alias finding, threading, and person name disambiguation can all be addressed as search queries in this framework, where the graph-walk-based similarity metric is preferable to alternative approaches, and further improvements are achieved with learning. While researchers have suggested tuning edge weight parameters to optimize the graph walk performance per task, we apply reranking to improve the graph walk results, using features that describe high-level information such as the paths traversed in the walk. High performance, together with practical runtimes, suggests that the described framework is a useful search system in the PIM domain, as well as in other semistructured domains.
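To make the role of edge weight parameters concrete, the sketch below turns per-edge-type weights into transition probabilities for one node. It is an illustration under stated assumptions, not the scheme from the paper: the probability of each outgoing edge is assumed to be proportional to the weight of its type, and the weight values and node names are hypothetical. Only the relation names `sent-to` and `has-term` come from the abstract.

```python
def transition_probs(typed_edges, type_weights, node):
    """Distribution over the neighbours of `node`, given per-edge-type weights.

    typed_edges: dict mapping node -> list of (edge_type, neighbour) pairs.
    type_weights: dict mapping edge_type -> non-negative weight (assumed values).
    """
    out = typed_edges.get(node, [])
    total = sum(type_weights[edge_type] for edge_type, _ in out)
    if total == 0:
        return {}  # no outgoing edges, or all weights are zero
    probs = {}
    for edge_type, neighbour in out:
        probs[neighbour] = probs.get(neighbour, 0.0) + type_weights[edge_type] / total
    return probs


# Hypothetical message node with one recipient and two terms.
toy_typed_edges = {
    "msg1": [("sent-to", "alice"), ("has-term", "budget"), ("has-term", "meeting")],
}
toy_weights = {"sent-to": 2.0, "has-term": 1.0}
probs = transition_probs(toy_typed_edges, toy_weights, "msg1")
```

Raising the weight of `sent-to` shifts probability mass toward persons and away from terms, which is the kind of per-task adjustment that weight tuning aims for.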
Empirical Methods in Natural Language Processing | 2008
Einat Minkov; William W. Cohen
We consider a parsed text corpus as an instance of a labelled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning, can be used to derive a task-specific word similarity measure in this graph. We also propose a new path-constrained graph walk method, in which the graph walk process is guided by high-level knowledge about meaningful edge sequences (paths). Empirical evaluation on the task of named entity coordinate term extraction shows that this framework is preferable to vector-based models for small-sized corpora. It is also shown that the path-constrained graph walk algorithm yields both performance and scalability gains.
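The idea of guiding a walk by meaningful edge sequences can be sketched as follows. This is a simplified illustration, not the published algorithm: the set of allowed paths is given by hand rather than learned, probability mass is split uniformly among the permitted moves, and the dependency labels and words in the toy graph are hypothetical.

```python
from collections import defaultdict


def path_constrained_walk(typed_edges, start, allowed_paths):
    """Walk from `start`, following only edge-type sequences in `allowed_paths`.

    Returns a dict mapping node -> mass that arrives at the end of a complete
    allowed path. Moves that cannot extend to an allowed path are pruned.
    """
    prefixes = {tuple(p[:i]) for p in allowed_paths for i in range(len(p) + 1)}
    complete = {tuple(p) for p in allowed_paths}
    max_len = max(len(p) for p in allowed_paths)
    frontier = {(start, ()): 1.0}  # (node, edge types followed so far) -> mass
    reached = defaultdict(float)
    for _ in range(max_len):
        nxt = defaultdict(float)
        for (node, history), mass in frontier.items():
            moves = [
                (edge_type, nb)
                for edge_type, nb in typed_edges.get(node, [])
                if history + (edge_type,) in prefixes
            ]
            for edge_type, nb in moves:
                nxt[(nb, history + (edge_type,))] += mass / len(moves)
        for (node, history), mass in nxt.items():
            if history in complete:
                reached[node] += mass
        frontier = nxt
    return dict(reached)


# Hypothetical parsed-text graph around the word "Paris".
toy_graph = {
    "Paris": [("dobj-inv", "visited"), ("conj", "London"), ("det", "the")],
    "visited": [("dobj", "Paris"), ("dobj", "Rome"), ("nsubj", "tourists")],
}
toy_paths = [("conj",), ("dobj-inv", "dobj")]
result = path_constrained_walk(toy_graph, "Paris", toy_paths)
```

Because edges such as `det` and `nsubj` never extend an allowed path, the walk does not visit `the` or `tourists`, which illustrates how pruning can reduce the number of nodes explored.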
Web Search and Data Mining | 2015
Bhavana Dalvi; Einat Minkov; Partha Pratim Talukdar; William W. Cohen
While there has been much research on automatically constructing structured Knowledge Bases (KBs), most of it has focused on generating facts to populate a KB. However, a useful KB must go beyond facts. For example, glosses (short natural language definitions) have been found to be very useful in tasks such as Word Sense Disambiguation. However, the important problem of Automatic Gloss Finding, i.e., assigning glosses to entities in an initially gloss-free KB, is relatively unexplored. We address that gap in this paper. In particular, we propose GLOFIN, a hierarchical semi-supervised learning algorithm for this problem which makes effective use of limited amounts of supervision and available ontological constraints. To the best of our knowledge, GLOFIN is the first system for this task. Through extensive experiments on real-world datasets, we demonstrate GLOFIN's effectiveness. It is encouraging to see that GLOFIN outperforms other state-of-the-art SSL algorithms, especially in low supervision settings. We also demonstrate GLOFIN's robustness to noise through experiments on a wide variety of KBs, ranging from user contributed (e.g., Freebase) to automatically constructed (e.g., NELL). To facilitate further research in this area, we have made the datasets and code used in this paper publicly available.
Natural Language Engineering | 2014
Einat Minkov; William W. Cohen
We consider a dependency-parsed text corpus as an instance of a labeled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning that model local and global information about the graph walk process, can be used to derive a task-specific word similarity measure in this graph. We also propose and evaluate a new learning method in this framework, a path-constrained graph walk variant, in which the walk process is guided by high-level knowledge about meaningful edge sequences (paths) in the graph. Empirical evaluation on the tasks of named entity coordinate term extraction and general word synonym extraction shows that this framework is preferable to, or competitive with, vector-based models when learning is applied and small to moderate size text corpora are used.
International Joint Conference on Natural Language Processing | 2015
Ni Lao; Einat Minkov; William W. Cohen
The path ranking algorithm (PRA) has been recently proposed to address relational classification and retrieval tasks at large scale. We describe Cor-PRA, an enhanced system that can model a larger space of relational rules, including longer relational rules and a class of first order rules with constants, while maintaining scalability. We describe and test faster algorithms for searching for these features. A key contribution is to leverage backward random walks to efficiently discover these types of rules. An empirical study is conducted on the tasks of graph-based knowledge base inference, and person named entity extraction from parsed text. Our results show that learning paths with constants improves performance on both tasks, and that modeling longer paths dramatically improves performance for the named entity extraction task.
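As background for the path ranking idea, the sketch below computes a path feature as the probability of reaching a target from a source while following a fixed sequence of relation types, then combines several such features linearly. It is a simplified illustration only: it does not cover the constants, longer rules, or backward random walks that Cor-PRA adds, and the toy knowledge base, relation names, and weights are hypothetical.

```python
def path_feature(typed_edges, source, target, path):
    """Probability of reaching `target` from `source` by following the
    edge-type sequence `path`, choosing uniformly among matching edges."""
    dist = {source: 1.0}
    for edge_type in path:
        nxt = {}
        for node, mass in dist.items():
            matches = [nb for t, nb in typed_edges.get(node, []) if t == edge_type]
            for nb in matches:
                nxt[nb] = nxt.get(nb, 0.0) + mass / len(matches)
        dist = nxt
    return dist.get(target, 0.0)


def path_score(typed_edges, source, target, weighted_paths):
    """Linear combination of path features; the weights are assumed to be given."""
    return sum(
        weight * path_feature(typed_edges, source, target, path)
        for path, weight in weighted_paths.items()
    )


# Hypothetical toy knowledge base.
toy_kb = {
    "alice": [("works_for", "acme"), ("lives_in", "springfield")],
    "acme": [("based_in", "springfield"), ("based_in", "boston")],
}
toy_path_weights = {("works_for", "based_in"): 0.8, ("lives_in",): 1.5}
two_hop = path_feature(toy_kb, "alice", "springfield", ("works_for", "based_in"))
total = path_score(toy_kb, "alice", "springfield", toy_path_weights)
```

In a learned system the weights would be fitted from labeled source and target pairs; here they are fixed by hand only to show how the features combine.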
Empirical Methods in Natural Language Processing | 2015
Evgenia Wasserman Pritsker; William W. Cohen; Einat Minkov
We outline a learning framework that aims at identifying useful contextual cues for knowledge-based word sense disambiguation. The usefulness of individual context words is evaluated based on diverse lexico-statistical and syntactic information, as well as simple word distance. Experiments using two different knowledge-based methods and benchmark datasets show significant improvements due to context modeling, beating the conventional window-based approach.
Social Media for Government Services | 2015
Susan Grant-Muller; Ayelet Gal-Tzur; Einat Minkov; Tsvi Kuflik; Silvio Nocera; Itay Shoor
Rapid and recent developments in social media networks are providing a vision amongst transport suppliers, governments and academia of ‘next-generation’ information channels. This chapter identifies the main requirements for a social media information harvesting methodology in the transport context and highlights the challenges involved. Three questions are addressed, concerning (1) the ways in which social media data can be used alongside or potentially instead of current transport data sources, (2) the technical challenges in text mining social media that create difficulties in generating high quality data for the transport sector, and finally (3) whether there are wider institutional barriers in harnessing the potential of social media data for the transport sector. The chapter demonstrates that information harvested from social media can complement, enrich, or even replace traditional data collection. Whilst further research is needed to develop automatic or semi-automatic methodologies for harvesting and analysing transport-related social media information, new skills are also needed in the sector to maximise the benefits of this new information source.
Advanced Visual Interfaces | 2014
Alan J. Wecker; Joel Lanir; Osnat Mokryn; Einat Minkov; Tsvi Kuflik
A plethora of tools exist for extracting and visualizing key sentiment information from a corpus of text documents. Often, however, there is a need for quickly assessing the sentiment and feelings that arise from an individual document. We describe an interactive tool that visualizes the sentiment of a specific document such as an online opinion, blog, or transcript, by visually highlighting the sentiment features while leaving the document text intact.
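A very rough text-only analogue of highlighting sentiment features while leaving the document intact is sketched below. It is not the tool described in the abstract: it assumes a tiny hand-made lexicon and marks sentiment-bearing words with bracketed `+` or `-` tags instead of visual highlighting.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
TOY_LEXICON = {"great": 1, "love": 1, "terrible": -1, "slow": -1}


def highlight_sentiment(text, lexicon=TOY_LEXICON):
    """Wrap sentiment-bearing words in [+word] or [-word]; leave all other text as is."""

    def mark(match):
        word = match.group(0)
        polarity = lexicon.get(word.lower())
        if polarity is None:
            return word
        tag = "+" if polarity > 0 else "-"
        return f"[{tag}{word}]"

    return re.sub(r"[A-Za-z']+", mark, text)


marked = highlight_sentiment("Great screen, but the battery is terrible.")
# marked == "[+Great] screen, but the battery is [-terrible]."
```

Word order, punctuation, and spacing are preserved, so the original document can still be read in full alongside the markers.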