Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Michael Strube is active.

Publication


Featured researches published by Michael Strube.


language and technology conference | 2006

Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution

Simone Paolo Ponzetto; Michael Strube

In this paper we present an extension of a machine learning based coreference resolution system which uses features induced from different semantic knowledge sources. These features represent knowledge mined from WordNet and Wikipedia, as well as information about semantic role labels. We show that semantic features indeed improve the performance on different referring expression types such as pronouns and common nouns.


Journal of Artificial Intelligence Research | 2007

Knowledge derived from wikipedia for computing semantic relatedness

Simone Paolo Ponzetto; Michael Strube

Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.


meeting of the association for computational linguistics | 2003

A Machine Learning Approach to Pronoun Resolution in Spoken Dialogue

Michael Strube; Christoph Müller

We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP-and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byrons (2002) manually tuned system.


international conference on natural language generation | 2008

Dependency tree based sentence compression

Katja Filippova; Michael Strube

We present a novel unsupervised method for sentence compression which relies on a dependency tree representation and shortens sentences by removing subtrees. An automatic evaluation shows that our method obtains result comparable or superior to the state of the art. We demonstrate that the choice of the parser affects the performance of the system. We also apply the method to German and report the results of an evaluation with humans.


empirical methods in natural language processing | 2002

The Influence of Minimum Edit Distance on Reference Resolution

Michael Strube; Stefan Rapp; Christoph Müller

We report on experiments in reference resolution using a decision tree approach. We started with a standard feature set used in previous work, which led to moderate results. A closer examination of the performance of the features for different forms of anaphoric expressions showed good results for pronouns, moderate results for proper names, and poor results for definite noun phrases. We then included a cheap, language and domain independent feature based on the minimum edit distance between strings. This feature yielded a significant improvement for data sets consisting of definite noun phrases and proper names, respectively. When applied to the whole data set the feature produced a smaller but still significant improvement.


Artificial Intelligence | 2011

Taxonomy induction based on a collaboratively built knowledge repository

Simone Paolo Ponzetto; Michael Strube

The category system in Wikipedia can be taken as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexico-syntactic matching. The result is a large scale taxonomy. For evaluation we propose a method which (1) manually determines the quality of our taxonomy, and (2) automatically compares its coverage with ResearchCyc, one of the largest manually created ontologies, and the lexical database WordNet. Additionally, we perform an extrinsic evaluation by computing semantic similarity between words in benchmarking datasets. The results show that the taxonomy compares favorably in quality and coverage with broad-coverage manually created resources.


empirical methods in natural language processing | 2008

Sentence Fusion via Dependency Graph Compression

Katja Filippova; Michael Strube

We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability.


Artificial Intelligence | 2013

Transforming Wikipedia into a large scale multilingual concept network

Vivi Nastase; Michael Strube

A knowledge base for real-world language processing applications should consist of a large base of facts and reasoning mechanisms that combine them to induce novel and more complex information. This paper describes an approach to deriving such a large scale and multilingual resource by exploiting several facets of the on-line encyclopedia Wikipedia. We show how we can build upon Wikipedia@?s existing network of categories and articles to automatically discover new relations and their instances. Working on top of this network allows for added information to influence the network and be propagated throughout it using inference mechanisms that connect different pieces of existing knowledge. We then exploit this gained information to discover new relations that refine some of those found in the previous step. The result is a network containing approximately 3.7 million concepts with lexicalizations in numerous languages and 49+ million relation instances. Intrinsic and extrinsic evaluations show that this is a high quality resource and beneficial to various NLP tasks.


european semantic web conference | 2008

Distinguishing between instances and classes in the wikipedia taxonomy

Cäcilia Zirn; Vivi Nastase; Michael Strube

This paper presents an automatic method for differentiating between instances and classes in a large scale taxonomy induced from the Wikipedia category network. The method exploits characteristics of the category names and the structure of the network. The approach we present is the first attempt to make this distinction automatically in a large scale resource. In contrast, this distinction has been made in WordNet and Cyc based on manual annotations. The result of the process is evaluated against ResearchCyc. On the subnetwork shared by our taxonomy and ResearchCyc we report 84.52% accuracy.


meeting of the association for computational linguistics | 2014

Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation

Sameer Pradhan; Xiaoqiang Luo; Marta Recasens; Eduard H. Hovy; Vincent Ng; Michael Strube

The definitions of two coreference scoring metrics- B3 and CEAF-are underspecified with respect to predicted, as opposed to key (or gold) mentions. Several variations have been proposed that manipulate either, or both, the key and predicted mentions in order to get a one-to-one mapping. On the other hand, the metric BLANC was, until recently, limited to scoring partitions of key mentions. In this paper, we (i) argue that mention manipulation for scoring predicted mentions is unnecessary, and potentially harmful as it could produce unintuitive results; (ii) illustrate the application of all these measures to scoring predicted mentions; (iii) make available an open-source, thoroughly-tested reference implementation of the main coreference evaluation measures; and (iv) rescore the results of the CoNLL-2011/2012 shared task systems with this implementation. This will help the community accurately measure and compare new end-to-end coreference resolution algorithms.

Collaboration


Dive into the Michael Strube's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Alex Judea

University of Stuttgart

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Udo Hahn

University of Freiburg

View shared research outputs
Top Co-Authors

Avatar

Daraksha Parveen

Heidelberg Institute for Theoretical Studies

View shared research outputs
Top Co-Authors

Avatar

Jie Cai

Heidelberg Institute for Theoretical Studies

View shared research outputs
Top Co-Authors

Avatar

Sebastian Martschat

Heidelberg Institute for Theoretical Studies

View shared research outputs
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge