Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Torsten Zesch is active.

Publication


Featured research published by Torsten Zesch.


Natural Language Engineering | 2010

Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words

Torsten Zesch; Iryna Gurevych

In this article, we present a comprehensive study aimed at computing semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, relying on a concept space built from documents, to the first paragraphs of Wikipedia articles, to English WordNet glosses, and to GermaNet-based pseudo glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that ‘wisdom of crowds’ based resources are not superior to ‘wisdom of linguists’ based resources. We also find that using the first paragraph of a Wikipedia article as opposed to the whole article leads to better precision, but decreases recall. Finally, we present two systems that were developed to aid the experiments presented herein and are freely available for research purposes: (i) DEXTRACT, software for semi-automatically constructing corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.
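
To illustrate the vector-based measure mentioned above, the following sketch (a toy example, not the authors' implementation) represents each word as a sparse vector over a concept space, here one dimension per concept document such as the first paragraph of a Wikipedia article, and scores relatedness as the cosine of the two vectors. All data and names in the example are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of a vector-based relatedness measure over a concept space. */
public class ConceptVectorRelatedness {

    /** Builds a concept vector: one dimension per concept document containing the word. */
    static Map<Integer, Double> conceptVector(String word, String[] conceptDocuments) {
        Map<Integer, Double> vector = new HashMap<>();
        for (int concept = 0; concept < conceptDocuments.length; concept++) {
            // Term frequency of the word in this concept document (toy weighting).
            double tf = 0;
            for (String token : conceptDocuments[concept].toLowerCase().split("\\W+")) {
                if (token.equals(word)) {
                    tf++;
                }
            }
            if (tf > 0) {
                vector.put(concept, tf);
            }
        }
        return vector;
    }

    /** Cosine similarity between two sparse concept vectors. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy concept space, e.g. first paragraphs of Wikipedia articles.
        String[] concepts = {
            "A car is a wheeled motor vehicle used for transportation",
            "An automobile or car is a motor vehicle with wheels",
            "A banana is an elongated edible fruit"
        };
        double relatedness = cosine(conceptVector("car", concepts),
                                    conceptVector("automobile", concepts));
        System.out.printf("relatedness(car, automobile) = %.3f%n", relatedness);
    }
}
```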


Proceedings of the Workshop on Linguistic Distances | 2006

Automatically Creating Datasets for Measures of Semantic Relatedness

Torsten Zesch; Iryna Gurevych

Semantic relatedness is a special form of linguistic distance between words. Evaluating semantic relatedness measures is usually performed by comparison with human judgments. Previous test datasets had been created analytically and were limited in size. We propose a corpus-based system for automatically creating test datasets. Experiments with human subjects show that the resulting datasets cover all degrees of relatedness. As a result of the corpus-based approach, test datasets cover all types of lexical-semantic relations and contain domain-specific words naturally occurring in texts.
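
A corpus-driven candidate extraction step in the spirit of this approach (a rough sketch, not the actual DEXTRACT code) could collect content-word pairs that co-occur in the corpus and rank them by co-occurrence count; a real system would then sample pairs across the whole frequency range before handing them to human annotators, so that all degrees of relatedness are covered.

```java
import java.util.*;

/** Hypothetical sketch: collect co-occurring word pairs from a corpus as annotation candidates. */
public class CandidatePairExtractor {

    public static void main(String[] args) {
        String[] sentences = {
            "the surgeon performed the operation in the clinic",
            "the clinic hired a new surgeon last year",
            "the patient recovered quickly after the operation"
        };
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "a", "in", "after", "last", "new"));

        // Count how often two content words appear in the same sentence.
        Map<String, Integer> pairCounts = new HashMap<>();
        for (String sentence : sentences) {
            List<String> content = new ArrayList<>();
            for (String token : sentence.toLowerCase().split("\\W+")) {
                if (!stopwords.contains(token)) {
                    content.add(token);
                }
            }
            for (int i = 0; i < content.size(); i++) {
                for (int j = i + 1; j < content.size(); j++) {
                    String key = content.get(i).compareTo(content.get(j)) < 0
                            ? content.get(i) + "--" + content.get(j)
                            : content.get(j) + "--" + content.get(i);
                    pairCounts.merge(key, 1, Integer::sum);
                }
            }
        }

        // Sort by co-occurrence count; sampling across this range yields candidate pairs
        // that include both strongly and weakly related words.
        pairCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```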


North American Chapter of the Association for Computational Linguistics | 2007

Comparing Wikipedia and German Wordnet by Evaluating Semantic Relatedness on Multiple Datasets

Torsten Zesch; Iryna Gurevych; Max Mühlhäuser

We evaluate semantic relatedness measures on different German datasets showing that their performance depends on: (i) the definition of relatedness that was underlying the construction of the evaluation dataset, and (ii) the knowledge source used for computing semantic relatedness. We analyze how the underlying knowledge source influences the performance of a measure. Finally, we investigate the combination of wordnets and Wikipedia to improve the performance of semantic relatedness measures.
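
Evaluations of this kind typically correlate a measure's scores with human judgments collected for the same word pairs. The sketch below assumes Spearman rank correlation (a common choice, though not stated in the abstract) and uses invented scores; ties are not handled in this toy version.

```java
import java.util.Arrays;

/** Sketch: correlate a relatedness measure with human judgments (Spearman, no tie handling). */
public class RelatednessEvaluation {

    /** Ranks of the values (1 = smallest); ties are not averaged in this toy version. */
    static double[] ranks(double[] values) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < values.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] ranks = new double[values.length];
        for (int rank = 0; rank < order.length; rank++) {
            ranks[order[rank]] = rank + 1;
        }
        return ranks;
    }

    /** Pearson correlation; applied to ranks it yields the Spearman coefficient. */
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0);
        double meanY = Arrays.stream(y).average().orElse(0);
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
            varY += (y[i] - meanY) * (y[i] - meanY);
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Human gold scores and (hypothetical) measure scores for the same word pairs.
        double[] human  = {3.8, 0.4, 2.9, 1.5};
        double[] system = {0.71, 0.05, 0.55, 0.30};
        System.out.printf("Spearman = %.3f%n", pearson(ranks(human), ranks(system)));
    }
}
```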


Meeting of the Association for Computational Linguistics | 2014

DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data

Johannes Daxenberger; Oliver Ferschke; Iryna Gurevych; Torsten Zesch

We present DKPro TC, a framework for supervised learning experiments on textual data. The main goal of DKPro TC is to enable researchers to focus on the actual research task behind the learning problem and let the framework handle the rest. It enables rapid prototyping of experiments by relying on an easy-to-use workflow engine and standardized document preprocessing based on the Apache Unstructured Information Management Architecture (Ferrucci and Lally, 2004). It ships with standard feature extraction modules, while at the same time allowing the user to add customized extractors. The extensive reporting and logging facilities make DKPro TC experiments fully replicable.
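
The plug-in feature-extractor idea behind such a framework can be sketched as follows. This is a hypothetical illustration, not the actual DKPro TC API: extractors map a document to named numeric features, the framework combines them into one vector and would pass it on to a configurable learner.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the plug-in feature-extractor idea (not the actual DKPro TC API). */
public class FeatureExtractionSketch {

    /** A feature extractor turns a document into named numeric features. */
    interface FeatureExtractor {
        Map<String, Double> extract(String document);
    }

    static class TokenCount implements FeatureExtractor {
        public Map<String, Double> extract(String document) {
            Map<String, Double> features = new LinkedHashMap<>();
            features.put("nrOfTokens", (double) document.split("\\s+").length);
            return features;
        }
    }

    static class AvgTokenLength implements FeatureExtractor {
        public Map<String, Double> extract(String document) {
            String[] tokens = document.split("\\s+");
            double sum = 0;
            for (String token : tokens) sum += token.length();
            Map<String, Double> features = new LinkedHashMap<>();
            features.put("avgTokenLength", sum / tokens.length);
            return features;
        }
    }

    public static void main(String[] args) {
        FeatureExtractor[] extractors = {new TokenCount(), new AvgTokenLength()};
        String document = "Supervised learning experiments on textual data";

        // The framework would feed the combined vector to a configurable learner;
        // here we simply print it.
        Map<String, Double> featureVector = new LinkedHashMap<>();
        for (FeatureExtractor extractor : extractors) {
            featureVector.putAll(extractor.extract(document));
        }
        System.out.println(featureVector);
    }
}
```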


IEEE International Conference on Semantic Computing | 2011

Link Discovery: A Comprehensive Analysis

Nicolai Erbs; Torsten Zesch; Iryna Gurevych

We present a comprehensive analysis of link discovery approaches. We classify them with regard to the type of knowledge being used, and identify three commonly used sources of knowledge: the text of a document, the document title, and already existing links. We analyze the influence of the knowledge source as well as of the amount of training data used. Results show that the link-based approach performs best if the amount of training data is large. In a more realistic setting with less training data, the text-based approach yields better results.
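
Of the three knowledge sources named above, the title-based one is the simplest to illustrate. The sketch below (toy data, invented names) proposes a link wherever the title of an existing document occurs in the text; the text-based and link-based approaches would add further evidence such as term statistics or anchor-link counts.

```java
import java.util.*;

/** Sketch of a simple title-based link discovery step: propose a link wherever a known title occurs. */
public class TitleBasedLinkDiscovery {

    public static void main(String[] args) {
        // Titles of existing documents in the collection (assumed input).
        Set<String> titles = new HashSet<>(Arrays.asList(
                "semantic relatedness", "wikipedia", "wordnet"));

        String document = "We compute semantic relatedness using Wikipedia and WordNet as knowledge sources.";
        String lower = document.toLowerCase();

        // Propose a link target for every title that appears in the text.
        List<String> proposedLinks = new ArrayList<>();
        for (String title : titles) {
            if (lower.contains(title)) {
                proposedLinks.add(title);
            }
        }
        System.out.println("Proposed link targets: " + proposedLinks);
    }
}
```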


International Symposium on Wikis and Open Collaboration | 2009

An architecture to support intelligent user interfaces for Wikis by means of Natural Language Processing

Johannes Hoffart; Torsten Zesch; Iryna Gurevych

We present an architecture for integrating a set of Natural Language Processing (NLP) techniques with a wiki platform. This entails support for adding, organizing, and finding content in the wiki. We perform a comprehensive analysis of how NLP techniques can support the user interaction with the wiki, using an intelligent interface to provide suggestions. The architecture is designed to be deployed with any existing wiki platform, especially those used in corporate environments. We implemented a prototype integrating two NLP techniques, keyphrase extraction and text segmentation, as well as an improved search engine. The prototype is integrated with two widely used wiki platforms: MediaWiki and TWiki.
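
The platform-independence described above can be sketched as a thin NLP service layer that any wiki front end calls. The following is a hypothetical toy illustration (invented interface and trivial heuristics, not the paper's actual architecture) of exposing keyphrase and segmentation suggestions behind one interface.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a wiki-independent NLP service layer (not the paper's actual architecture). */
public class WikiNlpIntegration {

    /** NLP services the wiki front end can call, independent of the wiki platform. */
    interface NlpService {
        List<String> suggestKeyphrases(String pageText);
        List<Integer> suggestSectionBoundaries(String pageText);
    }

    /** Toy implementation: keyphrases = longest distinct words, boundaries = every three sentences. */
    static class ToyNlpService implements NlpService {
        public List<String> suggestKeyphrases(String pageText) {
            return Arrays.stream(pageText.toLowerCase().split("\\W+"))
                    .distinct()
                    .sorted((a, b) -> b.length() - a.length())
                    .limit(3)
                    .collect(java.util.stream.Collectors.toList());
        }

        public List<Integer> suggestSectionBoundaries(String pageText) {
            List<Integer> boundaries = new java.util.ArrayList<>();
            String[] sentences = pageText.split("(?<=\\.)\\s+");
            for (int i = 3; i < sentences.length; i += 3) {
                boundaries.add(i);
            }
            return boundaries;
        }
    }

    public static void main(String[] args) {
        NlpService nlp = new ToyNlpService();
        String page = "Wikis collect knowledge collaboratively. Pages grow quickly. "
                + "Finding content becomes hard. NLP can suggest keyphrases. "
                + "It can also propose section boundaries.";
        System.out.println("Keyphrases: " + nlp.suggestKeyphrases(page));
        System.out.println("Boundaries after sentences: " + nlp.suggestSectionBoundaries(page));
    }
}
```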


Workshop on Innovative Use of NLP for Building Educational Applications | 2014

Automatic Generation of Challenging Distractors Using Context-Sensitive Inference Rules

Torsten Zesch; Oren Melamud

Automatically generating challenging distractors for multiple-choice gap-fill items is still an unsolved problem. We propose to employ context-sensitive lexical inference rules in order to generate distractors that are semantically similar to the gap target word in some sense, but not in the particular sense induced by the gap-fill context. We hypothesize that such distractors should be particularly hard to distinguish from the correct answer. We focus on verbs as they are especially difficult to master for language learners and find that our approach is quite effective. In our test set of 20 items, our proposed method decreases the number of invalid distractors in 90% of the cases, and fully eliminates all of them in 65%. Further analysis on that dataset does not support our hypothesis regarding item difficulty as measured by average error rate of language learners. We conjecture that this may be due to limitations in our evaluation setting, which we plan to address in future work.
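
The core filtering idea can be sketched in a few lines: keep candidates that are related to the target word in some sense but do not fit the particular gap context, so they are hard to confuse with the correct answer yet remain invalid. The example below is a toy illustration with hand-made data; it does not use the paper's actual inference rules.

```java
import java.util.*;

/** Hypothetical sketch of context-sensitive distractor filtering (toy data, not the paper's rules). */
public class DistractorGeneration {

    public static void main(String[] args) {
        String target = "run";

        // Candidates from (assumed) lexical inference rules: words related to the target in some sense.
        List<String> candidates = Arrays.asList("execute", "operate", "sprint", "jog");

        // Toy context model: verbs that plausibly fill the gap "The program ___ on the server".
        Set<String> fitsThisContext = new HashSet<>(Arrays.asList("run", "execute", "operate", "crash"));

        // Keep candidates that do NOT fit the gap context: challenging but invalid distractors.
        List<String> distractors = new ArrayList<>();
        for (String candidate : candidates) {
            if (!fitsThisContext.contains(candidate)) {
                distractors.add(candidate);
            }
        }
        System.out.println("Gap: The program ___ on the server. (target: " + target + ")");
        System.out.println("Challenging distractors: " + distractors);
    }
}
```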


Workshop on Innovative Use of NLP for Building Educational Applications | 2015

Task-Independent Features for Automated Essay Grading

Torsten Zesch; Michael Wojatzki; Dirk Scholten-Akoun

Automated scoring of student essays is increasingly used to reduce manual grading effort. State-of-the-art approaches use supervised machine learning which makes it complicated to transfer a system trained on one task to another. We investigate which currently used features are task-independent and evaluate their transferability on English and German datasets. We find that, by using our task-independent feature set, models transfer better between tasks. We also find that the transfer works even better between tasks of the same type.
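
Task-independent essay features of the kind discussed here are typically simple surface and lexical statistics that do not depend on the prompt. The sketch below shows a few illustrative choices (length, sentence length, type-token ratio); these are common examples, not necessarily the paper's exact feature set.

```java
import java.util.*;

/** Sketch of simple task-independent essay features (illustrative choices, not the paper's exact set). */
public class EssayFeatures {

    static Map<String, Double> extract(String essay) {
        String[] tokens = essay.toLowerCase().split("\\W+");
        Set<String> types = new HashSet<>(Arrays.asList(tokens));
        String[] sentences = essay.split("(?<=[.!?])\\s+");

        Map<String, Double> features = new LinkedHashMap<>();
        features.put("nrOfTokens", (double) tokens.length);
        features.put("nrOfSentences", (double) sentences.length);
        features.put("avgSentenceLength", (double) tokens.length / sentences.length);
        features.put("typeTokenRatio", (double) types.size() / tokens.length);
        return features;
    }

    public static void main(String[] args) {
        String essay = "Automated scoring saves grading time. However, models must transfer between tasks. "
                + "Task-independent features make this transfer easier.";
        System.out.println(extract(essay));
    }
}
```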


Meeting of the Association for Computational Linguistics | 2014

DKPro Keyphrases: Flexible and Reusable Keyphrase Extraction Experiments

Nicolai Erbs; Pedro Bispo Santos; Iryna Gurevych; Torsten Zesch

DKPro Keyphrases is a keyphrase extraction framework based on UIMA. It offers a wide range of state-of-the-art keyphrase extraction approaches. At the same time, it is a workbench for developing new extraction approaches and evaluating their impact. DKPro Keyphrases is publicly available under an open-source license.
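
As a baseline for the kind of experiment such a framework supports, a TF-IDF keyphrase ranker is a common starting point. The sketch below is a generic, self-contained baseline over toy data; it does not use the DKPro Keyphrases API.

```java
import java.util.*;

/** Sketch of a TF-IDF keyphrase baseline (a common approach; not the DKPro Keyphrases API). */
public class TfIdfKeyphrases {

    public static void main(String[] args) {
        String[] corpus = {
            "keyphrase extraction selects the most important phrases of a document",
            "supervised learning needs labeled training data",
            "semantic relatedness compares the meaning of words"
        };
        int targetDoc = 0;

        // Document frequency of each term across the corpus.
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            for (String term : new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\W+")))) {
                df.merge(term, 1, Integer::sum);
            }
        }

        // Term frequency in the target document.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : corpus[targetDoc].toLowerCase().split("\\W+")) {
            tf.merge(term, 1, Integer::sum);
        }

        // Rank candidate terms by TF-IDF and print the top five.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) corpus.length / df.get(e.getKey()));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.printf("%s\t%.3f%n", e.getKey(), e.getValue()));
    }
}
```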


Workshop on Innovative Use of NLP for Building Educational Applications | 2015

Reducing Annotation Efforts in Supervised Short Answer Scoring

Torsten Zesch; Michael Heilman; Aoife Cahill

Automated short answer scoring is increasingly used to give students timely feedback about their learning progress. Building scoring models comes with high costs, as state-of-the-art methods using supervised learning require large amounts of hand-annotated data. We analyze the potential of recently proposed methods for semi-supervised learning based on clustering. We find that all examined methods (centroids, all clusters, selected pure clusters) are mainly effective for very short answers and do not generalize well to several-sentence responses.
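
The clustering-based idea of reducing annotation effort can be sketched as follows: group similar answers, have a human score only one representative per cluster, and propagate that label to the other members. The example below uses a toy greedy clustering with Jaccard similarity and invented answers and labels; it is an illustration of the general idea, not the methods examined in the paper.

```java
import java.util.*;

/** Sketch of clustering-based label propagation for short answer scoring (toy similarity, assumed labels). */
public class ClusterBasedScoring {

    static Set<String> tokens(String answer) {
        return new HashSet<>(Arrays.asList(answer.toLowerCase().split("\\W+")));
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        List<String> answers = Arrays.asList(
                "water evaporates because of the heat",
                "the heat makes the water evaporate",
                "the water gets colder",
                "the water becomes colder");

        // Greedy clustering: join an existing cluster if similar enough to its seed answer.
        double threshold = 0.3;
        List<List<String>> clusters = new ArrayList<>();
        for (String answer : answers) {
            List<String> home = null;
            for (List<String> cluster : clusters) {
                if (jaccard(tokens(answer), tokens(cluster.get(0))) >= threshold) {
                    home = cluster;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(answer);
        }

        // A human scores only each cluster's seed; the label is propagated to the other members.
        // With this toy data exactly two clusters emerge.
        String[] seedLabels = {"correct", "incorrect"};  // assumed human judgments for the two seeds
        for (int i = 0; i < clusters.size(); i++) {
            for (String member : clusters.get(i)) {
                System.out.println(seedLabels[i] + "\t" + member);
            }
        }
    }
}
```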

Collaboration


Dive into Torsten Zesch's collaborations.

Top Co-Authors

Iryna Gurevych, Technische Universität Darmstadt
Nicolai Erbs, University of Duisburg-Essen
Tobias Horsmann, University of Duisburg-Essen
Michael Wojatzki, University of Duisburg-Essen
Daniel Bär, Technische Universität Darmstadt
Lisa Beinborn, Technische Universität Darmstadt
Christof Müller, Technische Universität Darmstadt
Max Mühlhäuser, Technische Universität Darmstadt