Christian Scheible
University of Stuttgart
Publication
Featured research published by Christian Scheible.
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics | 2009
Beate Dorow; Florian Laws; Lukas Michelbacher; Christian Scheible; Jason Utt
This paper presents a graph-theoretic approach to the identification of yet-unknown word translations. The proposed algorithm is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words. We also present a formulation of SimRank in matrix form and extensions for edge weights, edge labels and multiple graphs.
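The abstract above does not spell out its matrix formulation, so the following is only a minimal sketch of plain, unweighted SimRank written as matrix products. It is not the paper's formulation and omits the edge-weight, edge-label and multi-graph extensions. The function name `simrank`, the decay factor of 0.8 and the iteration count are illustrative assumptions.

```python
def simrank(adj, decay=0.8, iterations=10):
    """Plain SimRank on a directed graph (illustrative sketch).

    adj[i][j] == 1 means there is an edge i -> j.
    Returns an n x n list of similarity scores in [0, 1].
    """
    n = len(adj)
    # Column-normalise the adjacency matrix: w[i][j] = adj[i][j] / in-degree(j).
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / in_deg[j] if in_deg[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Start from the identity: every node is maximally similar to itself.
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        # t = S * W
        t = [[sum(s[i][k] * w[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # s_new = decay * W^T * t, i.e. decay * W^T * S * W
        s_new = [[decay * sum(w[k][i] * t[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        # Self-similarity is fixed at 1 by definition.
        for i in range(n):
            s_new[i][i] = 1.0
        s = s_new
    return s


# Hypothetical toy graph: node 0 points to nodes 1 and 2.
adj = [
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
]
sim = simrank(adj)
```

In this toy graph, nodes 1 and 2 share their only in-neighbour, so they receive a high mutual score, while node 0 has no in-neighbours and stays dissimilar to both.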
Meeting of the Association for Computational Linguistics | 2016
Christian Scheible; Roman Klinger; Sebastian Padó
Quotation detection is the task of locating spans of quoted speech in text. The state of the art treats this problem as a sequence labeling task and employs linear-chain conditional random fields. We question the efficacy of this choice: The Markov assumption in the model prohibits it from making joint decisions about the begin, end, and internal context of a quotation. We perform an extensive analysis with two new model architectures. We find that (a) simple boundary classification combined with a greedy prediction strategy is competitive with the state of the art; (b) a semi-Markov model significantly outperforms all others by relaxing the Markov assumption.
Applications of Natural Language to Data Bases | 2017
Hanna Kicherer; Marcel Dittrich; Lukas Grebe; Christian Scheible; Roman Klinger
Social media data is notoriously noisy and unclean. Recipe collections built by users are no exception, particularly when it comes to cataloging them. However, consistent and transparent categorization is vital to users who search for a specific entry. Similarly, curators are faced with the same challenge given a large collection of existing recipes: They first need to understand the data to be able to build a clean system of categories. This paper presents an empirical study on the automatic classification of recipes on the German cooking website Chefkoch. The central question we aim at answering is: Which information is necessary to perform well at this task? In particular, we compare features extracted from the free text instructions of the recipe to those taken from the list of ingredients. On a sample of 5,000 recipes with 87 classes, our feature analysis shows that a combination of nouns from the textual description of the recipe with ingredient features performs best (48% \(\text{F}_1\)). Nouns alone achieve 45% \(\text{F}_1\) and ingredients alone 46% \(\text{F}_1\). However, other word classes do not complement the information from nouns. On a bigger training set of 50,000 instances, the best configuration shows an improvement to 57%, highlighting the importance of a sizeable data set.
Data and Knowledge Engineering | 2018
Hanna Kicherer; Marcel Dittrich; Lukas Grebe; Christian Scheible; Roman Klinger
Social media data is notoriously noisy and unclean. Recipe collections and their manual categorization built by users are no exception. However, a consistent and transparent categorization is vital to users who search for a specific entry. Similarly, curators are faced with the same challenge given a large collection of existing recipes: They first need to understand the data to be able to build a clean system of categories. This paper presents an empirical study using machine learning classifiers (logistic regression and decision trees) for the automatic classification of recipes on the German cooking website Chefkoch.de. The central question we aim at answering is: Which information is necessary to perform well at this task? In particular, we compare features extracted from the free text instructions of the recipe to those taken from the list of ingredients. On a sample of 5,000 recipes with 87 classes, our feature analysis shows that a combination of nouns from the textual description of the recipe with ingredient features performs best in the logistic regression model (48% F1). Nouns alone achieve 45% F1 and ingredients alone 46% F1. However, other word classes do not complement the information from nouns. Decision trees consistently underperform logistic regression but lead to an interpretable model. On a bigger training set of 50,000 instances, the best configuration shows an improvement to 57%, highlighting the importance of a sizeable data set. In addition, we report on the use of these feature vectors for similarity search and ranking of recipes and evaluate on the task of (near) duplicate detection. We show that our method can reduce the manual curation with precision@3 = 0.52.
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Christian Scheible; Hinrich Schütze
Sentiment relevance (SR) aims at identifying content that does not contribute to sentiment analysis. Previously, automatic SR classification has been studied in a limited scope, using a single domain and feature augmentation techniques that require large hand-crafted databases. In this paper, we present experiments on SR classification with automatically learned feature representations on multiple domains. We show that a combination of transfer learning and in-task supervision, using features learned without supervision by a stacked denoising autoencoder, significantly outperforms a bag-of-words baseline for in-domain and cross-domain classification.
Empirical Methods in Natural Language Processing | 2011
Florian Laws; Christian Scheible; Hinrich Schütze
Meeting of the Association for Computational Linguistics | 2008
Sabine Schulte im Walde; Christian Hying; Christian Scheible; Helmut Schmid
International Conference on Computational Linguistics | 2010
Florian Laws; Lukas Michelbacher; Beate Dorow; Christian Scheible; Ulrich Heid; Hinrich Schütze
Meeting of the Association for Computational Linguistics | 2010
Christian Scheible
Proceedings of the 11th International Conference on Computational Semantics | 2015
Maximilian Köper; Christian Scheible; Sabine Schulte im Walde