Ivan Habernal
University of West Bohemia
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Ivan Habernal.
Information Processing and Management | 2014
Ivan Habernal; Tomáš Ptáček; Josef Steinberger
This article describes in-depth research on machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, in the case of the Czech language no systematic research has yet been conducted. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different pre-processing techniques and employ various features and classifiers. We also experiment with five different feature selection algorithms and investigate the influence of named entity recognition and preprocessing on sentiment classification performance. Moreover, in addition to our newly created social media dataset, we also report results for other popular domains, such as movie and product reviews. We believe that this article will not only extend the current sentiment analysis research to another family of languages, but will also encourage competition, potentially leading to the production of high-end commercial solutions.
empirical methods in natural language processing | 2015
Ivan Habernal; Iryna Gurevych
Analyzing arguments in user-generated Web discourse has recently gained attention in argumentation mining, an evolving field of NLP. Current approaches, which employ fully-supervised machine learning, are usually domain dependent and suffer from the lack of large and diverse annotated corpora. However, annotating arguments in discourse is costly, error-prone, and highly context-dependent. We asked whether leveraging unlabeled data in a semi-supervised manner can boost the performance of argument component identification and to which extent is the approach independent of domain and register. We propose novel features that exploit clustering of unlabeled data from debate portals based on a word embeddings representation. Using these features, we significantly outperform several baselines in the cross-validation, cross-domain, and cross-register evaluation scenarios.
Computational Linguistics | 2017
Ivan Habernal; Iryna Gurevych
The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing peoples argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.
meeting of the association for computational linguistics | 2016
Ivan Habernal; Iryna Gurevych
We propose a new task in the field of computational argumentation in which we investigate qualitative properties of Web arguments, namely their convincingness. We cast the problem as relation classification, where a pair of arguments having the same stance to the same prompt is judged. We annotate a large datasets of 16k pairs of arguments over 32 topics and investigate whether the relation “A is more convincing than B” exhibits properties of total ordering; these findings are used as global constraints for cleaning the crowdsourced data. We propose two tasks: (1) predicting which argument from an argument pair is more convincing and (2) ranking all arguments to the topic based on their convincingness. We experiment with feature-rich SVM and bidirectional LSTM and obtain 0.76-0.78 accuracy and 0.35-0.40 Spearman’s correlation in a cross-topic evaluation. We release the newly created corpus UKPConvArg1 and the experimental software under open licenses.
Expert Systems With Applications | 2013
Ivan Habernal; Miloslav Konopík
As modern search engines are approaching the ability to deal with queries expressed in natural language, full support of natural language interfaces seems to be the next step in the development of future systems. The vision is that of users being able to tell a computer what they would like to find, using any number of sentences and as many details as requested. In this article we describe our effort to move towards this future using currently available technology. The Semantic Web framework was chosen as the best means to achieve this goal. We present our approach to building a complete Semantic Web Search Using Natural Language (SWSNL) system. We cover the complete process which includes preprocessing, semantic analysis, semantic interpretation, and executing a SPARQL query to retrieve the results. We perform an end-to-end evaluation on a domain dealing with accommodation options. The domain data come from an existing accommodation portal and we use a corpus of queries obtained by a Facebook campaign. In our paper we work with written texts in the Czech language. In addition to that, the Natural Language Understanding (NLU) module is evaluated on another domain (public transportation) and language (English). We expect that our findings will be valuable for the research community as they are strongly related to issues found in real-world scenarios. We struggled with inconsistencies in the actual Web data, with the performance of the Semantic Web engines on a decently sized knowledge base, and others.
text speech and dialogue | 2008
Ivan Habernal; Miloslav Konopík
We propose a new method for semantic analysis. The method is based on handwritten context-free grammars enriched with semantic tags. Associating the rules of a context-free grammar with semantic tags is beneficial; however, after parsing the tags are spread across the parse tree and it is usually hard to extract the complete semantic information from it. Thus, we developed an easy-to-use and yet very powerful mechanism for tag propagation. The mechanism allows the semantic information to be easily extracted from the parse tree. The propagation mechanism is based on an idea to add propagation instruction to the semantic tags. The tags with such instructions are called active tagsin this article. Using the proposed method we developed a useful tool for semantic parsing that we offer for free on our internet pages.
text speech and dialogue | 2013
Ivan Habernal; Tomáš Brychcín
This article presents a new semi-supervised method for document-level sentiment analysis. We employ a supervised state-of-the-art classification approach and enrich the feature set by adding word cluster features. These features exploit clusters of words represented in semantic spaces computed on unlabeled data. We test our method on three large sentiment datasets (Czech movie and product reviews, and English movie reviews) and outperform the current state of the art. To the best of our knowledge, this article reports the first successful incorporation of semantic spaces based on local word co-occurrence in the sentiment analysis task.
Information Processing and Management | 2015
Ivan Habernal; Tomáš Ptáček; Josef Steinberger
We explore state-of-the-art supervised machine learning methods for sentiment analysis of Czech social media.We provide a large human-annotated Czech social media corpus.We explore different pre-processing techniques and employ various features and classifiers.We experiment with five different feature selection algorithms.Results are also reported on other widely popular domains, such as movie and product reviews. This article describes in-depth research on machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, in the case of the Czech language no systematic research has yet been conducted. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different pre-processing techniques and employ various features and classifiers. We also experiment with five different feature selection algorithms and investigate the influence of named entity recognition and preprocessing on sentiment classification performance. Moreover, in addition to our newly created social media dataset, we also report results for other popular domains, such as movie and product reviews. We believe that this article will not only extend the current sentiment analysis research to another family of languages, but will also encourage competition, potentially leading to the production of high-end commercial solutions.
text speech and dialogue | 2009
Miloslav Konopík; Ivan Habernal
This article is focused on the problem of meaning recognition in written utterances. The goal is to find a computer algorithm capable to construct the meaning description of a given utterance. An original system for meaning recognition is described in this paper. The key idea of the system is the hybrid combination of expert and machine-learning approaches to meaning recognition. The system utilizes a novel algorithm for semantic parsing. The algorithm is based upon extended context-free grammars. The grammars are automatically inferred from the data.
text speech and dialogue | 2009
Ivan Habernal; Miloslav Konopík
In this paper, a methodology of semantic annotation of the LingvoSemantic corpus is presented. Semantic annotation is usually a time consuming and expensive process. We thus developed a methodology that significantly reduces the demands of the process. The methodology consists of a set of techniques and computer tools designed to simplify the process as much as possible. We claim that in this way it is possible to obtain sufficient amount of annotated data in a reasonable time frame. The LingvoSemantic project focuses on semantic analysis of user questions to an Internet information retrieval system. The semantic representation approach is based on abstract semantic annotation methodology. However, we advanced the annotation process. The bootstrapping method was used during the corpus annotation. The resulting annotated corpus consists of 20292 annotated sentences. In comparison to the straight-forward style of annotation, our approach significantly improved the efficiency of the annotation. The results, as well as a set of recommendations for creating the annotated data, are presented at the end of the paper.