Josef Steinberger | Researchain

Publication

Featured researches published by Josef Steinberger.

Decision Support Systems | 2011

Creating Sentiment Dictionaries via Triangulation

Josef Steinberger; Polina Lenkova; Mohamed Ebrahim; Maud Ehrman; Ali Hürriyetoğlu; Mijail A. Kabadjov; Ralf Steinberger; Hristo Tanev; Vanni Zavarella; Silvia Vázquez

The paper presents a semi-automatic approach to creating sentiment dictionaries in many languages. We first produced high-level gold-standard sentiment dictionaries for two languages and then translated them automatically into third languages. Those words that can be found in both target language word lists are likely to be useful because their word senses are likely to be similar to those in the two source languages. These dictionaries can be further corrected, extended and improved. In this paper, we present results that verify our triangulation hypothesis, by evaluating triangulated lists and comparing them to non-triangulated machine-translated word lists.
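
The triangulation step described in this abstract can be illustrated with a minimal Python sketch. It assumes the two source dictionaries have already been machine-translated into the target language; the function name `triangulate` and the toy Italian word lists are hypothetical and are not taken from the paper.

```python
def triangulate(from_source_a, from_source_b):
    """Return target-language words present in both machine-translated lists."""
    # A word that survives translation from two different source languages
    # is more likely to carry the intended sentiment-bearing sense.
    return sorted(set(from_source_a) & set(from_source_b))


# Hypothetical target-language candidates, one list per source dictionary.
from_english = ["felice", "triste", "fine", "terribile"]
from_spanish = ["felice", "terribile", "vino", "triste"]

print(triangulate(from_english, from_spanish))
```

Words that appear in only one list (here "fine" and "vino") are dropped as likely translation artifacts; in the approach described above, the surviving list could then be corrected and extended manually.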

Lecture Notes in Computer Science | 2004

Text summarization and singular value decomposition

Josef Steinberger; Karel Ježek

In this paper we present the usage of singular value decomposition (SVD) in text summarization. Firstly, we mention the taxonomy of generic text summarization methods. Then we describe principles of the SVD and its possibilities to identify semantically important parts of a text. We propose a modification of the SVD-based summarization, which improves the quality of generated extracts. In the second part we propose two new evaluation methods based on SVD, which measure content similarity between an original document and its summary. In the evaluation part, our summarization approach is compared with 5 other available summarizers. To evaluate summary quality we used, apart from a classical content-based evaluator, both newly developed SVD-based evaluators. Finally, we study the influence of the summary length on its quality from the angle of the three evaluation methods mentioned.
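
As a rough illustration of how a latent space can point to important sentences, the Python sketch below ranks sentences by their weight in the dominant latent topic of a term-by-sentence matrix. It is a simplification under stated assumptions, not the method proposed in the paper: it uses power iteration to approximate only the first right singular vector instead of a full SVD, and all function names and example sentences are hypothetical.

```python
import math


def term_sentence_matrix(sentences):
    """Build a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    return [[tokens.count(term) for tokens in tokenized] for term in vocab]


def dominant_topic_weights(matrix, iterations=100):
    """Approximate the first right singular vector by power iteration on A^T A."""
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def rank_sentences(sentences):
    """Return sentence indices ordered by weight in the dominant topic."""
    weights = dominant_topic_weights(term_sentence_matrix(sentences))
    return sorted(range(len(sentences)), key=lambda k: -abs(weights[k]))


sentences = [
    "svd finds latent topics in text",
    "latent topics summarize text",
    "cats sleep all day",
]
print(rank_sentences(sentences))
```

An extract would then be formed from the top-ranked sentences. A fuller treatment would use several singular vectors and their singular values, which this sketch deliberately omits.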

Information Processing and Management | 2014

Supervised sentiment analysis in Czech social media

Ivan Habernal; Tomáš Ptáček; Josef Steinberger

This article describes in-depth research on machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, in the case of the Czech language no systematic research has yet been conducted. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different pre-processing techniques and employ various features and classifiers. We also experiment with five different feature selection algorithms and investigate the influence of named entity recognition and preprocessing on sentiment classification performance. Moreover, in addition to our newly created social media dataset, we also report results for other popular domains, such as movie and product reviews. We believe that this article will not only extend the current sentiment analysis research to another family of languages, but will also encourage competition, potentially leading to the production of high-end commercial solutions.

Empirical Methods in Natural Language Processing | 2005

Improving LSA-based Summarization with Anaphora Resolution

Josef Steinberger; Mijail A. Kabadjov; Massimo Poesio; Olivia Sanchez-Graillet

We propose an approach to summarization exploiting both lexical information and the output of an automatic anaphoric resolver, and using Singular Value Decomposition (SVD) to identify the main terms. We demonstrate that adding anaphoric information results in significant performance improvements over a previously developed system, in which only lexical terms are used as the input to SVD. However, we also show that how anaphoric information is used is crucial: whereas using this information to add new terms does result in improved performance, simple substitution makes the performance worse.

Document Engineering | 2009

Update summarization based on novel topic distribution

Josef Steinberger; Karel Ježek

This paper deals with our recent research in text summarization. The field has moved from multi-document summarization to update summarization. When producing an update summary of a set of topic-related documents, the summarizer assumes prior knowledge of the reader determined by a set of older documents of the same topic. The update summarizer thus must solve a novelty vs. redundancy problem. We describe the development of our summarizer, which is based on Iterative Residual Rescaling (IRR) that creates the latent semantic space of a set of documents under consideration. IRR generalizes Singular Value Decomposition (SVD) and makes it possible to control the influence of major and minor topics in the latent space. Our sentence-extractive summarization method computes the redundancy, novelty and significance of each topic. These values are finally used in the sentence selection process. The sentence selection component prevents inner summary redundancy. The results of our participation in the TAC evaluation seem to be promising.

Cross Language Evaluation Forum | 2010

Using parallel corpora for multilingual (multi-document) summarisation evaluation

Marco Turchi; Josef Steinberger; Mijail A. Kabadjov; Ralf Steinberger

We present a method for the evaluation of multilingual multi-document summarisation that saves precious annotation time and makes the evaluation results across languages directly comparable. The approach is based on manually selecting the most important sentences in a cluster of documents from a sentence-aligned parallel corpus, and on projecting the sentence selection to various target languages. We also present two ways of exploiting inter-annotator agreement levels, apply them both to a baseline sentence extraction summariser in seven languages, and discuss the result differences between the two evaluation versions, as well as a preliminary analysis between languages. The same method can in principle be used to evaluate single-document summarisers or information extraction tools.
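
The projection idea can be sketched in a few lines of Python. The sketch assumes a perfectly sentence-aligned corpus in which index i refers to the same sentence in every language; the function name `project_selection` and the two-language toy corpus are hypothetical.

```python
def project_selection(selected_ids, parallel_corpus):
    """Map sentence indices chosen in one language onto every aligned language."""
    return {
        lang: [sentences[i] for i in selected_ids]
        for lang, sentences in parallel_corpus.items()
    }


# Hypothetical sentence-aligned corpus: position i is the same sentence in each language.
corpus = {
    "en": ["Floods hit the city.", "The mayor spoke.", "Aid arrived on Monday."],
    "de": ["Überschwemmungen trafen die Stadt.", "Der Bürgermeister sprach.",
           "Hilfe traf am Montag ein."],
}

# An annotator marks sentences 0 and 2 as important in one language only.
reference = project_selection([0, 2], corpus)
print(reference["de"])
```

Because the annotation is done once and reused, the reference selections in all languages cover the same content, which is what makes the scores comparable across languages.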

Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2015

MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations

George Giannakopoulos; Jeff Kubina; John M. Conroy; Josef Steinberger; Benoit Favre; Mijail A. Kabadjov; Udo Kruschwitz; Massimo Poesio

In this paper we present an overview of MultiLing 2015, a special session at SIGdial 2015. MultiLing is a community-driven initiative that pushes the state-of-the-art in Automatic Summarization by providing data sets and fostering further research and development of summarization systems. There were in total 23 participants this year submitting their system outputs to one or more of the four tasks of MultiLing: MSS, MMS, OnForumS and CCCS. We provide a brief overview of each task and its participation and evaluation.

Text Speech and Dialogue | 2009

Update Summarization Based on Latent Semantic Analysis

Josef Steinberger; Karel Ježek

This paper deals with our recent research in text summarization. We went from single-document summarization through multi-document summarization to update summarization. We describe the development of our summarizer which is based on latent semantic analysis (LSA) and propose the update summarization component which determines the redundancy and novelty of each topic discovered by LSA. The final part of this paper presents the results of our participation in the experiment of Text Analysis Conference 2008.

International Conference on Computational Linguistics | 2014

UWB: Machine Learning Approach to Aspect-Based Sentiment Analysis

Tomáš Brychcín; Michal Konkol; Josef Steinberger

This paper describes our system participating in the aspect-based sentiment analysis task of SemEval 2014. The goal was to identify the aspects of given target entities and the sentiment expressed towards each aspect. We first introduce a system based on supervised machine learning, which is strictly constrained and uses the training data as the only source of information. This system is then extended by unsupervised methods for latent semantics discovery (LDA and semantic spaces) as well as an approach based on sentiment vocabularies. The evaluation was done on two domains, restaurants and laptops. We show that our approach leads to very promising results.

Meeting of the Association for Computational Linguistics | 2014

Aspect-Level Sentiment Analysis in Czech

Josef Steinberger; Tomáš Brychcín; Michal Konkol

This paper presents pioneering research on aspect-level sentiment analysis in Czech. The main contribution of the paper is the newly created Czech aspect-level sentiment corpus, based on data from restaurant reviews. We annotated the corpus with two variants of aspect-level sentiment: aspect terms and aspect categories. The corpus consists of 1,244 sentences and 1,824 annotated aspects and is freely available to the research community. Furthermore, we propose a baseline system based on supervised machine learning. Our system detects the aspect terms with F-measure 68.65% and their polarities with accuracy 66.27%. The categories are recognized with F-measure 74.02% and their polarities with accuracy 66.61%.