Goran Glavaš | Researchain

Publication

Featured research published by Goran Glavaš.

Expert Systems With Applications | 2014

Event graphs for information retrieval and multi-document summarization

Goran Glavaš; Jan Šnajder

With the number of documents describing real-world events and event-oriented information needs rapidly growing on a daily basis, the need for efficient retrieval and concise presentation of event-related information is becoming apparent. Nonetheless, the majority of information retrieval and text summarization methods rely on shallow document representations that do not account for the semantics of events. In this article, we present event graphs, a novel event-based document representation model that filters and structures the information about events described in text. To construct the event graphs, we combine machine learning and rule-based models to extract sentence-level event mentions and determine the temporal relations between them. Building on event graphs, we present novel models for information retrieval and multi-document summarization. The information retrieval model measures the similarity between queries and documents by computing graph kernels over event graphs. The extractive multi-document summarization model selects sentences based on the relevance of the individual event mentions and the temporal structure of events. Experimental evaluation shows that our retrieval model significantly outperforms well-established retrieval models on event-oriented test collections, while the summarization model outperforms competitive models from shared multi-document summarization tasks.

International Conference on Computational Linguistics | 2013

Exploring coreference uncertainty of generically extracted event mentions

Goran Glavaš; Jan Šnajder

Because event mentions in text may be referentially ambiguous, event coreferentiality often involves uncertainty. In this paper we consider event coreference uncertainty and explore how it is affected by the context. We develop a supervised event coreference resolution model based on the comparison of generically extracted event mentions. We analyse event coreference uncertainty in both human annotations and predictions of the model, and in both within-document and cross-document settings. We frame event coreference as a classification task when full context is available and no uncertainty is involved, and as a regression task in a limited context setting that involves uncertainty. We show how a rich set of features based on argument comparison can be utilized in both settings. Experimental results on English data suggest that our approach is especially suitable for resolving cross-document event coreference. Results also suggest that modelling human coreference uncertainty in the case of limited context is feasible.

International Joint Conference on Natural Language Processing | 2015

Simplifying Lexical Simplification: Do We Need Simplified Corpora?

Goran Glavaš; Sanja Štajner

Simplification of lexically complex texts, by replacing complex words with their simpler synonyms, helps non-native speakers, children, and language-impaired people understand text better. Recent lexical simplification methods rely on manually simplified corpora, which are expensive and time-consuming to build. We present an unsupervised approach to lexical simplification that makes use of the most recent word vector representations and requires only regular corpora. Results of both automated and human evaluation show that our simple method is as effective as systems that rely on simplified corpora.

Natural Language Engineering | 2015

Construction and Evaluation of Event Graphs

Goran Glavaš; Jan Šnajder

Events play an important role in natural language processing and information retrieval due to numerous event-oriented texts and information needs. Many natural language processing and information retrieval applications could benefit from a structured event-oriented document representation. In this paper, we propose event graphs as a novel way of structuring event-based information from text. Nodes in event graphs represent the individual mentions of events, whereas edges represent the temporal and coreference relations between mentions. Contrary to previous natural language processing research, which has mainly focused on individual event extraction tasks, we describe a complete end-to-end system for event graph extraction from text. Our system is a three-stage pipeline that performs anchor extraction, argument extraction, and relation extraction (temporal relation extraction and event coreference resolution), each at a performance level comparable with the state of the art. We present EvExtra, a large newspaper corpus annotated with event mentions and event graphs, on which we train and evaluate our models. To measure the overall quality of the constructed event graphs, we propose two metrics based on the tensor product between automatically and manually constructed graphs. Finally, we evaluate the overall quality of event graphs with the proposed evaluation metrics and perform a headroom analysis of the system.

Joint Conference on Lexical and Computational Semantics | 2016

Unsupervised Text Segmentation Using Semantic Relatedness Graphs

Goran Glavaš; Federico Nanni; Simone Paolo Ponzetto

Segmenting text into semantically coherent fragments improves readability of text and facilitates tasks like text summarization and passage retrieval. In this paper, we present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic relatedness graph of the document. Semantically coherent segments are then derived from maximal cliques of the relatedness graph. The algorithm performs competitively on a standard synthetic dataset and outperforms the best-performing method on a real-world (i.e., non-artificial) dataset of political manifestos.
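The two graph steps named in the abstract above (building a semantic relatedness graph and finding its maximal cliques) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentence vectors, the cosine measure, the threshold of 0.8, and all function names are assumptions made for illustration, and the step that turns cliques into linear segments is not shown.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relatedness_graph(vectors, threshold):
    """Connect two sentences when their similarity exceeds the threshold."""
    graph = {i: set() for i in range(len(vectors))}
    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def maximal_cliques(graph):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical toy sentence vectors: sentences 0-2 share one topic, 3-4 another.
sentence_vectors = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.8, 0.1],
    [0.1, 1.0],
    [0.2, 0.9],
]
graph = relatedness_graph(sentence_vectors, threshold=0.8)
cliques = maximal_cliques(graph)
print(cliques)
```

In this toy input the two topical groups come out as separate cliques. A real document would use sentence representations derived from word embeddings, as the abstract describes.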

Text, Speech and Dialogue | 2012

Semi-supervised Acquisition of Croatian Sentiment Lexicon

Goran Glavaš; Jan Šnajder; Bojana Dalbelo Bašić

Sentiment analysis aims to recognize subjectivity expressed in natural language texts. Subjectivity analysis tries to answer if the text unit is subjective or objective, while polarity analysis determines whether a subjective text is positive or negative. Sentiment of sentences and documents is often determined using some sort of a sentiment lexicon. In this paper we present three different semi-supervised methods for automated acquisition of a sentiment lexicon that do not depend on pre-existing language resources: latent semantic analysis, graph-based propagation, and topic modelling. The methods are language independent and corpus-based, hence especially suitable for languages for which resources are very scarce. We use the presented methods to acquire a sentiment lexicon for the Croatian language. The performance of the methods was evaluated on the task of determining both subjectivity and polarity (subjectivity + polarity task) and the task of determining polarity of subjective words (polarity only task). The results indicate that the methods are especially suitable for the polarity only task.

Proceedings of the 6th International Workshop on Mining Scientific Publications | 2017

Investigating Convolutional Networks and Domain-Specific Embeddings for Semantic Classification of Citations

Anne Lauscher; Goran Glavaš; Simone Paolo Ponzetto; Kai Eckert

Citation graphs and indices underpin most bibliometric analyses. However, measures derived from citation graphs do not provide insights into qualitative aspects of scientific publications. In this work, we aim to semantically characterize citations in terms of polarity and purpose. We frame polarity and purpose detection as classification tasks and investigate the performance of convolutional networks with general and domain-specific word embeddings on these tasks. Our best performing model outperforms previously reported results on a benchmark dataset by a wide margin.

North American Chapter of the Association for Computational Linguistics | 2015

TKLBLIIR: Detecting Twitter Paraphrases with TweetingJay

Mladen Karan; Goran Glavaš; Jan Šnajder; Bojana Dalbelo Bašić; Ivan Vulić; Marie-Francine Moens

When tweeting on a topic, Twitter users often post messages that convey the same or similar meaning. We describe TweetingJay, a system for detecting paraphrases and semantic similarity of tweets, with which we participated in Task 1 of SemEval 2015. TweetingJay uses a supervised model that combines semantic overlap and word alignment features, previously shown to be effective for detecting semantic textual similarity. TweetingJay reaches 65.9% F1-score and ranked fourth among the 18 participating systems. We additionally provide an analysis of the dataset and point to some peculiarities of the evaluation setup.

Applications of Natural Language to Data Bases | 2012

From requirements to code: syntax-based requirements analysis for data-driven application development

Goran Glavaš; Krešimir Fertalj; Jan Šnajder

The requirements analysis phase of information system development is still a predominantly human activity. Software requirements are commonly written in natural language, at least during the early stages of the development process. In this paper we present a simple method for automated analysis of requirements specifications for data-driven applications. Our approach is rule-based and uses dependency syntax parsing for the extraction of domain entities, attributes, and relationships. The results obtained from several test cases show that hand-crafted rules applied on the dependency parse of the requirements sentences might offer a feasible approach for the task. Finally, we discuss applicability and limitations of the presented approach.

Knowledge-Based Systems | 2017

A resource-light method for cross-lingual semantic textual similarity

Goran Glavaš; Marc Franco-Salvador; Simone Paolo Ponzetto; Paolo Rosso

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named entity recognition) that for many languages (or language pairs) do not exist. In contrast, we propose an unsupervised and very resource-light approach for measuring semantic similarity between texts in different languages. To operate in the bilingual (or multilingual) space, we project continuous word vectors (i.e., word embeddings) from one language to the vector space of the other language via the linear translation model. We then align words according to the similarity of their vectors in the bilingual embedding space and investigate different unsupervised measures of semantic similarity exploiting bilingual embeddings and word alignments. Requiring only a limited-size set of word translation pairs between the languages, the proposed approach is applicable to virtually any pair of languages for which there exists a sufficiently large corpus, required to learn monolingual word embeddings. Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive methods, displaying stability across different language pairs. Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance comparable to that of complex resource-intensive state-of-the-art models for the respective tasks.
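The projection and alignment steps described in the abstract above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the authors' method. The two-dimensional vectors, the example words, and the hand-picked projection matrix are all hypothetical; in the abstract the linear map is learned from a limited set of word translation pairs, which this sketch does not do. The averaged cosine score is only one simple choice of unsupervised measure.

```python
from math import sqrt


def project(vector, matrix):
    """Apply a linear map: each output coordinate is one matrix row dotted with the vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(source_vectors, target_vectors, matrix):
    """Pair each source word with its most similar target word after projection."""
    alignment = {}
    for word, vec in source_vectors.items():
        projected = project(vec, matrix)
        alignment[word] = max(
            target_vectors, key=lambda t: cosine(projected, target_vectors[t])
        )
    return alignment


def alignment_similarity(source_vectors, target_vectors, matrix):
    """Average cosine similarity over the aligned word pairs."""
    pairs = align(source_vectors, target_vectors, matrix)
    scores = [
        cosine(project(source_vectors[s], matrix), target_vectors[t])
        for s, t in pairs.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy embeddings; the matrix below (a coordinate swap) is
# hand-picked for illustration rather than learned from translation pairs.
source = {"hund": [0.1, 1.0], "katze": [1.0, 0.2]}
target = {"dog": [1.0, 0.1], "cat": [0.2, 1.0]}
translation_matrix = [[0.0, 1.0], [1.0, 0.0]]

alignment = align(source, target, translation_matrix)
score = alignment_similarity(source, target, translation_matrix)
print(alignment, round(score, 3))
```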
