Octavian Popescu | Researchain

Publication

Featured research published by Octavian Popescu.

North American Chapter of the Association for Computational Linguistics | 2015

SemEval 2015, Task 7: Diachronic Text Evaluation

Octavian Popescu; Carlo Strapparava

In this paper we describe a novel task, namely the Diachronic Text Evaluation task. A corpus of snippets which contain relevant information for the time when the text was created is extracted from a large collection of newspapers published between 1700 and 2010. The task, subdivided into three subtasks, requires the automatic system to identify the time interval when the piece of news was written. The subtasks concern specific types of information that might be available in news. The intervals come in three grades: fine, medium and coarse, according to their length. The systems participating in the task have shown that this is a doable task with very interesting possible continuations.

North American Chapter of the Association for Computational Linguistics | 2015

FBK-HLT: A New Framework for Semantic Textual Similarity

Ngoc Phuoc An Vo; Simone Magnolini; Octavian Popescu

This paper reports the description and performance of our system, FBK-HLT, participating in the SemEval 2015, Task #2 “Semantic Textual Similarity”, English subtask. We submitted three runs with different hypotheses for combining typical features (lexical similarity, string similarity, word n-grams, etc.) with syntactic structure features, resulting in different sets of features. The results evaluated on both STS 2014 and 2015 datasets prove our hypothesis of building an STS system that takes syntactic information into consideration. We outperform the best system on STS 2014 datasets and achieve a result very competitive with the best system on STS 2015 datasets.

North American Chapter of the Association for Computational Linguistics | 2015

SemEval-2015 Task 15: A CPA dictionary-entry-building task

Vít Baisa; Jane Bradbury; Silvie Cinková; Ismaïl El Maarouf; Adam Kilgarriff; Octavian Popescu

This paper describes the first SemEval task to explore the use of Natural Language Processing systems for building dictionary entries, in the framework of Corpus Pattern Analysis. CPA is a corpus-driven technique which provides tools and resources to identify and represent unambiguously the main semantic patterns in which words are used. Task 15 draws on the Pattern Dictionary of English Verbs (www.pdev.org.uk) for the targeted lexical entries, and on the British National Corpus for the input text. Dictionary entry building is split into three subtasks which all start from the same concordance sample: 1) CPA parsing, where arguments and their syntactic and semantic categories have to be identified, 2) CPA clustering, in which sentences with similar patterns have to be clustered, and 3) CPA automatic lexicography, where the structure of patterns has to be constructed automatically. Subtask 1 attracted 3 teams, though none could beat the baseline (rule-based system). Subtask 2 attracted 2 teams, one of which beat the baseline (majority-class classifier). Subtask 3 did not attract any participant. The task has produced a major semantic multidataset resource which includes data for 121 verbs and about 17,000 annotated sentences, and which is freely accessible.

AI*IA 2016 Proceedings of the XV International Conference of the Italian Association for Artificial Intelligence on Advances in Artificial Intelligence - Volume 10037 | 2016

Analysis of the Impact of Machine Translation Evaluation Metrics for Semantic Textual Similarity

Simone Magnolini; Ngoc Phuoc An Vo; Octavian Popescu

We present a study evaluating the hypothesis that automatic evaluation metrics developed for Machine Translation (MT) systems have a significant impact on predicting semantic similarity scores in the Semantic Textual Similarity (STS) task, in light of their usage for paraphrase identification. We show that different metrics may have different behaviors and significance along the semantic scale [0-5] of the STS task. In addition, we compare several classification algorithms using a combination of different MT metrics to build an STS system; consequently, we show that although this approach obtains remarkable results in the paraphrase identification task, it is insufficient to achieve the same results in STS. We show that this problem is due to an excessive adaptation of some algorithms to the dataset domain and, finally, we present a way to mitigate or avoid this issue.

International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2016

A Multi-Layer System for Semantic Textual Similarity

Ngoc Phuoc An Vo; Octavian Popescu

Building a system able to cope with the various phenomena which fall under the umbrella of semantic similarity is far from trivial. It is almost always the case that the performance of a system does not vary consistently or predictably from corpus to corpus. We analyzed the source of this variance and found that it is related to the word-pair similarity distribution among the topics in the various corpora. Then we used this insight to construct a 4-module system that takes into consideration not only string and semantic word similarity, but also word alignment and sentence structure. The system consistently achieves an accuracy which is very close to the state of the art, or reaches a new state of the art. The system is based on a multi-layer architecture and is able to deal with heterogeneous corpora which may not have been generated by the same distribution.

Empirical Methods in Natural Language Processing | 2014

Fast and Accurate Misspelling Correction in Large Corpora

Octavian Popescu; Ngoc Phuoc An Vo

There are several NLP systems whose accuracy depends crucially on finding misspellings fast. However, the classical approach is based on a quadratic time algorithm with 80% coverage. We present a novel algorithm for misspelling detection, which runs in constant time and improves the coverage to more than 96%. We use this algorithm together with a cross document coreference system in order to find proper name misspellings. The experiments confirmed significant improvement over the state of the art.

North American Chapter of the Association for Computational Linguistics | 2015