Annalina Caputo | Researchain

Publication

Featured research published by Annalina Caputo.

Information Sciences | 2016

Concept-based item representations for a cross-lingual content-based recommendation process

Fedelucio Narducci; Pierpaolo Basile; Cataldo Musto; Pasquale Lops; Annalina Caputo; Marco de Gemmis; Leo Iaquinta; Giovanni Semeraro

The growth of the Web is the most influential factor contributing to the increasing importance of text retrieval and filtering systems. On the one hand, the Web is becoming more and more multilingual; on the other hand, users themselves are becoming increasingly polyglot. In this context, platforms for intelligent information access, such as search engines or recommender systems, need to evolve to deal with this growing amount of multilingual information. This paper proposes a content-based recommender system able to generate cross-lingual recommendations. The idea is to exploit user preferences learned in a given language to suggest items in another language. The main intuition behind the work is that, unlike keywords, which are inherently language-dependent, concepts are stable across languages, which makes it possible to handle multilingual and cross-lingual scenarios. We propose four knowledge-based strategies to build concept-based representations of items, relying on the knowledge contained in two knowledge sources, Wikipedia and BabelNet. We learn user profiles by leveraging the different concept-based representations in order to define a cross-lingual recommendation process. The empirical evaluation, carried out on two state-of-the-art datasets, DBbook and Movielens, shows that concept-based approaches are suitable for providing cross-lingual recommendations, even though none of the proposed representations shows a clear advantage over the others. However, the approaches based on BabelNet outperform those based on Wikipedia in most cases, which clearly shows the advantage of using a natively multilingual knowledge source.
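To make the concept-based idea concrete, here is a minimal Python sketch, not the paper's method. It assumes items have already been mapped to language-independent concept IDs (the `c:...` identifiers are hypothetical stand-ins for something like BabelNet synsets), builds the user profile as the plain sum of liked items' concept vectors, and ranks candidates written in another language by cosine similarity. The four representation strategies and the profile learner from the paper are not reproduced.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(weight * b[concept] for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked_items: list[Counter]) -> Counter:
    """Sum the concept vectors of liked items (a simple centroid-style profile)."""
    profile = Counter()
    for item in liked_items:
        profile.update(item)
    return profile

def recommend(profile: Counter, candidates: dict[str, Counter], k: int = 2) -> list[str]:
    """Rank candidate items by cosine similarity to the profile and keep the top k."""
    ranked = sorted(candidates, key=lambda name: cosine(profile, candidates[name]), reverse=True)
    return ranked[:k]

# English items the user liked, represented by (hypothetical) concept IDs, not keywords.
liked_en = [Counter({"c:detective": 2, "c:murder": 1}), Counter({"c:detective": 1, "c:london": 1})]
# Italian candidates: different surface words, but the same concept IDs.
candidates_it = {
    "giallo_milanese": Counter({"c:detective": 1, "c:murder": 2}),
    "ricettario": Counter({"c:cooking": 3}),
    "guida_londra": Counter({"c:london": 2}),
}
profile = build_profile(liked_en)
print(recommend(profile, candidates_it))  # ['giallo_milanese', 'guida_londra']
```

Because matching happens on concept IDs, the Italian items are comparable with the English profile without any translation step; a keyword-based profile would share no terms with them.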

International Symposium on Methodologies for Intelligent Systems | 2009

Boosting a Semantic Search Engine by Named Entities

Annalina Caputo; Pierpaolo Basile; Giovanni Semeraro

Traditional Information Retrieval (IR) systems are based on a bag-of-words representation. This approach retrieves relevant documents by lexical matching between query and document terms. Because of synonymy and polysemy, lexical methods produce imprecise or incomplete results. In this paper we present SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels that integrate (rather than simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and about named entities. This paper focuses on the named entity level. Our aim is to show that named entities are useful for improving retrieval performance. We exploit a model able to capture relationships between entities even when they are not explicit in the document text. Experiments on the CLEF dataset confirm our hypothesis.

Information Retrieval and Mining in Distributed Environments | 2010

Integrating Sense Discrimination in a Semantic Information Retrieval System

Pierpaolo Basile; Annalina Caputo; Giovanni Semeraro

This paper proposes an Information Retrieval (IR) system that integrates sense discrimination to overcome the problem of word ambiguity, a key problem for systems that access textual information. Semantic Vectors can divide the usages of a word into different meanings, discriminating among them on the basis of information available in unannotated corpora. The paper has a twofold goal: the first is to evaluate the effectiveness of an IR system based on Semantic Vectors; the second is to describe how Semantic Vectors have been integrated into a semantic IR framework to build semantic spaces of words and documents. To achieve the first goal, we performed an in vivo evaluation in an IR scenario and compared the method based on sense discrimination with a method based on Word Sense Disambiguation (WSD). Unlike sense discrimination, which aims to discriminate among different meanings not necessarily known a priori, WSD is the task of selecting a sense for a word from a set of predefined possibilities. To accomplish the second goal, we integrated Semantic Vectors into a semantic search engine called SENSE (SEmantic N-levels Search Engine).
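The following Python sketch illustrates sense discrimination in a Random Indexing word space, the technique behind Semantic Vectors. It is a toy built on stated assumptions: each word gets a sparse random ternary index vector, each occurrence of an ambiguous word is represented by the sum of its context words' index vectors, and occurrences are grouped by a simple greedy cosine threshold. The sizes `DIM` and `NONZERO`, the threshold, and the greedy grouping are illustrative choices, not the clustering procedure used in the paper.

```python
import math
import random

DIM, NONZERO = 1000, 10  # illustrative sizes, not values from the paper

def index_vector(word: str) -> list[float]:
    """Sparse ternary random vector, seeded by the word so it is reproducible."""
    rng = random.Random(word)
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice([1.0, -1.0])
    return vec

def add(a: list[float], b: list[float]) -> list[float]:
    return [x + y for x, y in zip(a, b)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def context_vector(context: list[str]) -> list[float]:
    """Represent one occurrence by summing the index vectors of its context words."""
    vec = [0.0] * DIM
    for word in context:
        vec = add(vec, index_vector(word))
    return vec

def discriminate(contexts: list[list[str]], threshold: float = 0.3) -> list[int]:
    """Greedily group occurrences with similar context vectors; each group ~ one usage."""
    centroids: list[list[float]] = []
    labels: list[int] = []
    for ctx in contexts:
        vec = context_vector(ctx)
        scores = [cosine(vec, c) for c in centroids]
        if scores and max(scores) >= threshold:
            best = scores.index(max(scores))
            centroids[best] = add(centroids[best], vec)
        else:
            best = len(centroids)
            centroids.append(vec)
        labels.append(best)
    return labels

# Four occurrences of the ambiguous word "bank", known only through their context words.
occurrences = [["river", "water"], ["money", "loan"], ["river", "fish"], ["money", "account"]]
print(discriminate(occurrences))
```

Occurrences that share context words ("river", "money") end up in the same group, while random index vectors of unrelated words are nearly orthogonal, so no sense inventory is needed in advance.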

Cross-Language Evaluation Forum | 2008

SENSE: semantic N-levels search engine at CLEF2008 ad hoc robust-WSD track

Annalina Caputo; Pierpaolo Basile; Giovanni Semeraro

This paper presents the results of the experiments conducted at the University of Bari for the Ad Hoc Robust-WSD track of the Cross-Language Evaluation Forum (CLEF) 2008. The evaluation was performed using SENSE (SEmantic N-levels Search Engine), a semantic search engine that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels, which integrate (rather than simply replace) the lexical level represented by keywords. We show how SENSE manages documents indexed at two separate levels, keyword and word meaning, in an attempt to improve retrieval performance. Two types of experiments were performed: one exploiting a single indexing level, and one exploiting all indexing levels at the same time. The experiments combining keywords and word meanings, the latter extracted from the WordNet lexical database, show the promise of the idea and point out the value of our intuition. In particular, the results confirm our hypothesis: the combination of two indexing levels outperforms a single level. Indeed, adopting the N-levels model yielded a 35% improvement in precision with respect to the results obtained with the keyword-only indexing level.
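The abstract does not say how SENSE merges the results of its levels, so the following Python sketch assumes one common option: min-max normalise each level's scores and take a weighted linear combination (a weighted CombSUM). The document IDs, scores, and equal weights are made up for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one level's scores to [0, 1] so the levels are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def combine_levels(levels: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Weighted sum of normalised scores; a document missing from a level gets 0 there."""
    fused: dict[str, float] = {}
    for level, scores in levels.items():
        for doc, s in min_max(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights[level] * s
    return sorted(fused, key=fused.get, reverse=True)

levels = {
    "keyword": {"d1": 12.0, "d2": 9.0, "d3": 3.0},  # hypothetical keyword-level scores
    "meaning": {"d2": 0.8, "d3": 0.7, "d4": 0.1},   # hypothetical word-meaning-level scores
}
print(combine_levels(levels, {"keyword": 0.5, "meaning": 0.5}))  # ['d2', 'd1', 'd3', 'd4']
```

The keyword level alone would rank `d1` first; evidence from the meaning level promotes `d2`, which is the kind of effect a multi-level index is meant to capture.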

DART@AI*IA | 2017

SABRE: A Sentiment Aspect-Based Retrieval Engine

Annalina Caputo; Pierpaolo Basile; Marco de Gemmis; Pasquale Lops; Giovanni Semeraro; Gaetano Rossiello

Retrieving pertinent information during a decision-making process requires more than satisfying the traditional notion of relevance. This task calls for opinionated sources of information able to influence the user's point of view about an entity or target. We propose SABRE, a Sentiment Aspect-Based Retrieval Engine, which tackles this process by retrieving opinions about an entity at two levels of granularity, which we call aspect and sub-aspect. Such fine-grained opinion retrieval enables both an aspect-based sentiment classification of text fragments and aspect-based filtering during the navigational exploration of the retrieved documents. A preliminary evaluation on a manually created dataset shows the ability of the proposed method to better identify \(\langle \textit{aspect}, \textit{sub-aspect} \rangle\) pairs than a term-frequency baseline.

Applications of Natural Language to Data Bases | 2016

Learning to Rank Entity Relatedness Through Embedding-Based Features

Pierpaolo Basile; Annalina Caputo; Gaetano Rossiello; Giovanni Semeraro

This paper describes the effect of introducing embedding-based features in a learning to rank approach to entity relatedness. We define several features that exploit word- and link-embedding approaches by relying on both links and the content that appear in Wikipedia articles. These features are combined with other state-of-the-art relatedness measures by using a learning to rank framework. In the evaluation, we report the performance of each feature individually. Moreover, we investigate the contribution of each feature to the ranking function by analysing the output of a feature selection algorithm. The results of this analysis prove that features based on word and link embeddings are able to increase the performance of the learning to rank algorithm.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

From fusion to re-ranking: a semantic approach

Annalina Caputo; Pierpaolo Basile; Giovanni Semeraro

A number of works have shown that aggregating several Information Retrieval (IR) systems works better than each system working individually. Nevertheless, early investigations in the context of the CLEF Robust-WSD task, in which semantics is involved, showed that aggregation strategies achieve only slight improvements. This paper proposes a re-ranking approach that relies on inter-document similarities. The novelty of our idea is twofold: the output of a semantic IR system is exploited to re-weight documents, and a new strategy based on Semantic Vectors is used to compute inter-document similarities.
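A minimal Python sketch of re-ranking by inter-document similarity, under assumptions the abstract does not state: each document's new score mixes its original score with the score-weighted average similarity to the other retrieved documents, controlled by a hypothetical parameter `lam`. The toy two-dimensional vectors stand in for Semantic Vectors document representations.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(scores: dict[str, float], vectors: dict[str, list[float]], lam: float = 0.5) -> list[str]:
    """Blend each original score with support from similar, highly scored documents."""
    new_scores = {}
    for doc, score in scores.items():
        others = [o for o in scores if o != doc]
        mass = sum(scores[o] for o in others)
        support = (
            sum(scores[o] * cosine(vectors[doc], vectors[o]) for o in others) / mass
            if mass else 0.0
        )
        new_scores[doc] = (1 - lam) * score + lam * support
    return sorted(new_scores, key=new_scores.get, reverse=True)

scores = {"d1": 1.0, "d2": 0.9, "d3": 0.5}  # output of a first-stage (semantic) IR system
vectors = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.9, 0.1]}
print(rerank(scores, vectors))  # ['d1', 'd3', 'd2']
```

Here `d3` overtakes `d2` because it is close to the top-ranked `d1`, while `d2` is isolated; setting `lam=0.0` returns the original ranking.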

North American Chapter of the Association for Computational Linguistics | 2015

UNIBA: Combining Distributional Semantic Models and Sense Distribution for Multilingual All-Words Sense Disambiguation and Entity Linking

Pierpaolo Basile; Annalina Caputo; Giovanni Semeraro

This paper describes the participation of the UNIBA team in Task 13 of SemEval-2015 on Multilingual All-Words Sense Disambiguation and Entity Linking. We propose an algorithm able to disambiguate both word senses and named entities by combining the simple Lesk approach with information coming from both a distributional semantic model and the usage frequency of meanings. The results for both English and Italian show satisfactory performance.
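A rough Python sketch of the combination described above, with hypothetical components throughout: glosses and contexts are embedded by summing toy two-dimensional word vectors (standing in for the distributional model), the Lesk-style gloss/context overlap is measured as the cosine between those vectors, and the result is linearly mixed with a sense-frequency prior through an assumed weight `alpha`. The vectors, glosses, and priors are invented; this is not the UNIBA system.

```python
import math

# Toy 2-d "distributional" vectors (hypothetical; a real model is learned from a corpus).
WORD_VECTORS = {
    "river": [1.0, 0.0], "water": [0.9, 0.1], "slope": [0.8, 0.2], "land": [0.7, 0.3],
    "money": [0.0, 1.0], "loan": [0.1, 0.9], "deposit": [0.2, 0.8],
}

# Each sense: (gloss words, usage-frequency prior). Invented values for illustration.
SENSES = {
    "bank#1": (["slope", "land", "river"], 0.3),
    "bank#2": (["money", "deposit"], 0.7),
}

def embed(words: list[str]) -> list[float]:
    """Sum the vectors of known words; unknown words are skipped."""
    vec = [0.0, 0.0]
    for w in words:
        if w in WORD_VECTORS:
            vec = [a + b for a, b in zip(vec, WORD_VECTORS[w])]
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(context: list[str], senses: dict, alpha: float = 0.5) -> str:
    """Pick the sense maximising alpha * gloss/context similarity + (1 - alpha) * prior."""
    ctx = embed(context)

    def score(sense: str) -> float:
        gloss, prior = senses[sense]
        return alpha * cosine(embed(gloss), ctx) + (1 - alpha) * prior

    return max(senses, key=score)

print(disambiguate(["water", "river"], SENSES))  # bank#1
print(disambiguate(["loan"], SENSES))            # bank#2
```

The prior alone would always choose `bank#2`; the distributional similarity between gloss and context is what lets the river sense win when the context supports it.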

International Conference on the Theory of Information Retrieval | 2011

Negation for document re-ranking in ad-hoc retrieval

Pierpaolo Basile; Annalina Caputo; Giovanni Semeraro

Information about top-ranked documents plays a key role in improving retrieval performance. One of the most common strategies that exploit this kind of information is relevance feedback. Few works have investigated the role of negative feedback in retrieval performance, probably because of the difficulty of dealing with the concept of a non-relevant document. This paper proposes a novel approach to document re-ranking that relies on the concept of negative feedback represented by non-relevant documents. In our model, non-relevance is defined as a quantum operator in both the classical Vector Space Model and a Semantic Document Space. The latter is induced from the original document space using a distributional approach based on Random Indexing. The evaluation carried out on a standard document collection shows the effectiveness of the proposed approach and opens new perspectives for quantifying the concept of non-relevance.
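One way to realise negation as a vector operation is the orthogonal negation used in quantum-logic models of word spaces: the query is projected onto the subspace orthogonal to the non-relevant documents, i.e. \(q' = q - \sum_i (q \cdot e_i)\, e_i\), where the \(e_i\) are an orthonormal basis of their span. The Python sketch below assumes that form (Gram-Schmidt on the non-relevant documents, then subtraction of the projections); whether the paper's operator matches it in every detail is an assumption, and the document vectors are toy data.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors: list[list[float]]) -> list[list[float]]:
    """Gram-Schmidt: orthonormal basis for the span of the non-relevant documents."""
    basis: list[list[float]] = []
    for v in vectors:
        w = list(v)
        for e in basis:
            p = dot(w, e)
            w = [x - p * y for x, y in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis

def negate(query: list[float], non_relevant: list[list[float]]) -> list[float]:
    """Project the query onto the orthogonal complement of span(non_relevant)."""
    q = list(query)
    for e in orthonormal_basis(non_relevant):
        p = dot(q, e)
        q = [x - p * y for x, y in zip(q, e)]
    return q

docs = {"d1": [1.0, 0.2, 0.0], "d2": [0.2, 1.0, 0.0], "d3": [0.7, 0.0, 0.3]}
query = [1.0, 1.0, 0.0]
# Treat d2 as the non-relevant document (e.g. identified through negative feedback).
q_not = negate(query, [docs["d2"]])
ranking = sorted(docs, key=lambda d: dot(q_not, docs[d]), reverse=True)
print(ranking)  # ['d1', 'd3', 'd2']
```

With the original query, `d1` and `d2` tie at the top (dot product 1.2); after negation the query is orthogonal to `d2`, which drops to the bottom of the ranking.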

International Journal of Electronic Governance | 2017

SEPIR: a semantic and personalised information retrieval tool for the public administration based on distributional semantics

Pierpaolo Basile; Annalina Caputo; Marco Di Ciano; Gaetano Grasso; Gaetano Rossiello; Giovanni Semeraro

This paper introduces a semantic and personalised information retrieval (SEPIR) tool for the public administration of the Apulia Region. Through semantic search and visualisation tools, SEPIR enables the analysis of large amounts of unstructured data and intelligent access to information. At the core of these functionalities is an NLP pipeline responsible for building the WordSpace and extracting key-phrases. The WordSpace is the key component of the semantic search and personalisation algorithm. Moreover, key-phrases enrich the document representation of the retrieval system and form the basis of the bubble charts, which provide a quick overview of the main concepts in a document collection. We show some of the key features of SEPIR in a use case where the personalisation technique re-ranks the set of relevant documents on the basis of the user's past queries, and the visualisation tools provide users with useful information about the analysed collection.
