Publication


Featured research published by Jose Camacho-Collados.


North American Chapter of the Association for Computational Linguistics | 2015

NASARI: a Novel Approach to a Semantically-Aware Representation of Items

Jose Camacho-Collados; Mohammad Taher Pilehvar; Roberto Navigli

The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have mainly based their representations either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/.


International Joint Conference on Natural Language Processing | 2015

A Unified Multilingual Semantic Representation of Concepts

Jose Camacho-Collados; Mohammad Taher Pilehvar; Roberto Navigli

Semantic representation lies at the core of several applications in Natural Language Processing. However, most existing semantic representation techniques cannot be used effectively for the representation of individual word senses. We put forward a novel multilingual concept representation, called MUFFIN, which not only enables accurate representation of word senses in different languages, but also provides multiple advantages over existing approaches. MUFFIN represents a given concept in a unified semantic space irrespective of the language of interest, enabling cross-lingual comparison of different concepts. We evaluate our approach on two benchmarks, semantic similarity and Word Sense Disambiguation, reporting state-of-the-art performance on several standard datasets.


International Joint Conference on Natural Language Processing | 2015

A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets

Jose Camacho-Collados; Mohammad Taher Pilehvar; Roberto Navigli

Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even those that are widely spoken such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for the extension of existing English datasets to other languages, both at monolingual and cross-lingual levels. We propose an automatic standardization for the construction of cross-lingual similarity datasets, and provide an evaluation, demonstrating its reliability and robustness. Based on our procedure and taking the RG-65 word similarity dataset as a reference, we release two high-quality Spanish and Farsi (Persian) monolingual datasets, and fifteen cross-lingual datasets for six languages: English, Spanish, French, German, Portuguese, and Farsi.
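Word similarity datasets such as RG-65 are commonly used by correlating a model's similarity estimates with the human ratings. The sketch below assumes cosine similarity between word vectors and Spearman rank correlation as the score, and skips pairs the model does not cover; the toy vectors and ratings are invented for illustration and do not come from any released dataset.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based mean rank of the run
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def evaluate(vectors, pairs):
    """Correlate model cosine scores with gold ratings over covered pairs.

    vectors: dict word -> vector (hypothetical embeddings)
    pairs: list of (word1, word2, gold_rating) triples
    """
    gold, predicted = [], []
    for w1, w2, rating in pairs:
        if w1 in vectors and w2 in vectors:  # skip out-of-vocabulary pairs
            gold.append(rating)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, predicted)

# Toy data, invented for illustration only.
TOY_VECTORS = {"car": [1.0, 0.0], "automobile": [1.0, 0.1],
               "fruit": [0.0, 1.0], "apple": [0.3, 1.0]}
TOY_PAIRS = [("car", "automobile", 3.9), ("fruit", "apple", 3.0),
             ("car", "fruit", 0.4), ("car", "banana", 1.0)]  # "banana" is OOV
rho = evaluate(TOY_VECTORS, TOY_PAIRS)  # 1.0 up to rounding: rankings agree
```

Spearman correlation only compares rankings, so a model is not penalized for using a different similarity scale than the human annotators; cross-lingual pairs can be scored the same way once both words have vectors in a shared space.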


Workshop on Evaluating Vector Space Representations for NLP | 2016

Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations

Jose Camacho-Collados; Roberto Navigli

We present a new framework for an intrinsic evaluation of word vector representations based on the outlier detection task. This task is intended to test the capability of vector space models to create semantic clusters in the space. We carried out a pilot study by building a gold-standard dataset, and the results revealed two important features: human performance on the task is extremely high compared to the standard word similarity task, and state-of-the-art word embedding models, whose current shortcomings were highlighted as part of the evaluation, still have considerable room for improvement.
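To make the outlier detection task concrete, the sketch below scores each word in a set by its mean cosine similarity to the other words and predicts the least similar one as the outlier. This simplified scoring rule is an assumption for illustration, not necessarily the exact scoring function defined in the paper; the toy vectors are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def detect_outlier(words, vectors):
    """Return the word with the lowest mean similarity to the rest of the set.

    Assumes the words in the set are distinct and all have vectors.
    """
    def mean_similarity(w):
        others = [x for x in words if x != w]
        return sum(cosine(vectors[w], vectors[x]) for x in others) / len(others)
    return min(words, key=mean_similarity)

def accuracy(test_sets, vectors):
    """Fraction of (cluster_words, true_outlier) sets where the outlier is found."""
    correct = sum(
        detect_outlier(cluster + [outlier], vectors) == outlier
        for cluster, outlier in test_sets
    )
    return correct / len(test_sets)

# Toy vectors, invented for illustration: three animals and one vehicle.
TOY = {"dog": [1.0, 0.0, 0.0], "cat": [0.9, 0.1, 0.0],
       "horse": [0.8, 0.2, 0.0], "car": [0.0, 0.0, 1.0]}
predicted = detect_outlier(["dog", "cat", "horse", "car"], TOY)  # "car"
```

A model that forms tight semantic clusters places the cluster words close together and the intruder far away, which is what this kind of evaluation rewards.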


Empirical Methods in Natural Language Processing | 2016

Supervised Distributional Hypernym Discovery via Domain Adaptation

Luis Espinosa Anke; Jose Camacho-Collados; Claudio Delli Bovi; Horacio Saggion

Paper presented at the Conference on Empirical Methods in Natural Language Processing, held from 1 to 5 November 2016 in Austin, Texas.


Meeting of the Association for Computational Linguistics | 2017

Towards a Seamless Integration of Word Senses into Downstream NLP Applications

Mohammad Taher Pilehvar; Jose Camacho-Collados; Roberto Navigli; Nigel Collier

Lexical ambiguity can prevent NLP systems from accurately understanding semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large. Our results also point to the need for sense representation research to focus more on in vivo evaluations, which target performance in downstream NLP applications, rather than on artificial benchmarks.


Meeting of the Association for Computational Linguistics | 2017

EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text

Claudio Delli Bovi; Jose Camacho-Collados; Alessandro Raganato; Roberto Navigli

Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale. In this paper, we present EUROSENSE, a multilingual sense-annotated resource based on the joint disambiguation of the Europarl parallel corpus, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities from a language-independent unified sense inventory. We evaluate the quality of our sense annotations intrinsically and extrinsically, showing their effectiveness as training data for Word Sense Disambiguation.


Conference on Computational Natural Language Learning | 2017

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Massimiliano Mancini; Jose Camacho-Collados; Ignacio Iacobacci; Roberto Navigli

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach both qualitatively and quantitatively in a variety of tasks, highlighting the advantages of the proposed method in comparison to state-of-the-art word- and sense-based models.


Knowledge-Based Systems | 2018

Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police

Lara Quijano-Sanchez; Federico Liberatore; Jose Camacho-Collados; Miguel Camacho-Collados

VeriPol is an effective text-based lie detection model for police reports. Our model includes feature selection by L1 penalization and heuristic rules. Computational experiments on a real dataset show a validation accuracy of 91%. A pilot study shows a lower bound of approximately 83% on the empirical precision. The model analysis provides linguistic insights into how people lie to the police.

Filing a false police report is a crime that has dire consequences for both the individual and the system. In fact, it may be charged as a misdemeanor or a felony. For society, a false report results in the loss of police resources and the contamination of police databases used to carry out investigations and assess the risk of crime in a territory. In this research, we present VeriPol, a model for the detection of false robbery reports based solely on their text. This tool, developed in collaboration with the Spanish National Police, combines Natural Language Processing and Machine Learning methods in a decision support system that provides police officers with the probability that a given report is false. VeriPol has been tested on more than 1000 reports from 2015 provided by the Spanish National Police. Empirical results show that it is extremely effective in discriminating between false and true reports, with a success rate of more than 91%, improving by more than 15% on the accuracy of expert police officers on the same dataset. The underlying classification model can be analysed to extract patterns and insights showing how people lie to the police (as well as how to get away with false reporting). In general, the more details provided in the report, the more likely it is to be honest. Finally, a pilot study carried out in June 2017 demonstrated the usefulness of VeriPol in the field.
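The highlights mention feature selection by L1 penalization. As a generic illustration only (VeriPol's actual features, classifier, and heuristic rules are not specified here), the sketch below fits an L1-penalized logistic regression by proximal gradient descent (ISTA) on a tiny invented binary feature matrix. Weights that the soft-thresholding step drives to exactly zero correspond to discarded features; the function names and toy data are hypothetical.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrink each weight toward zero by t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1_logistic(X, y, lam=0.1, lr=0.5, iters=2000):
    """Minimise mean logistic loss + lam * ||w||_1 with ISTA.

    X: (n_samples, n_features) array; y: labels in {0, 1}.
    The intercept b is not penalised. lr must stay below 1 / Lipschitz constant.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w = soft_threshold(w - lr * grad_w, lr * lam)  # gradient step, then shrink
        b -= lr * grad_b
    return w, b

# Invented toy data: feature 0 tracks the label, feature 1 is uninformative
# (it fires equally often in both classes).
X = np.array([[1, 1], [1, 0], [1, 1], [0, 1], [0, 0], [0, 1]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)
w, b = fit_l1_logistic(X, y, lam=0.1)
selected = np.flatnonzero(w)  # indices of features kept by the L1 penalty
```

Raising `lam` removes more features; for production use, a maintained solver such as scikit-learn's `LogisticRegression` with an L1 penalty is preferable to this hand-rolled loop.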


Knowledge-Based Systems | 2018

Knowledge-enhanced document embeddings for text classification

Roberta Akemi Sinoara; Jose Camacho-Collados; Rafael Geraldeli Rossi; Roberto Navigli; Solange Oliveira Rezende

Accurate semantic representation models are essential in text mining applications. For a successful application of the text mining process, the text representation adopted must keep the interesting patterns to be discovered. Although competitive results for automatic text classification may be achieved with a traditional bag of words, such a representation model cannot provide satisfactory classification performance in hard settings where richer text representations are required. In this paper, we present an approach to represent document collections based on embedded representations of words and word senses. We bring together the power of word sense disambiguation and the semantic richness of word and word-sense embedded vectors to construct embedded representations of document collections. Our approach results in semantically enhanced, low-dimensional representations. We overcome the lack of interpretability of embedded vectors, a drawback of this kind of representation, by using word sense embedded vectors. Moreover, the experimental evaluation indicates that the proposed representations yield stable classifiers with strong quantitative results, especially in semantically complex classification scenarios.
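One simple way to turn word and sense embeddings into a single document vector is to average them. The sketch below assumes that a disambiguation step has already assigned sense identifiers to the document and that words and senses live in one shared vector space; the averaging rule, the lookup tables, and the `bank%river`-style sense labels are hypothetical choices for illustration, not necessarily the configuration used in the paper.

```python
import numpy as np

def document_embedding(tokens, senses, word_vecs, sense_vecs, dim):
    """Average the embeddings of a document's words and disambiguated senses.

    tokens: words in the document
    senses: sense identifiers produced by a WSD step (hypothetical labels)
    word_vecs, sense_vecs: dicts mapping keys to vectors in one shared space
    Unknown keys are skipped; a document with no known keys maps to zeros.
    """
    vectors = [word_vecs[t] for t in tokens if t in word_vecs]
    vectors += [sense_vecs[s] for s in senses if s in sense_vecs]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Invented toy embeddings for illustration.
WORD = {"bank": np.array([1.0, 0.0]), "river": np.array([0.0, 1.0])}
SENSE = {"bank%river": np.array([0.0, 2.0])}
doc = document_embedding(["bank", "river", "xyz"], ["bank%river"], WORD, SENSE, dim=2)
```

The resulting fixed-length vectors can then be fed to any standard classifier; including sense vectors lets a disambiguated "bank" pull the document toward its river meaning rather than its financial one.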

Collaboration


Top Co-Authors

Roberto Navigli

Sapienza University of Rome

Claudio Delli Bovi

Sapienza University of Rome

Ignacio Iacobacci

Sapienza University of Rome

Tommaso Pasini

Sapienza University of Rome
