Simone Filice
Qatar Computing Research Institute
Publication
Featured research published by Simone Filice.
meeting of the association for computational linguistics | 2015
Simone Filice; Giuseppe Castellucci; Danilo Croce; Roberto Basili
Kernel-based learning algorithms have been shown to achieve state-of-the-art results in many Natural Language Processing (NLP) tasks. We present KELP, a Java framework that supports the implementation of both kernel-based learning algorithms and kernel functions over generic data representations, e.g. vectorial data or discrete structures. The framework has been designed to decouple kernel functions and learning algorithms: once a new kernel function has been implemented, it can be adopted in all the available kernel machine algorithms. The platform includes different Online and Batch Learning algorithms for Classification, Regression and Clustering, as well as several Kernel functions, ranging from vector-based to structural kernels. This paper will show the main aspects of the framework by applying it to different NLP tasks.
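The decoupling described in this abstract can be illustrated with a minimal sketch. It is written in Python for brevity and does not use or reproduce the KeLP API (KeLP itself is a Java framework); the names `linear_kernel`, `bigram_kernel` and `KernelPerceptron` are hypothetical and chosen only for illustration. The learner sees the data only through a kernel function, so the same algorithm runs unchanged on vectors and on strings.

```python
from collections import Counter


def linear_kernel(x, y):
    """Dot product between two equal-length numeric vectors."""
    return float(sum(a * b for a, b in zip(x, y)))


def bigram_kernel(s, t):
    """Toy kernel over strings: dot product of character-bigram counts."""
    cs = Counter(s[i:i + 2] for i in range(len(s) - 1))
    ct = Counter(t[i:i + 2] for i in range(len(t) - 1))
    return float(sum(cs[g] * ct[g] for g in cs))


class KernelPerceptron:
    """Binary kernel perceptron; it never inspects the examples directly."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.support = []  # list of (example, label) pairs

    def score(self, x):
        return sum(y * self.kernel(sv, x) for sv, y in self.support)

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def fit(self, examples, labels, epochs=5):
        for _ in range(epochs):
            for x, y in zip(examples, labels):
                # On a mistake (or zero margin) store the example.
                if y * self.score(x) <= 0:
                    self.support.append((x, y))
        return self
```

Swapping `linear_kernel` for `bigram_kernel` changes the data representation without touching the learning code, which is the design property the abstract emphasizes.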
north american chapter of the association for computational linguistics | 2016
Simone Filice; Danilo Croce; Alessandro Moschitti; Roberto Basili
This paper describes the KeLP system participating in the SemEval-2016 Community Question Answering (cQA) task. The challenge tasks are modeled as binary classification problems: kernel-based classifiers are trained on the SemEval datasets and their scores are used to sort the instances and produce the final ranking. All classifiers and kernels have been implemented within the Kernel-based Learning Platform called KeLP. Our primary submission ranked first in Subtask A, third in Subtask B and second in Subtask C. These ranks are based on Mean Average Precision (MAP), which is the reference score of the challenge. Our approach outperforms all the other systems with respect to all the other challenge metrics.
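As a rough illustration of how classifier scores can be turned into a ranking and evaluated with MAP, the following sketch sorts the candidates of each question by score and averages the precision at each relevant position. The function names and the convention of assigning 0 to a question with no relevant candidate are assumptions made here; the official SemEval scorer may handle such cases differently.

```python
def average_precision(scores, relevant):
    """AP of one ranked list; candidates are sorted by descending score."""
    ranked = sorted(zip(scores, relevant), key=lambda p: p[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumption: a list without relevant candidates contributes 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """MAP over a list of (scores, relevant) pairs, one pair per question."""
    return sum(average_precision(s, r) for s, r in queries) / len(queries)
```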
north american chapter of the association for computational linguistics | 2015
Massimo Nicosia; Simone Filice; Alberto Barrón-Cedeño; Iman Saleh; Hamdy Mubarak; Wei Gao; Preslav Nakov; Giovanni Da San Martino; Alessandro Moschitti; Kareem Darwish; Lluís Màrquez; Shafiq R. Joty; Walid Magdy
This paper describes QCRI’s participation in SemEval-2015 Task 3 “Answer Selection in Community Question Answering”, which targeted real-life Web forums, and was offered in both Arabic and English. We apply a supervised machine learning approach considering a variety of features including, among others, word n-grams, text similarity, sentiment analysis, the presence of specific words, and the context of a comment. Our approach was the best performing one in the Arabic subtask and the third best in the two English subtasks.
international joint conference on natural language processing | 2015
Simone Filice; Giovanni Da San Martino; Alessandro Moschitti
This paper studies the use of structural representations for learning relations between pairs of short texts (e.g., sentences or paragraphs) of the kind: the second text answers, or conveys exactly the same information as, or is implied by, the first text. Engineering effective features that can capture syntactic and semantic relations between the constituents composing the target text pairs is rather complex. Thus, we define syntactic and semantic structures representing the text pairs and then apply graph and tree kernels to them for automatically engineering features in Support Vector Machines. We carry out an extensive comparative analysis of state-of-the-art models for this type of relational learning. Our findings allow for achieving the highest accuracy in two different and important related tasks, i.e., Paraphrasing Identification and Textual Entailment Recognition.
empirical methods in natural language processing | 2015
Shafiq R. Joty; Alberto Barrón-Cedeño; Giovanni Da San Martino; Simone Filice; Lluís Màrquez; Alessandro Moschitti; Preslav Nakov
Community question answering, a recent evolution of question answering in the Web context, allows a user to quickly consult the opinion of a number of people on a particular topic, thus taking advantage of the wisdom of the crowd. Here we try to help the user by deciding automatically which answers are good and which are bad for a given question. In particular, we focus on exploiting the output structure at the thread level in order to make more consistent global decisions. More specifically, we exploit the relations between pairs of comments at any distance in the thread, which we incorporate in a graph-cut framework and in an ILP framework. We evaluated our approach on the benchmark dataset of SemEval-2015 Task 3. Results improved over the state of the art, confirming the importance of using thread-level information.
international joint conference on natural language processing | 2015
Alberto Barrón-Cedeño; Simone Filice; Giovanni Da San Martino; Shafiq R. Joty; Lluís Màrquez; Preslav Nakov; Alessandro Moschitti
Community Question Answering (cQA) is a new application of QA in social contexts (e.g., fora). It presents new interesting challenges and research directions, e.g., exploiting the dependencies between the different comments of a thread to select the best answer for a given question. In this paper, we explored two ways of modeling such dependencies: (i) by designing specific features looking globally at the thread; and (ii) by applying structure prediction models. We trained and evaluated our models on data from SemEval-2015 Task 3 on Answer Selection in cQA. Our experiments show that: (i) the thread-level features consistently improve the performance for a variety of machine learning models, yielding state-of-the-art results; and (ii) sequential dependencies between the answer labels captured by structured prediction models are not enough to improve the results, indicating that more information is needed in the joint model.
international conference on computational linguistics | 2014
Giuseppe Castellucci; Simone Filice; Danilo Croce; Roberto Basili
In this paper, the UNITOR system participating in the SemEval-2014 Aspect Based Sentiment Analysis competition is presented. The task is tackled by exploiting Kernel Methods within the Support Vector Machine framework. The Aspect Term Extraction is modeled as a sequential tagging task, tackled through SVM^hmm. The Aspect Term Polarity, Aspect Category and Aspect Category Polarity detection are tackled as classification problems where multiple kernels are linearly combined to generalize several kinds of linguistic information. In the challenge, the UNITOR system achieves good results, scoring in almost all rankings between the 2nd and the 8th position among about 30 competitors.
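A linear combination of kernels, as mentioned in this abstract, can be sketched as follows. This is not the UNITOR implementation: the view names `bow` and `pos`, the weights, and the helper names are hypothetical. Each base kernel looks at one view of an example, and a weighted sum with non-negative weights is again a valid kernel.

```python
def view_kernel(name):
    """Build a linear kernel that only looks at one named view of an example."""
    def kernel(x, y):
        return float(sum(a * b for a, b in zip(x[name], y[name])))
    return kernel


def combine_kernels(kernels, weights):
    """Weighted sum of kernels; weights are assumed to be non-negative."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(x, y):
        return sum(w * k(x, y) for k, w in zip(kernels, weights))
    return combined
```

For example, `combine_kernels([view_kernel("bow"), view_kernel("pos")], [0.5, 2.0])` returns a single function that any kernel machine can use in place of a base kernel.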
european conference on information retrieval | 2014
Simone Filice; Giuseppe Castellucci; Danilo Croce; Roberto Basili
Kernel-based methods for NLP tasks have been shown to enable robust and effective learning, although their inherent complexity is manifest also in Online Learning (OL) scenarios, where time and memory usage grow along with the arrival of new examples. A state-of-the-art budgeted OL algorithm is here extended to efficiently integrate complex kernels by constraining the overall complexity. Principles of Fairness and Weight Adjustment are applied to mitigate imbalance in data and improve the model stability. Results in Sentiment Analysis in Twitter and Question Classification show that performances very close to the state-of-the-art achieved by batch algorithms can be obtained.
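To make the idea of a budget concrete, here is a minimal sketch of a kernelized Passive-Aggressive style learner whose support set is capped. It is not the algorithm from the paper: the eviction policy (drop the oldest support vector) is an assumption chosen for simplicity, and the Fairness and Weight Adjustment principles are not modeled. The class and parameter names are hypothetical.

```python
from collections import deque


def linear_kernel(x, y):
    return float(sum(a * b for a, b in zip(x, y)))


class BudgetedKernelPA:
    """Kernel PA-I style updates with at most `budget` support vectors."""

    def __init__(self, kernel, budget, C=1.0):
        self.kernel = kernel
        self.C = C
        # A bounded deque silently discards the oldest entry when full.
        self.support = deque(maxlen=budget)  # (example, coefficient) pairs

    def score(self, x):
        return sum(alpha * self.kernel(sv, x) for sv, alpha in self.support)

    def update(self, x, y):
        """Process one labeled example with y in {-1, +1}."""
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        k_xx = self.kernel(x, x)
        if loss > 0.0 and k_xx > 0.0:
            tau = min(self.C, loss / k_xx)  # PA-I step size
            self.support.append((x, tau * y))
```

Because the number of stored examples never exceeds the budget, the cost of `score` stays bounded no matter how long the stream is.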
international conference on machine learning and applications | 2013
Simone Filice; Danilo Croce; Roberto Basili; Fabio Massimo Zanzotto
Online algorithms are an important class of learning machines as they are extremely simple and computationally efficient. Their kernelized versions can handle structured data, such as trees, and achieve state-of-the-art performance. However, kernelized versions of Online Learning algorithms slow down when the number of support vectors becomes large. The traditional way to cope with this problem is to introduce budgets that set the maximum number of support vectors. In this paper, we investigate Distributed Trees (DT) as an efficient way to use structured data in online learning. DTs effectively embed the huge feature space of the tree fragments into small vectors, thus enabling the use of linear versions of kernel machines over tree-structured data. We experiment with the Passive-Aggressive (PA) algorithm by comparing the linear and the kernelized version. A massive dataset made of tree-structured data is employed: it originates from a natural language processing task, Boundary Detection in the context of Semantic Role Labeling over FrameNet. Results on a sample of the final data show that the DTs along with the Linear PA algorithm and the Tree Kernel along with the Budgeted PA achieve comparable results in terms of F1-measure. Finally, the exploration of the full dataset allows the former to improve the performance on the classification task, with respect to the latter.
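For reference, the linear Passive-Aggressive update mentioned in this abstract can be sketched in a few lines. This follows the standard PA-I variant for binary classification; whether the paper uses this exact variant is an assumption, and the function name `pa_update` is hypothetical. With Distributed Trees, the input `x` would be the small vector embedding of a tree, which is not computed here.

```python
def pa_update(w, x, y, C=1.0):
    """One PA-I step: return the updated weight vector for example (x, y).

    w and x are equal-length lists of floats, y is -1 or +1.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)  # passive: the example is already handled
    tau = min(C, loss / norm_sq)  # aggressive: smallest sufficient step, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

Unlike the kernelized version, the model here is a single fixed-size vector, so prediction cost does not grow with the number of examples seen.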
international conference on computational linguistics | 2012
Danilo Croce; Simone Filice; Roberto Basili
The representation of word meaning in texts is a central problem in Computational Linguistics. Geometrical models represent lexical semantic information in terms of the basic co-occurrences that words establish with each other in large-scale text collections. As recent work has noted, the definition of methods able to express the meaning of phrases or sentences as operations on lexical representations is a complex problem, and a still largely open issue. In this paper, a perspective centered on Convolution Kernels is discussed and the formulation of a Partial Tree Kernel that integrates syntactic information and lexical generalization is studied. The interaction of such information and the role of different geometrical models are investigated on the question classification task, where the state-of-the-art result is achieved.