Giuseppe Castellucci | Researchain

Publication

Featured research published by Giuseppe Castellucci.

Meeting of the Association for Computational Linguistics | 2015

KeLP: a Kernel-based Learning Platform for Natural Language Processing

Simone Filice; Giuseppe Castellucci; Danilo Croce; Roberto Basili

Kernel-based learning algorithms have been shown to achieve state-of-the-art results in many Natural Language Processing (NLP) tasks. We present KeLP, a Java framework that supports the implementation of both kernel-based learning algorithms and kernel functions over generic data representations, e.g. vectorial data or discrete structures. The framework has been designed to decouple kernel functions and learning algorithms: once a new kernel function has been implemented, it can be adopted in all the available kernel-machine algorithms. The platform includes different Online and Batch Learning algorithms for Classification, Regression and Clustering, as well as several Kernel functions, ranging from vector-based to structural kernels. This paper will show the main aspects of the framework by applying it to different NLP tasks.
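The decoupling described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Java and does not use or reproduce KeLP's API: the names `KernelPerceptron`, `linear_kernel` and `shared_token_kernel` are hypothetical, and the kernel perceptron is chosen only because it is one of the simplest kernel machines. The point is that the learner only ever calls `kernel(a, b)`, so the same algorithm runs unchanged over vectors and over discrete structures.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A kernel is any function mapping two examples to a similarity score.
Kernel = Callable[[Any, Any], float]


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product over vectorial data."""
    return float(sum(a * b for a, b in zip(x, y)))


def shared_token_kernel(x: Sequence[str], y: Sequence[str]) -> float:
    """Toy kernel over discrete data: number of distinct shared tokens."""
    return float(len(set(x) & set(y)))


class KernelPerceptron:
    """Binary kernel perceptron; labels are +1 or -1 (illustrative only)."""

    def __init__(self, kernel: Kernel, epochs: int = 10) -> None:
        self.kernel = kernel
        self.epochs = epochs
        self.support: List[Tuple[Any, int]] = []  # (example, label) pairs

    def score(self, x: Any) -> float:
        # The learner touches the data only through the kernel function.
        return sum(y * self.kernel(sv, x) for sv, y in self.support)

    def predict(self, x: Any) -> int:
        return 1 if self.score(x) > 0 else -1

    def fit(self, examples: Sequence[Any], labels: Sequence[int]) -> "KernelPerceptron":
        for _ in range(self.epochs):
            mistakes = 0
            for x, y in zip(examples, labels):
                if y * self.score(x) <= 0:  # misclassified or on the boundary
                    self.support.append((x, y))
                    mistakes += 1
            if mistakes == 0:  # a full pass without errors: stop early
                break
        return self
```

Swapping `linear_kernel` for `shared_token_kernel` changes the data representation without touching `KernelPerceptron`, which is the property the abstract attributes to the framework's design.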

Intelligenza Artificiale | 2012

Structured learning for semantic role labeling

Danilo Croce; Giuseppe Castellucci; Emanuele Bastianelli

The use of complex grammatical features in statistical language learning assumes the availability of large-scale training data and good-quality parsers, which is a strong assumption especially for languages other than English. In this paper, we show how good-quality FrameNet SRL systems can be obtained, without relying on full syntactic parsing, by backing off to surface grammatical representations and structured learning. This model is here shown to achieve state-of-the-art results in standard benchmarks, while its robustness is confirmed in poor training conditions, for a language other than English, i.e. Italian.

1 Linguistic Features for Inductive Tasks

Language learning systems usually generalize linguistic observations into statistical models of higher-level semantic tasks, such as Semantic Role Labeling (SRL). Statistical learning methods assume that lexical or grammatical aspects of training data are the basic features for modeling the different inferences. They are then generalized into predictive patterns composing the final induced model. Lexical information captures semantic information and fine-grained, context-dependent aspects of the input data. However, it is largely affected by data sparseness, as lexical evidence is often poorly represented in training. It is also difficult to generalize and does not scale, as the development of large-scale lexical KBs is very expensive. Moreover, other crucial properties, such as word ordering, are neglected by lexical representations, so syntax must also be properly addressed. In semantic role labeling, the role of grammatical features has been outlined since the seminal work by [6]. Symbolic expressions derived from the parse trees denote the position and the relationship between an argument and its predicate, and they are used as features. Parse tree paths are such features, employed in [11] for semantic role labeling. Tree kernels, introduced by [4], model similarity between two training examples as a function of the shared parts of their parse trees. Applied to different tasks, from parsing [4] to semantic role labeling [16], tree kernels determine expressive representations for effective grammatical feature engineering. However, there is no free lunch in the adoption of lexical and grammatical features in complex NLP tasks. First, lexical information is hard to generalize properly whenever the amount of training data is small. Large-scale general-purpose lexicons are available, but their employment in specific tasks is not satisfactory: coverage in domain (or corpus)-specific tasks is often poor and domain adaptation is difficult.

R. Pirrone and F. Sorbello (Eds.): AI*IA 2011, LNAI 6934, pp. 238–249, 2011.

European Conference on Artificial Intelligence | 2014

Effective and robust natural language understanding for human-robot interaction

Emanuele Bastianelli; Giuseppe Castellucci; Danilo Croce; Roberto Basili; Daniele Nardi

Robots are slowly becoming part of everyday life, as they are being marketed for commercial applications (viz. telepresence, cleaning or entertainment). Thus, the ability to interact with non-expert users is becoming a key requirement. Even if user utterances can be efficiently recognized and transcribed by Automatic Speech Recognition systems, several issues arise in translating them into suitable robotic actions. In this paper, we discuss two existing Natural Language Understanding workflows for Human-Robot Interaction (HRI). First, we discuss a grammar-based approach, which relies on grammars and therefore recognizes only a restricted set of commands. Then, a data-driven approach, based on a free-form speech recognizer and a statistical semantic parser, is discussed. The main advantages of both approaches are discussed, also from an engineering perspective, i.e. considering the effort of realizing HRI systems, as well as their reusability and robustness. An empirical evaluation of the proposed approaches is carried out on several datasets, in order to understand their performance and identify possible improvements towards the design of NLP components in HRI.

Congress of the Italian Association for Artificial Intelligence | 2013

Kernel-Based Discriminative Re-ranking for Spoken Command Understanding in HRI

Roberto Basili; Emanuele Bastianelli; Giuseppe Castellucci; Daniele Nardi; Vittorio Perera

Speech recognition is being addressed as one of the key technologies for natural interaction with robots, which are targeting the consumer market. However, speech recognition in human-robot interaction is typically affected by the noisy conditions of the operational environment, which degrade the recognition of spoken commands. Consequently, finite-state grammars or statistical language models, even though they can be tailored to the target domain, exhibit a high rate of false positives or low accuracy. In this paper, a discriminative re-ranking method is applied to a simple speech and language processing cascade, based on off-the-shelf components in realistic conditions. Tree kernels are here applied to improve the accuracy of the recognition process by re-ranking the n-best list returned by the speech recognition component. The rationale behind our approach is to reduce the effort for devising domain-dependent solutions in the design of speech interfaces for language processing in human-robot interactions.

Applications of Natural Language to Data Bases | 2015

Acquiring a Large Scale Polarity Lexicon Through Unsupervised Distributional Methods

Giuseppe Castellucci; Danilo Croce; Roberto Basili

The recent interest in Sentiment Analysis systems has drawn attention to the definition of effective methods to detect opinions and sentiments in texts with good accuracy. Many approaches found in the literature are based on hand-coded resources that model the prior polarity of words or multi-word expressions. The construction of such resources is in general expensive, and coverage issues arise with respect to the multiplicity of linguistic phenomena of sentiment expressions. This paper presents an automatic method for deriving a large-scale polarity lexicon based on Distributional Models of lexical semantics. Given a set of sentences annotated with polarity, we transfer the sentiment information from sentences to words. The set of annotated examples is derived from Twitter and the polarity assignment to sentences is derived by simple heuristics. The approach is mostly unsupervised, and the experimental evaluation carried out on two Sentiment Analysis tasks shows the benefits of the generated resource.
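The idea of transferring polarity from annotated sentences to words in a shared distributional space can be sketched as follows. This is a deliberately simplified stand-in, not the method of the paper: sentences are represented as sums of word vectors, each polarity class is summarized by a centroid, and a word is scored by its cosine similarity to each centroid. All function names and the toy two-dimensional vectors used in the usage note are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def sentence_vector(tokens: Sequence[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Represent a sentence as the sum of the vectors of its known words."""
    vec = [0.0] * dim
    for token in tokens:
        if token in word_vectors:
            vec = add(vec, word_vectors[token])
    return vec


def class_centroids(
    sentences: Sequence[Sequence[str]],
    labels: Sequence[str],
    word_vectors: Dict[str, Vector],
    dim: int,
) -> Dict[str, Vector]:
    """Average the sentence vectors of each polarity class."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for tokens, label in zip(sentences, labels):
        sums[label] = add(sums.get(label, [0.0] * dim), sentence_vector(tokens, word_vectors, dim))
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in total] for label, total in sums.items()}


def word_polarity(word: str, word_vectors: Dict[str, Vector], centroids: Dict[str, Vector]) -> Dict[str, float]:
    """Score a word against each class centroid in the same vector space."""
    return {label: cosine(word_vectors[word], c) for label, c in centroids.items()}
```

With toy vectors in which "love" and "great" point in one direction and "hate" and "awful" in another, a word such as "nice" that never appears in a labeled sentence still receives a higher positive than negative score, because sentences and words live in the same space.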

International Conference on Computational Linguistics | 2014

UNITOR: Aspect Based Sentiment Analysis with Structured Learning

Giuseppe Castellucci; Simone Filice; Danilo Croce; Roberto Basili

In this paper, the UNITOR system participating in the SemEval-2014 Aspect Based Sentiment Analysis competition is presented. The task is tackled by exploiting Kernel Methods within the Support Vector Machine framework. Aspect Term Extraction is modeled as a sequential tagging task, tackled through SVM^hmm. Aspect Term Polarity, Aspect Category and Aspect Category Polarity detection are tackled as classification problems where multiple kernels are linearly combined to generalize several kinds of linguistic information. In the challenge, the UNITOR system achieves good results, scoring in almost all rankings between the 2nd and the 8th position among about 30 competitors.
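A linear combination of kernels, as mentioned in this abstract, can be sketched in a few lines. The example below is a generic illustration and not the UNITOR implementation: the `Example` class, the two toy kernels and the weights are all assumptions. It relies on the standard fact that a non-negative weighted sum of valid kernels is itself a valid kernel.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Example:
    """Hypothetical example carrying two views of the same text."""
    tokens: List[str]
    features: List[float]


Kernel = Callable[[Example, Example], float]


def token_overlap_kernel(x: Example, y: Example) -> float:
    """Lexical view: number of distinct shared tokens."""
    return float(len(set(x.tokens) & set(y.tokens)))


def feature_kernel(x: Example, y: Example) -> float:
    """Vector view: dot product of the numeric features."""
    return float(sum(a * b for a, b in zip(x.features, y.features)))


def combine_kernels(kernels: Sequence[Kernel], weights: Sequence[float]) -> Kernel:
    """Return k(x, y) = sum_i w_i * k_i(x, y) with non-negative weights."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep the kernel valid")

    def combined(x: Example, y: Example) -> float:
        return sum(w * k(x, y) for k, w in zip(kernels, weights))

    return combined
```

The combined function can be passed to any kernel machine in place of a single kernel, so each view contributes its own notion of similarity.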

European Conference on Information Retrieval | 2014

Effective Kernelized Online Learning in Language Processing Tasks

Simone Filice; Giuseppe Castellucci; Danilo Croce; Roberto Basili

Kernel-based methods for NLP tasks have been shown to enable robust and effective learning, although their inherent complexity is also manifest in Online Learning (OL) scenarios, where time and memory usage grow along with the arrival of new examples. A state-of-the-art budgeted OL algorithm is here extended to efficiently integrate complex kernels by constraining the overall complexity. Principles of Fairness and Weight Adjustment are applied to mitigate imbalance in data and improve the model stability. Results in Sentiment Analysis in Twitter and Question Classification show that performance very close to the state of the art achieved by batch algorithms can be obtained.
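The memory problem described here, and the general idea of a budget, can be illustrated with a small sketch. It is not the algorithm extended in the paper and it omits the Fairness and Weight Adjustment principles: it is a plain kernel perceptron that keeps at most `budget` support vectors and discards the oldest one when the budget is exceeded. The class name and the eviction policy are assumptions chosen for brevity.

```python
from collections import deque
from typing import Any, Callable, Deque, Tuple

Kernel = Callable[[Any, Any], float]


class BudgetedKernelPerceptron:
    """Online kernel perceptron with a hard cap on stored support vectors."""

    def __init__(self, kernel: Kernel, budget: int) -> None:
        if budget < 1:
            raise ValueError("budget must be at least 1")
        self.kernel = kernel
        # deque(maxlen=...) silently drops the oldest item when full.
        self.support: Deque[Tuple[Any, int]] = deque(maxlen=budget)

    def score(self, x: Any) -> float:
        # Cost per prediction is bounded by the budget, not by the stream length.
        return sum(y * self.kernel(sv, x) for sv, y in self.support)

    def predict(self, x: Any) -> int:
        return 1 if self.score(x) > 0 else -1

    def update(self, x: Any, y: int) -> bool:
        """Process one labeled example (y is +1 or -1); return True on a mistake."""
        if y * self.score(x) <= 0:
            self.support.append((x, y))
            return True
        return False
```

Without the cap, every mistake would add a support vector and both memory and prediction time would grow with the stream; with it, both stay bounded at the price of forgetting older evidence.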

Lecture Notes in Computer Science | 2016

Spoken Language Understanding for Service Robotics in Italian

Andrea Vanzo; Danilo Croce; Giuseppe Castellucci; Roberto Basili; Daniele Nardi

Robots operate in specific environments and the correct interpretation of linguistic interactions depends on physical, cognitive and language-dependent aspects triggered by the environment. In this work, we describe a Spoken Language Understanding chain for the semantic parsing of robotic commands, designed according to a Client/Server architecture. This work also reports a first evaluation of the proposed architecture in the automatic interpretation of commands expressed in Italian for a robot in a Service Robotics domain. The experimental results show that the proposed solution can be easily extended to other languages for a robust Spoken Language Understanding in Human-Robot Interaction.

Robot Soccer World Cup | 2014

RoboCup@Home spoken corpus: Using robotic competitions for gathering datasets

Emanuele Bastianelli; Luca Iocchi; Daniele Nardi; Giuseppe Castellucci; Danilo Croce; Roberto Basili

The definition of high quality datasets for benchmarking single components and entire systems in intelligent robots is a fundamental task for developing, testing and comparing different technical solutions. In this paper, we describe the methodology adopted for the acquisition and the creation of a spoken corpus for domestic and service robots. The corpus has been inspired by and acquired in the RoboCup@Home setting, with the involvement of RoboCup@Home participants. The annotated data set is publicly available for developing, testing and comparing speech understanding functionalities of domestic and service robots, not only for teams involved in RoboCup@Home or in other competitions, but also for research groups active in the field. We regard the construction of the dataset as a first step towards a full benchmarking methodology for spoken language interaction in service robotics.

International Workshop on Evaluation of Natural Language and Speech Tool for Italian | 2013

Structured Kernel-Based Learning for the Frame Labeling over Italian Texts

Danilo Croce; Emanuele Bastianelli; Giuseppe Castellucci

In this paper, two systems participating in the Evalita Frame Labeling over Italian Texts challenge are presented. The first one, i.e. the SVM-SPTK system, implements the Smoothed Partial Tree Kernel that models semantic roles by implicitly combining syntactic and lexical information of annotated examples. The second one, i.e. the SVM-HMM system, realizes a flexible approach based on the Markovian formulation of the SVM learning algorithm. In the challenge, the SVM-SPTK system obtains state-of-the-art results in almost all tasks. The performance of the SVM-HMM system is interesting too: it achieves the second best scores in the Frame Prediction and Argument Classification tasks, which is notable considering that it does not rely on full syntactic parsing.
