Leonardo Rigutini | Researchain

Publication

Featured research published by Leonardo Rigutini.

Web Intelligence | 2005

An EM Based Training Algorithm for Cross-Language Text Categorization

Leonardo Rigutini; Marco Maggini; Bing Liu

Due to the globalization of the Web, many companies and institutions need to efficiently organize and search repositories containing multilingual documents. Managing these heterogeneous text collections significantly increases costs, because experts in different languages are required to organize them. Cross-language text categorization can provide techniques to extend existing automatic classification systems in one language to new languages without requiring additional intervention by human experts. In this paper, we propose a learning algorithm based on the EM scheme which can be used to train text classifiers in a multilingual environment. In particular, in the proposed approach, we assume that a predefined category set and a collection of labeled training data are available for a given language $L_1$. A classifier for a different language $L_2$ is trained by translating the available labeled training set for $L_1$ into $L_2$ and by using an additional set of unlabeled documents from $L_2$. This technique allows us to extract the correct statistical properties of language $L_2$, which are not fully available in automatically translated examples because of the different characteristics of language $L_1$ and the approximations of the translation process. Our experimental results show that the performance of the proposed method is very promising when applied to a test document set extracted from newsgroups in English and Italian.
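A minimal sketch of the EM loop described above, assuming a multinomial naive Bayes base classifier (the abstract does not name the classifier) and pre-tokenized documents. All function names are hypothetical. The translated $L_1$ documents provide the initial model; each iteration then soft-labels the unlabeled $L_2$ documents (E-step) and refits on both sets (M-step). Whether the original method keeps the translated documents in every M-step or uses them only for initialization is not stated in the abstract; this sketch keeps them.

```python
import math
from collections import Counter

# Hypothetical sketch: EM with a multinomial naive Bayes classifier (assumed,
# not confirmed by the abstract) for cross-language text categorization.

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial naive Bayes from soft labels (one {class: weight} per doc)."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        tf = Counter(doc)
        for c, p in w.items():
            prior[c] += p
            for term, n in tf.items():
                counts[c][term] += p * n
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)  # Laplace smoothing
        log_lik[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def posterior(doc, model, classes):
    """Class posteriors for one document; out-of-vocabulary terms are ignored."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: v / total for c, v in exp.items()}

def em_cross_language(translated, labels, unlabeled, classes, iterations=10):
    """translated: L1 training docs machine-translated into L2 (token lists);
    labels: their categories; unlabeled: native L2 token lists."""
    vocab = {t for doc in translated + unlabeled for t in doc}
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for y in labels]
    model = train_nb(translated, hard, classes, vocab)  # initial model
    for _ in range(iterations):
        soft = [posterior(d, model, classes) for d in unlabeled]  # E-step
        model = train_nb(translated + unlabeled, hard + soft, classes, vocab)  # M-step
    return model
```

Because the unlabeled $L_2$ documents take part in the M-step, words that never occur in the translated training set still acquire class-conditional probabilities, which is roughly the effect the abstract attributes to the method.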

Machine Learning | 2012

Bridging logic and kernel machines

Michelangelo Diligenti; Marco Gori; Marco Maggini; Leonardo Rigutini

We propose a general framework to incorporate first-order logic (FOL) clauses, regarded as an abstract and partial representation of the environment, into kernel machines that learn within a semi-supervised scheme. We rely on a multi-task learning scheme in which each task is associated with a unary predicate defined on the feature space, while higher-level abstract representations consist of FOL clauses built from those predicates. We reuse the mathematical apparatus of kernel machines to solve the problem as the primal optimization of a function composed of the loss on the supervised examples, the regularization term, and a penalty term that enforces real-valued constraints derived from the predicates. Unlike in classic kernel machines, however, depending on the logic clauses, the overall function to be optimized is no longer convex. An important contribution is to show that, while tackling the optimization with classic numerical schemes is likely to be hopeless, a stage-based learning scheme, in which the supervised examples are learned first until convergence and the logic clauses are enforced afterwards, is a viable way to attack the problem. Promising experimental results are reported on artificial learning tasks and on the automatic tagging of BibTeX entries, highlighting the comparison with plain kernel machines.
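To make the stage-based scheme concrete, here is a toy sketch under heavy simplifications: instead of kernel machines, each unary predicate is a hypothetical one-dimensional logistic model `sigmoid(w * x + b)`, the supervised loss is squared error, and the clause `forall x: A(x) => B(x)` is turned into the penalty `A(x) * (1 - B(x))`, a product t-norm translation chosen here only for illustration. Stage 1 runs gradient descent on the supervised loss and regularizer for a fixed number of steps (standing in for "until convergence"); stage 2 adds the clause penalty on unlabeled points. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(params, x):
    """Evaluate a predicate modeled (hypothetically) as sigmoid(w * x + b)."""
    w, b = params
    return sigmoid(w * x + b)

def train_stage_based(sup_a, sup_b, unlabeled, stages=(1, 2),
                      lam=0.01, mu=1.0, lr=0.5, steps=2000):
    """Stage 1: fit the supervised examples only.
    Stage 2: keep fitting them while also penalizing violations of the
    clause "for all x: A(x) => B(x)" on the unlabeled points."""
    pa, pb = [0.0, 0.0], [0.0, 0.0]
    for stage in stages:
        for _ in range(steps):
            # Gradient of the regularizer (lam / 2) * w**2, applied to weights only.
            ga, gb = [lam * pa[0], 0.0], [lam * pb[0], 0.0]
            # Squared-error loss on the supervised examples of each predicate.
            for params, grads, data in ((pa, ga, sup_a), (pb, gb, sup_b)):
                for x, y in data:
                    f = predict(params, x)
                    d = 2.0 * (f - y) * f * (1.0 - f)
                    grads[0] += d * x
                    grads[1] += d
            if stage == 2:
                # Penalty mu * A(x) * (1 - B(x)): a product t-norm reading
                # of the violation "A(x) and not B(x)".
                for x in unlabeled:
                    a, b = predict(pa, x), predict(pb, x)
                    da = mu * (1.0 - b) * a * (1.0 - a)
                    db = -mu * a * b * (1.0 - b)
                    ga[0] += da * x
                    ga[1] += da
                    gb[0] += db * x
                    gb[1] += db
            for i in range(2):
                pa[i] -= lr * ga[i]
                pb[i] -= lr * gb[i]
    return pa, pb
```

If `B` has only negative supervised examples, stage 1 has no evidence that `B` should be true anywhere; in stage 2 the clause pushes `B(x)` up wherever `A(x)` is high. Calling the function with `stages=(1,)` gives the supervised-only model for comparison.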

Web Intelligence | 2005

A Semi-Supervised Document Clustering Algorithm Based on EM

Leonardo Rigutini; Marco Maggini

Document clustering is a very hard task in automatic text processing, since it requires extracting regular patterns from a document collection without a priori knowledge of the category structure. The task can be difficult even for humans, because many different but valid partitions may exist for the same collection. Moreover, the lack of information about categories makes it difficult to apply effective feature selection techniques to reduce the noise in the representation of texts. Despite these intrinsic difficulties, text clustering is an important task for Web search applications in which huge collections or long query result lists must be organized automatically. Semi-supervised clustering lies between automatic categorization and auto-organization: the supervisor is not required to specify a set of classes, but only to provide a set of texts grouped according to the criteria that should be used to organize the collection. In this paper, we present a novel algorithm for clustering text documents that exploits the EM algorithm together with a feature selection technique based on information gain. The experimental results show that very few documents are needed to initialize the clusters and that the algorithm is able to properly extract the regularities hidden in a huge unlabeled collection.
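The abstract combines EM with information-gain feature selection but does not say exactly how the two interact. A minimal sketch of the feature-selection half, assuming binary term presence and hard cluster assignments (for example, the most probable cluster of each document after an EM step): it computes IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)] in bits and keeps the top `k` terms. Function names are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, clusters, term):
    """docs: list of token sets; clusters: hard cluster id for each doc."""
    n = len(docs)
    with_t = [c for d, c in zip(docs, clusters) if term in d]
    without_t = [c for d, c in zip(docs, clusters) if term not in d]
    h_c = entropy(list(Counter(clusters).values()))
    h_cond = (len(with_t) / n) * entropy(list(Counter(with_t).values())) \
        + (len(without_t) / n) * entropy(list(Counter(without_t).values()))
    return h_c - h_cond

def select_features(docs, clusters, k):
    """Keep the k terms with the highest information gain (ties broken by name)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, clusters, t), t))[:k]
```

For instance, with cluster assignments `[0, 0, 1, 1]`, a term present only in the two documents of cluster 0 scores 1 bit, while a term present in exactly one document of each cluster scores 0 and is a natural candidate for removal as noise.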

Web Intelligence | 2004

Pseudo-Supervised Clustering for Text Documents

Marco Maggini; Leonardo Rigutini; Marco Turchi

Effective solutions for Web search engines can take advantage of algorithms that automatically organize documents into homogeneous clusters. Unfortunately, document clustering is not an easy task, especially when the documents share a common set of topics, as in vertical search engines. In this paper we propose two clustering algorithms that can be tuned by the feedback of an expert. The feedback is used to choose an appropriate basis for the representation of documents, and the clustering is performed in the projected space. The algorithms are evaluated on a dataset containing papers from computer science conferences. The results show that an appropriate choice of the representation basis can yield better performance than the original vector space model.
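A minimal sketch of clustering in a projected space. The abstract does not say how expert feedback determines the basis; here, purely as an assumption, the expert supplies a few example documents per group, each basis vector is the normalized centroid of one group, and documents (term-frequency vectors over a fixed vocabulary) are projected by dot products before a plain k-means step with naive first-`k` initialization. All names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_basis(expert_groups):
    """Hypothetical basis: one normalized centroid per expert-provided group."""
    return [normalize(centroid(group)) for group in expert_groups]

def project(doc, basis):
    """Coordinates of a term-frequency vector in the expert-derived basis."""
    return [sum(d * b for d, b in zip(doc, vec)) for vec in basis]

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [min(range(k),
                      key=lambda j: sum((a - c) ** 2 for a, c in zip(p, centers[j])))
                  for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = centroid(members)
    return assign
```

For example, with a basis built from the expert groups `[[1, 1, 0, 0]]` and `[[0, 0, 1, 1]]`, the documents `[3, 2, 0, 0]` and `[2, 3, 0, 1]` project close together and land in the same cluster.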

International Conference on Artificial Neural Networks | 2009

A Maximum-Likelihood Connectionist Model for Unsupervised Learning over Graphical Domains

Edmondo Trentin; Leonardo Rigutini

Supervised relational learning over labeled graphs, e.g. via recursive neural networks, has received considerable attention from the connectionist community. Surprisingly, with the exception of recursive self-organizing maps, unsupervised paradigms have been far less investigated. In particular, no algorithms for density estimation over graphs are found in the literature. This paper first introduces a formal notion of probability density function (pdf) over graphical spaces. It then proposes a maximum-likelihood pdf estimation technique that relies on the joint optimization of a recursive encoding network and a constrained radial basis function-like network. Preliminary experiments on synthetically generated samples of labeled graphs are analyzed and tested statistically.

International Conference on Machine Learning and Applications | 2008

A Fully Automatic Crossword Generator

Leonardo Rigutini; Michelangelo Diligenti; Marco Maggini; Marco Gori

This paper presents a software system that is able to generate crosswords with no human intervention, including definition generation and crossword compilation. In particular, the proposed system crawls relevant Web sources, extracts definitions from the downloaded pages using state-of-the-art natural language processing (NLP) techniques and, finally, attempts to compile a crossword schema with the extracted definitions using a constraint satisfaction programming (CSP) solver. The crossword generator has applications in entertainment, educational, and rehabilitation contexts.

International Journal on Artificial Intelligence Tools | 2012

Automatic Generation of Crossword Puzzles

Leonardo Rigutini; Michelangelo Diligenti; Marco Maggini; Marco Gori

Crossword puzzles are used every day by millions of people for entertainment, but they also have applications in educational and rehabilitation contexts. Unfortunately, generating ad-hoc puzzles, especially on specific subjects, typically requires a great deal of work by human experts. This paper presents the architecture of WebCrow-generation, a system that is able to generate crosswords with no human intervention, including clue generation and crossword compilation. In particular, the proposed system crawls information sources on the Web, extracts definitions from the downloaded pages using state-of-the-art natural language processing techniques and, finally, compiles the crossword schema with the extracted definitions by constraint satisfaction programming. The system has been tested on the creation of Italian crosswords, but the extensive use of machine learning makes it easily portable to other languages.
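The compilation step can be illustrated with a small backtracking solver. This is a generic constraint-satisfaction sketch, not the WebCrow-generation solver: slots are lists of grid cells, crossing slots must agree on shared letters, no answer may be used twice, and the next slot to fill is the one with the fewest remaining candidates (the minimum-remaining-values heuristic). In a full system each candidate word would carry its extracted definition as the clue; here only the answers are modeled, and the function name is hypothetical.

```python
def fill_crossword(slots, words):
    """Backtracking fill of crossword slots (a generic CSP sketch).

    slots: list of slots, each a list of (row, col) cells in reading order.
    words: candidate answers (in a full system, each paired with a clue).
    Returns {slot_index: word}, or None if no consistent fill exists."""
    grid = {}        # (row, col) -> letter currently placed
    assignment = {}  # slot index -> word

    def fits(slot, word):
        # Same length, and every already-filled cell agrees with the word.
        return len(word) == len(slot) and all(
            grid.get(cell, ch) == ch for cell, ch in zip(slot, word))

    def candidates(i):
        used = set(assignment.values())
        return [w for w in words if w not in used and fits(slots[i], w)]

    def backtrack():
        if len(assignment) == len(slots):
            return True
        # Minimum-remaining-values: fill the most constrained open slot first.
        open_slots = [i for i in range(len(slots)) if i not in assignment]
        i = min(open_slots, key=lambda j: len(candidates(j)))
        for word in candidates(i):
            new_cells = [cell for cell in slots[i] if cell not in grid]
            for cell, ch in zip(slots[i], word):
                grid[cell] = ch
            assignment[i] = word
            if backtrack():
                return True
            # Undo this choice before trying the next candidate.
            del assignment[i]
            for cell in new_cells:
                del grid[cell]
        return False

    return dict(assignment) if backtrack() else None
```

For a 2x2 grid whose two across and two down slots share all four cells, the word list `["AT", "NO", "AN", "TO"]` admits fills such as AT/NO across with AN/TO down, while a list with fewer distinct words than slots makes the solver return `None`.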

Web Intelligence | 2006

Semantic Labeling of Data by Using the Web

Leonardo Rigutini; Ernesto Di Iorio; Marco Ernandes; Marco Maggini

This paper proposes a system for automatically categorizing terms or lexical entities into a predefined set of semantic domains. We present an approach that exploits the knowledge available on the Web to create a model of each term or entity (entity context lexicon, ECL). Each profile is simply a list of terms (similar to the bag-of-words representation in text categorization), composed primarily of the words that often appear in the same contexts as the entity. These profiles model the contexts in which the entity usually appears, and they can subsequently be processed by an automatic classifier. Moreover, we propose and validate a profile-based categorization model developed for this particular task, which uses the ECLs of the training entities to build a profile for each class (class context lexicon, CCL). Finally, we propose a technique for multi-label classification based on a decision module that exploits a neural network. We show the effectiveness of the proposed approach on a term categorization task using a standard benchmark composed of a set of domain-specific lexicons (WordNetDomains).
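A minimal sketch of the profile idea, under stated assumptions: an ECL is built by counting the words that co-occur with the entity in a list of text snippets (standing in for downloaded Web pages), a CCL is simply the sum of the training ECLs of a class, and an entity is assigned to the class whose CCL is most similar by cosine similarity. The abstract does not specify the aggregation, weighting, or similarity measure, and the neural multi-label decision module is not modeled; all names are hypothetical.

```python
import math
from collections import Counter

def ecl(snippets, entity):
    """Hypothetical ECL: counts of words co-occurring with a lowercase entity."""
    profile = Counter()
    for s in snippets:
        tokens = s.lower().split()
        if entity in tokens:
            profile.update(t for t in tokens if t != entity)
    return profile

def ccl(training_ecls):
    """Hypothetical CCL: here, simply the sum of a class's training ECLs."""
    total = Counter()
    for profile in training_ecls:
        total.update(profile)
    return total

def cosine(p, q):
    dot = sum(p[t] * q[t] for t in p if t in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
        math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def classify(entity_ecl, class_ccls):
    """Pick the class whose CCL is most similar to the entity's ECL."""
    return max(class_ccls, key=lambda c: cosine(entity_ecl, class_ccls[c]))
```

With snippets such as `"referee stopped match goal"`, the ECL of `referee` shares `match` and `goal` with a sport CCL built from the ECLs of `striker` and `goalkeeper`, so it is assigned to that class.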

IEEE Transactions on Neural Networks | 2011