Michelangelo Diligenti

Publication

Featured researches published by Michelangelo Diligenti.

IEEE Transactions on Knowledge and Data Engineering | 2004

A unified probabilistic framework for Web page scoring systems

Michelangelo Diligenti; Marco Gori; Marco Maggini

The definition of efficient page ranking algorithms is becoming an important issue in the design of the query interface of Web search engines. Information flooding is a common experience, especially when broad topic queries are issued. Queries containing only one or two keywords usually match a huge number of documents, while users can only afford to visit the first positions of the returned list, which do not necessarily refer to the most appropriate answers. Some successful approaches to page ranking in a hyperlinked environment, like the Web, are based on link analysis. We propose a general probabilistic framework for Web page scoring systems (WPSS), which incorporates and extends many of the relevant models proposed in the literature. In particular, we introduce scoring systems for both generic (horizontal) and focused (vertical) search engines. Whereas horizontal scoring algorithms are based only on the topology of the Web graph, vertical ranking also takes the page contents into account and is the basis for focused and user-adapted search interfaces. Experimental results are reported to show the properties of some of the proposed scoring systems, with special emphasis on vertical search.
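The distinction between horizontal and vertical scoring can be illustrated with a minimal sketch. This is not the WPSS framework from the paper; it is a generic PageRank-style power iteration, and the function name `page_scores`, the `relevance` argument, and the damping value are assumptions made for illustration. With no `relevance` supplied, the random jump is uniform and the score depends only on link topology (horizontal); with `relevance` supplied, the jump is biased toward pages whose content matches a topic (vertical).

```python
def page_scores(links, relevance=None, damping=0.85, iterations=100):
    """PageRank-style scores by power iteration (illustrative sketch).

    links:     dict mapping each page to the list of pages it links to;
               every link target must also appear as a key.
    relevance: optional dict of non-negative content-relevance weights
               (hypothetical input; None gives a uniform random jump).
    """
    pages = sorted(links)
    n = len(pages)
    if relevance is None:
        # Horizontal case: the random jump ignores page content.
        jump = {p: 1.0 / n for p in pages}
    else:
        # Vertical case: the random jump favors on-topic pages.
        total = sum(relevance[p] for p in pages)
        jump = {p: relevance[p] / total for p in pages}

    scores = dict(jump)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            out = links[p]
            if out:
                # Split this page's score evenly among its out-links.
                share = damping * scores[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: spread its score by the jump distribution.
                for q in pages:
                    new[q] += damping * scores[p] * jump[q]
        scores = new
    return scores
```

Calling `page_scores(links)` and `page_scores(links, relevance=...)` on the same graph shows how a content-biased jump shifts score toward on-topic pages even when they have few incoming links.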

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2003

Hidden tree Markov models for document image classification

Michelangelo Diligenti; Paolo Frasconi; Marco Gori

Classification is an important problem in image document processing and is often a preliminary step toward recognition, understanding, and information extraction. In this paper, the problem is formulated in the framework of concept learning and each category corresponds to the set of image documents with similar physical structure. We propose a solution based on two algorithmic ideas. First, we obtain a structured representation of images based on labeled XY-trees (this representation informs the learner about important relationships between image subconstituents). Second, we propose a probabilistic architecture that extends hidden Markov models for learning probability distributions defined on spaces of labeled trees. Finally, a successful application of this method to the categorization of commercial invoices is presented.

International World Wide Web Conferences | 2002

Web page scoring systems for horizontal and vertical search

Michelangelo Diligenti; Marco Gori; Marco Maggini

Page ranking is a fundamental step towards the construction of effective search engines for both generic (horizontal) and focused (vertical) search. Ranking schemes for horizontal search, like the PageRank algorithm used by Google, operate on the topology of the graph, regardless of the page content. On the other hand, the recent development of vertical portals (vortals) makes it useful to adopt scoring systems focussed on the topic and taking the page content into account. In this paper, we propose a general framework for Web Page Scoring Systems (WPSS) which incorporates and extends many of the relevant models proposed in the literature. Finally, experimental results are given to assess the features of the proposed scoring systems, with special emphasis on vertical search.

International Journal on Document Analysis and Recognition | 2001

Automatic document classification and indexing in high-volume applications

Enrico Appiani; Francesca Cesarini; Anna Maria Colla; Michelangelo Diligenti; Marco Gori; Simone Marinai; Giovanni Soda

In this paper, a system for analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling by bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, document classification results fulfill the requirements of high-volume applications. Integration into production lines is under way.

Machine Learning | 2012

Bridging logic and kernel machines

Michelangelo Diligenti; Marco Gori; Marco Maggini; Leonardo Rigutini

We propose a general framework to incorporate first-order logic (FOL) clauses, which are thought of as an abstract and partial representation of the environment, into kernel machines that learn within a semi-supervised scheme. We rely on a multi-task learning scheme where each task is associated with a unary predicate defined on the feature space, while higher-level abstract representations consist of FOL clauses made of those predicates. We re-use the kernel machine mathematical apparatus to solve the problem as primal optimization of a function composed of the loss on the supervised examples, the regularization term, and a penalty term deriving from forcing real-valued constraints deriving from the predicates. Unlike in classic kernel machines, however, depending on the logic clauses, the overall function to be optimized is no longer convex. An important contribution is to show that, while tackling the optimization by classic numerical schemes is likely to be hopeless, a stage-based learning scheme, in which we start learning the supervised examples until convergence is reached and then continue by forcing the logic clauses, is a viable direction to attack the problem. Some promising experimental results are given on artificial learning tasks and on the automatic tagging of bibtex entries to emphasize the comparison with plain kernel machines.

Pattern Recognition Letters | 2003

Similarity learning for graph-based image representations

Ciro de Mauro; Michelangelo Diligenti; Marco Gori; Marco Maggini

Visual database engines are usually based on predefined criteria for retrieving the images in response to a given query. In this paper, we propose a novel approach based on neural networks by which the retrieval criterion is derived on the basis of learning from examples. In particular, the proposed approach relies on a graph-based image representation that denotes the relationships among regions in the image, and on recursive neural networks, which can process directed ordered acyclic graphs. The graph-based representation combines structural and subsymbolic features of the image, while recursive neural networks can discover the optimal representation for searching the image database. A set of preliminary experiments on artificial images clearly indicates that the proposed approach is very promising.

Pattern Recognition | 2001

Adaptive graphical pattern recognition for the classification of company logos

Michelangelo Diligenti; Marco Gori; Marco Maggini; Enrico Martinelli

When dealing with a pattern recognition task, two major issues must be faced: firstly, a feature extraction technique has to be applied to extract useful representations of the objects to be recognized; secondly, a classification algorithm must be devised in order to produce a class hypothesis once a pattern representation is given. Adaptive graphical pattern recognition is proposed as a new approach to face these two issues when neither a purely symbolic nor a purely sub-symbolic representation seems adequate for the patterns. This approach is based on appropriate structured representations of patterns, which are subsequently processed by recursive neural networks that can be trained to perform the given classification task using connectionist-based learning algorithms. In the proposed framework, the joint role of the structured representation and learning makes it possible to face tasks in which input patterns are affected by many different sources of noise. We report some results that show how the proposed scheme can produce a very promising performance for the classification of company logos corrupted by noise.

International Conference on Document Analysis and Recognition | 2001

Classification of HTML documents by Hidden Tree-Markov Models

Michelangelo Diligenti; Marco Gori; Marco Maggini; Franco Scarselli

Content-based search and organization of Web documents poses new issues in information retrieval. We propose a novel approach for the classification of HTML documents based on a structured representation of their contents which are split into logical contexts (paragraphs, sections, anchors, etc.). The classification is performed using Hidden Tree-Markov Models (HTMMs), an extension of Hidden Markov Models for processing structured objects. We report some promising experimental results showing that the use of the structured representation improves the classification accuracy in most of the cases.

Artificial Intelligence | 2017

Semantic-based regularization for learning and inference

Michelangelo Diligenti; Marco Gori; Claudio Saccà

This paper proposes a unified approach to learning from constraints, which integrates the ability of classical machine learning techniques to learn from continuous feature-based representations with the ability of reasoning using higher-level semantic knowledge typical of Statistical Relational Learning. Learning tasks are modeled in the general framework of multi-objective optimization, where a set of constraints must be satisfied in addition to the traditional smoothness regularization term. The constraints translate First Order Logic formulas, which can express learning-from-example supervisions and general prior knowledge about the environment by using fuzzy logic. By enforcing the constraints also on the test set, this paper presents a natural extension of the framework to perform collective classification. Interestingly, the theory holds for both the case of data represented by feature vectors and the case of data simply expressed by pattern identifiers, thus extending classic kernel machines and graph regularization, respectively. This paper also proposes a probabilistic interpretation of the proposed learning scheme, and highlights intriguing connections with probabilistic approaches like Markov Logic Networks. Experimental results on classic benchmarks provide clear evidence of the remarkable improvements that are obtained with respect to related approaches.
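As a rough illustration of how a First Order Logic formula can become a differentiable-style penalty through fuzzy logic, the sketch below scores the clause "for all x: A(x) implies B(x)" on a sample of predicate outputs. The choice of the Łukasiewicz implication and of averaging for the universal quantifier are assumptions made here for illustration; the abstract does not say which fuzzy operators the paper uses, and the function names are hypothetical.

```python
def implication_truth(a, b):
    """Łukasiewicz fuzzy implication a => b for truth degrees in [0, 1].

    Assumption: the operator is picked for illustration only.
    """
    return min(1.0, 1.0 - a + b)


def constraint_penalty(pred_a, pred_b):
    """Penalty for the clause 'for all x: A(x) => B(x)'.

    pred_a, pred_b: predicate outputs in [0, 1] on the same sample points.
    The universal quantifier is approximated by the average truth degree
    over the sample (an assumption); the penalty is 0 when the clause
    holds on every point and 1 when it is fully violated everywhere.
    """
    truths = [implication_truth(a, b) for a, b in zip(pred_a, pred_b)]
    return 1.0 - sum(truths) / len(truths)
```

In a learning objective of the kind the abstract describes, a term like this would be added, with a weight, to the supervised loss and the regularization term, and it can be evaluated on unlabeled or test points as well as labeled ones.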

BMC Bioinformatics | 2014

Improved multi-level protein–protein interaction prediction with semantic-based regularization

Claudio Saccà; Stefano Teso; Michelangelo Diligenti; Andrea Passerini

Background: Protein–protein interactions can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowledge about which domains and residues are involved in a given interaction has extensive applications to biology, including better understanding of the binding process and more efficient drug/enzyme design. Alas, most current interaction prediction methods do not identify which parts of a protein actually instantiate an interaction. Furthermore, they also fail to leverage the hierarchical nature of the problem, ignoring otherwise useful information available at the lower levels; when they do, they do not generate predictions that are guaranteed to be consistent between levels. Results: Inspired by earlier ideas of Yip et al. (BMC Bioinformatics 10:241, 2009), in the present paper we view the problem as a multi-level learning task, with one task per level (proteins, domains and residues), and propose a machine learning method that collectively infers the binding state of all object pairs. Our method is based on Semantic Based Regularization (SBR), a flexible and theoretically sound machine learning framework that uses First Order Logic constraints to tie the learning tasks together. We introduce a set of biologically motivated rules that enforce consistent predictions between the hierarchy levels. Conclusions: We study the empirical performance of our method using a standard validation procedure, and compare its performance against the only other existing multi-level prediction technique. We present results showing that our method substantially outperforms the competitor in several experimental settings, indicating that exploiting the hierarchical nature of the problem can lead to better predictions. In addition, our method is also guaranteed to produce interactions that are consistent with respect to the protein–domain–residue hierarchy.
