Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Alessandro Vullo is active.

Publication


Featured researches published by Alessandro Vullo.


Nucleic Acids Research | 2006

DISULFIND: a disulfide bonding state and cysteine connectivity prediction server

Alessio Ceroni; Andrea Passerini; Alessandro Vullo; Paolo Frasconi

DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at .


Bioinformatics | 2004

Disulfide connectivity prediction using recursive neural networks and evolutionary information

Alessandro Vullo; Paolo Frasconi

MOTIVATION We focus on the prediction of disulfide bridges in proteins starting from their amino acid sequence and from the knowledge of the disulfide bonding state of each cysteine. The location of disulfide bridges is a structural feature that conveys important information about the protein main chain conformation and can therefore help towards the solution of the folding problem. Existing approaches based on weighted graph matching algorithms do not take advantage of evolutionary information. Recursive neural networks (RNN), on the other hand, can handle in a natural way complex data structures such as graphs whose vertices are labeled by real vectors, allowing us to incorporate multiple alignment profiles in the graphical representation of disulfide connectivity patterns. RESULTS The core of the method is the use of machine learning tools to rank alternative disulfide connectivity patterns. We develop an ad-hoc RNN architecture for scoring labeled undirected graphs that represent connectivity patterns. In order to compare our algorithm with previous methods, we report experimental results on the SWISS-PROT 39 dataset. We find that using multiple alignment profiles allows us to obtain significant prediction accuracy improvements, clearly demonstrating the important role played by evolutionary information. AVAILABILITY The Web interface of the predictor is available at http://neural.dsi.unifi.it/cysteines


Nucleic Acids Research | 2006

Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines

Alessandro Vullo; Oscar Bortolami; Gianluca Pollastri

Intrinsically disordered proteins have long stretches of their polypeptide chain, which do not adopt a single native structure composed of stable secondary and tertiary structure in the absence of binding partners. The prediction of intrinsically disordered regions in proteins from sequence is increasingly becoming of interest, as the presence of many such regions in the complete genome sequences are discovered and important functional roles are associated with them. We have developed a machine learning approach based on two support vector machines (SVM) to discriminate disordered regions from sequence. The SVM are trained and benchmarked on two sets, representing long and short disordered regions. A preliminary version of Spritz was shown to perform consistently well at the recent biannual CASP-6 experiment [Critical Assessment of Techniques for Protein Structure Prediction (CASP), 2004]. The fully developed Spritz method is freely available as a web server at and .


BMC Bioinformatics | 2007

Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information

Gianluca Pollastri; Alberto J. M. Martin; Catherine Mooney; Alessandro Vullo

BackgroundStructural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.ResultsHere we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available.ConclusionThe predictive system are publicly available at the address http://distill.ucd.ie.


BMC Bioinformatics | 2006

A two-stage approach for improved prediction of residue contact maps

Alessandro Vullo; Ian Walsh; Gianluca Pollastri

BackgroundProtein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure. Although improvements have occurred over the last years, the problem of accurately predicting residue contact maps from primary sequences is still largely unsolved. Among the reasons for this are the unbalanced nature of the problem (with far fewer examples of contacts than non-contacts), the formidable challenge of capturing long-range interactions in the maps, the intrinsic difficulty of mapping one-dimensional input sequences into two-dimensional output maps.In order to alleviate these problems and achieve improved contact map predictions, in this paper we split the task into two stages: the prediction of a maps principal eigenvector (PE) from the primary sequence; the reconstruction of the contact map from the PE and primary sequence. Predicting the PE from the primary sequence consists in mapping a vector into a vector. This task is less complex than mapping vectors directly into two-dimensional matrices since the size of the problem is drastically reduced and so is the scale length of interactions that need to be learned.ResultsWe develop architectures composed of ensembles of two-layered bidirectional recurrent neural networks to classify the components of the PE in 2, 3 and 4 classes from protein primary sequence, predicted secondary structure, and hydrophobicity interaction scales. Our predictor, tested on a non redundant set of 2171 proteins, achieves classification performances of up to 72.6%, 16% above a base-line statistical predictor.We design a system for the prediction of contact maps from the predicted PE. Our results show that predicting maps through the PE yields sizeable gains especially for long-range contacts which are particularly critical for accurate protein 3D reconstruction. The final predictors accuracy on a non-redundant set of 327 targets is 35.4% and 19.8% for minimum contact separations of 12 and 24, respectively, when the top length/5 contacts are selected. On the 11 CASP6 Novel Fold targets we achieve similar accuracies (36.5% and 19.7%). This favourably compares with the best automated predictors at CASP6.ConclusionOur final system for contact map prediction achieves state-of-the-art performances, and may provide valuable constraints for improved ab initio prediction of protein structures. A suite of predictors of structural features, including the PE, and PE-based contact maps, is available at http://distill.ucd.ie.


BMC Bioinformatics | 2006

Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins

Davide Baù; Alberto J. M. Martin; Catherine Mooney; Alessandro Vullo; Ian Walsh; Gianluca Pollastri

BackgroundWe describe Distill, a suite of servers for the prediction of protein structural features: secondary structure; relative solvent accessibility; contact density; backbone structural motifs; residue contact maps at 6, 8 and 12 Angstrom; coarse protein topology. The servers are based on large-scale ensembles of recursive neural networks and trained on large, up-to-date, non-redundant subsets of the Protein Data Bank. Together with structural feature predictions, Distill includes a server for prediction of Cαtraces for short proteins (up to 200 amino acids).ResultsThe servers are state-of-the-art, with secondary structure predicted correctly for nearly 80% of residues (currently the top performance on EVA), 2-class solvent accessibility nearly 80% correct, and contact maps exceeding 50% precision on the top non-diagonal contacts. A preliminary implementation of the predictor of protein Cαtraces featured among the top 20 Novel Fold predictors at the last CASP6 experiment as group Distill (ID 0348). The majority of the servers, including the Cαtrace predictor, now take into account homology information from the PDB, when available, resulting in greatly improved reliability.ConclusionAll predictions are freely available through a simple joint web interface and the results are returned by email. In a single submission the user can send protein sequences for a total of up to 32k residues to all or a selection of the servers. Distill is accessible at the address: http://distill.ucd.ie/distill/.


Nucleic Acids Research | 2011

CSpritz: accurate prediction of protein disorder segments with annotation for homology, secondary structure and linear motifs

Ian Walsh; Alberto J. M. Martin; Tomás Di Domenico; Alessandro Vullo; Gianluca Pollastri

CSpritz is a web server for the prediction of intrinsic protein disorder. It is a combination of previous Spritz with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, predictions are combined by averaging their probabilities. The CSpritz website can elaborate single or multiple predictions for either short or long disorder. The server provides a global output page, for download and simultaneous statistics of all predictions. Links are provided to each individual protein where the amino acid sequence and disorder prediction are displayed along with statistics for the individual protein. As a novel feature, CSpritz provides information about structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, where CSpritz would have ranked consistently well with a Sw measure of 49.27 and AUC of 0.828. The server, together with help and methods pages including examples, are freely available at URL: http://protein.bio.unipd.it/cspritz/.


intelligent information systems | 2002

Hidden Markov Models for Text Categorization in Multi-Page Documents

Paolo Frasconi; Giovanni Soda; Alessandro Vullo

In the traditional setting, text categorization is formulated as a concept learning problem where each instance is a single isolated document. However, this perspective is not appropriate in the case of many digital libraries that offer as contents scanned and optically read books or magazines. In this paper, we propose a more general formulation of text categorization, allowing documents to be organized as sequences of pages. We introduce a novel hybrid system specifically designed for multi-page text documents. The architecture relies on hidden Markov models whose emissions are bag-of-words resulting from a multinomial word event model, as in the generative portion of the Naive Bayes classifier. The rationale behind our proposal is that taking into account contextual information provided by the whole page sequence can help disambiguation and improves single page classification accuracy. Our results on two datasets of scanned journals from the Making of America collection confirm the importance of using whole page sequences. The empirical evaluation indicates that the error rate (as obtained by running the Naive Bayes classifier on isolated pages) can be significantly reduced if contextual information is incorporated.


acm/ieee joint conference on digital libraries | 2001

Text categorization for multi-page documents: a hybrid naive Bayes HMM approach

Paolo Frasconi; Giovanni Soda; Alessandro Vullo

Text categorization is typically formulated as a concept learning prob lem where each instance is a single isolated document. In this paper we are interested in a more general formulation where documents are organized as page sequences, as naturally occurring in digital libraries of scanned books and magazines. We describe a method for classifying pages of sequential OCR text documents into one of several assigned categories and suggest that taking into account contextual information provided by the whole page sequence can significantly improve classification accuracy. The proposed architecture relies on hidden Markov models whose emissions are bag-of-words according to a multinomial word event model, as in the generative portion of the Naive Bayes classifier. Our results on a collection of scanned journals from the Making of America project confirm the importance of using whole page sequences. Empirical evaluation indicates that the error rate (as obtained by running a plain Naive Bayes classifier on isolated page) can be roughly reduced by half if contextual information is incorporated.


BMC Structural Biology | 2009

Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

Ian Walsh; Davide Baù; Alberto J. M. Martin; Catherine Mooney; Alessandro Vullo; Gianluca Pollastri

BackgroundPrediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is not exploiting information about potential homologues of known structure.ResultsWe introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cαtrace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, or relying on the sequence and on evolutionary information; one template-based, or in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab-initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper we also report on reconstructions of Cαtraces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.ConclusionAccurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the url http://distill.ucd.ie/.

Collaboration


Dive into the Alessandro Vullo's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Pierre Baldi

University of California

View shared research outputs
Top Co-Authors

Avatar

Davide Baù

University College Dublin

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge