Publication


Featured research published by Simone Marini.


Scientific Reports | 2018

MTGO: PPI Network Analysis Via Topological and Functional Module Identification

Danila Vella; Simone Marini; Francesca Vitali; Dario Di Silvestre; Giancarlo Mauri; Riccardo Bellazzi

Protein-protein interaction (PPI) networks are viable tools to understand cell functions, disease machinery, and drug design/repositioning. Interpreting a PPI network, however, is a particularly challenging task because of network complexity. Several algorithms have been proposed for automatic PPI interpretation, at first by considering only the network topology, and later by integrating Gene Ontology (GO) terms as node similarity attributes. Here we present MTGO (Module detection via Topological information and GO knowledge), a novel functional module identification approach. MTGO reveals the biomolecular machinery underpinning PPI networks by leveraging both biological knowledge and topological properties. In particular, it directly exploits GO terms during the module assembly process and labels each module with its best-fit GO term, easing its functional interpretation. MTGO performs considerably better than other state-of-the-art algorithms (including recent GO-based ones) when searching for small or sparse functional modules, while providing comparable or better results in all other cases. MTGO correctly identifies molecular complexes and literature-consistent processes in an experimentally derived PPI network of myocardial infarction. A software version of MTGO is freely available for non-commercial purposes at https://gitlab.com/d1vella/MTGO.
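The abstract states that MTGO labels each module with its best-fit GO term. The Python sketch below illustrates only that labelling step, under an assumption that is not taken from the paper: "best fit" is read as the GO term annotating the largest fraction of the module's proteins. The protein names, annotations and the helper names `best_fit_go_term` and `label_modules` are hypothetical; MTGO's actual scoring and its topology-driven module assembly are not reproduced here.

```python
from collections import Counter


def best_fit_go_term(module, annotations):
    """Return (term, coverage) for the GO term shared by most proteins in `module`.

    Assumed criterion: coverage = fraction of module members annotated with the term.
    Ties are broken by the alphabetically smallest term ID, so the result is deterministic.
    """
    counts = Counter()
    for protein in module:
        for term in annotations.get(protein, set()):
            counts[term] += 1
    if not counts:
        return None, 0.0
    term, hits = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return term, hits / len(module)


def label_modules(modules, annotations):
    """Label every module with its best-fit term (hypothetical helper)."""
    return [best_fit_go_term(module, annotations) for module in modules]


# Invented annotations, for illustration only.
annotations = {
    "P1": {"GO:0006915", "GO:0005515"},
    "P2": {"GO:0006915"},
    "P3": {"GO:0007165"},
    "P4": {"GO:0007165"},
    "P5": {"GO:0007165", "GO:0005515"},
}
modules = [["P1", "P2", "P3"], ["P4", "P5"]]
for module, (term, coverage) in zip(modules, label_modules(modules, annotations)):
    print(module, term, round(coverage, 2))
```

Coverage is a deliberately simple criterion; an enrichment test against the whole network would additionally penalize very generic terms that annotate most proteins.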


Plant Methods | 2016

Noisy beets: impact of phenotyping errors on genomic predictions for binary traits in Beta vulgaris.

Filippo Biscarini; Nelson Nazzicari; Chiara Broccanello; Piergiorgio Stevanato; Simone Marini

Background: Noise (errors) in scientific data is endemic and may have a detrimental effect on statistical analyses and experimental results. The effects of noisy data have been assessed in genome-wide association studies for case-control experiments in human medicine. Little is known, however, about the impact of noisy data on genomic predictions, a widely used statistical application in plant and animal breeding.

Results: In this study, the sensitivity to noise in the data of five classification methods (K-nearest neighbours, KNN; random forest, RF; ridge logistic regression, LR; and support vector machines with linear or radial basis function kernels) was investigated. A sugar beet population of 123 plants phenotyped for a binary trait and genotyped for 192 SNP (single nucleotide polymorphism) markers was used. Labels (0/1 phenotype) were randomly sampled to generate noise. Starting from a base scenario without label errors, increasing proportions of noisy labels, up to 50%, were generated and introduced into the data.

Conclusions: Local classification methods (KNN and RF) showed higher tolerance to noisy labels than methods that leverage global data properties (LR and the two SVM models). In particular, KNN outperformed all other classifiers, with an AUC (area under the ROC curve) higher than 0.95 up to 20% noisy labels. The runner-up method, RF, had an AUC of 0.941 with 20% noise.
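To make the noise-injection protocol concrete, the sketch below flips a chosen proportion of 0/1 labels, scores samples with a k-nearest-neighbour classifier and computes the AUC through the Mann-Whitney statistic. Modelling a phenotyping error as a label flip, using the Manhattan distance (sum of absolute differences) on 0/1/2-coded SNP genotypes, the toy data, and all function names are assumptions for illustration, not the study's code.

```python
import random


def add_label_noise(labels, proportion, seed=None):
    """Return a copy of 0/1 `labels` with a random `proportion` of them flipped.

    Assumption: a phenotyping error is modelled as a label flip.
    """
    rng = random.Random(seed)
    n_noisy = round(proportion * len(labels))
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), n_noisy):
        noisy[i] = 1 - noisy[i]
    return noisy


def knn_scores(train_X, train_y, test_X, k=5):
    """Score each test genotype by the share of its k nearest training samples labelled 1.

    Genotypes are lists of SNPs coded 0/1/2; the distance is the sum of absolute
    differences (an assumed choice).
    """
    scores = []
    for x in test_X:
        ranked = sorted(
            ((sum(abs(a - b) for a, b in zip(x, t)), y) for t, y in zip(train_X, train_y)),
            key=lambda pair: pair[0],
        )
        neighbours = ranked[:k]
        scores.append(sum(y for _, y in neighbours) / len(neighbours))
    return scores


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy usage: 20% of the training labels (one of six) are flipped before scoring with KNN.
train_X = [[0, 0, 1], [0, 1, 0], [0, 0, 0], [2, 2, 1], [2, 1, 2], [2, 2, 2]]
train_y = [0, 0, 0, 1, 1, 1]
noisy_y = add_label_noise(train_y, 0.2, seed=42)
test_X, test_y = [[0, 0, 0], [2, 2, 2]], [0, 1]
print(auc(test_y, knn_scores(train_X, noisy_y, test_X, k=3)))  # prints 1.0 whichever label is flipped
```

Repeating the last three lines with proportions from 0.0 to 0.5 mirrors the study's design of progressively noisier labels.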


JAMIA Open | 2018

Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia

Francesca Vitali; Simone Marini; D Pala; Andrea Demartini; S Montoli; A Zambelli; Riccardo Bellazzi

Objective: Computing patient similarity is of great interest in precision oncology, since it supports clustering and subgroup identification, eventually leading to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motivates the development of new methods to compute patient similarities that can fuse heterogeneous data sources with the available knowledge.

Materials and Methods: In this work, we developed a data integration approach based on matrix trifactorization to compute patient similarities by integrating several sources of data and knowledge. We assess the accuracy of the proposed method (1) on several synthetic data sets whose similarity structures are affected by increasing levels of noise and data sparsity, and (2) on a real data set from an acute myeloid leukemia (AML) study. The results are then compared with those of traditional similarity calculation methods.

Results: In the analysis of the synthetic data sets, where the ground truth is known, we measured the capability of reconstructing the correct clusters, while in the AML study we evaluated the Kaplan-Meier curves obtained with the different clusters and measured their statistical difference by means of the log-rank test. In the presence of noise and sparse data, our data integration method outperforms other techniques on both the synthetic and the AML data.

Discussion: In the case of multiple heterogeneous data sources, a matrix trifactorization technique can successfully fuse all the information in a joint model. We demonstrated how this approach can be efficiently applied to discover meaningful patient similarities, and it may therefore be considered a reliable data-driven strategy for defining new research hypotheses in precision oncology.

Conclusion: The better performance of the proposed approach gives it an advantage over previous methods in providing accurate patient similarities that support precision medicine.
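Matrix trifactorization approximates a nonnegative data matrix X (patients x features) as G S H^T, where the rows of G can be read as patient memberships in latent clusters. The pure-Python sketch below fits a single matrix with multiplicative updates for the squared Frobenius loss and then compares patients by the cosine similarity of their rows in G. This is a simplification: the paper's joint model fuses several data and knowledge sources, which this single-matrix sketch does not; all names and the toy matrix are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def multiplicative_step(F, num, den, eps=1e-9):
    """Elementwise F * num / den; eps guards against division by zero."""
    return [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)] for fr, nr, dr in zip(F, num, den)]


def frobenius_error(X, G, S, H):
    R = matmul(matmul(G, S), transpose(H))
    return math.sqrt(sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr)))


def nmtf(X, k1, k2, iters=300, seed=0):
    """Nonnegative trifactorization X ~ G S H^T with multiplicative updates.

    X: n x m nonnegative matrix; G: n x k1; S: k1 x k2; H: m x k2.
    Returns G, S, H and the reconstruction error before and after fitting.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    G = [[rng.random() + 0.1 for _ in range(k1)] for _ in range(n)]
    S = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(k1)]
    H = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(m)]
    errors = [frobenius_error(X, G, S, H)]
    Xt = transpose(X)
    for _ in range(iters):
        HSt = matmul(H, transpose(S))  # H S^T, m x k1
        # G <- G * (X H S^T) / (G S H^T H S^T)
        G = multiplicative_step(G, matmul(X, HSt), matmul(G, matmul(transpose(HSt), HSt)))
        GS = matmul(G, S)  # G S, n x k2
        # H <- H * (X^T G S) / (H S^T G^T G S)
        H = multiplicative_step(H, matmul(Xt, GS), matmul(H, matmul(transpose(GS), GS)))
        # S <- S * (G^T X H) / (G^T G S H^T H)
        num_S = matmul(transpose(G), matmul(X, H))
        den_S = matmul(matmul(transpose(G), G), matmul(S, matmul(transpose(H), H)))
        S = multiplicative_step(S, num_S, den_S)
    errors.append(frobenius_error(X, G, S, H))
    return G, S, H, errors


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


# Toy patients x features matrix with two obvious groups (invented data).
X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
G, S, H, errors = nmtf(X, k1=2, k2=2)
similarity = [[round(cosine(gi, gj), 2) for gj in G] for gi in G]
print(errors, similarity)
```

Each update is a standard nonnegative least-squares multiplicative step for one factor with the other two held fixed, which keeps all factors nonnegative.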


Developmental Cell | 2018

A Comprehensive Roadmap of Murine Spermatogenesis Defined by Single-Cell RNA-Seq

Christopher Daniel Green; Qianyi Ma; Gabriel Manske; Adrienne Niederriter Shami; Xianing Zheng; Simone Marini; Lindsay Moritz; Caleb Sultan; Stephen J. Gurczynski; Bethany B. Moore; Michelle D. Tallquist; Jun Li; Saher Sue Hammoud

Spermatogenesis requires intricate interactions between the germline and somatic cells. Within a given cross section of a seminiferous tubule, multiple germ and somatic cell types co-occur. This cellular heterogeneity has made it difficult to profile distinct cell types at different stages of development. To address this challenge, we collected single-cell RNA sequencing data from ∼35,000 cells from the adult mouse testis and identified all known germ and somatic cells, as well as two unexpected somatic cell types. Our analysis revealed a continuous developmental trajectory of germ cells from spermatogonia to spermatids and identified candidate transcriptional regulators at several transition points during differentiation. Focused analyses delineated four subtypes of spermatogonia and nine subtypes of Sertoli cells; the latter linked to histologically defined developmental stages over the seminiferous epithelial cycle. Overall, this high-resolution cellular atlas represents a community resource and foundation of knowledge to study germ cell development and in vivo gametogenesis.


PLOS ONE | 2016

A Data Fusion Approach to Enhance Association Study in Epilepsy

Simone Marini; Ivan Limongelli; Ettore Rizzo; Alberto Malovini; Edoardo Errichiello; Annalisa Vetro; Tan Da; Orsetta Zuffardi; Riccardo Bellazzi

Among the scientific challenges posed by complex diseases with a strong genetic component, two stand out. One is unveiling the role of rare and common genetic variants; the other is the design of classification models to improve clinical diagnosis and predictive models for prognosis and personalized therapies. In this paper, we present a data fusion framework merging gene, domain, pathway and protein-protein interaction data related to a next-generation sequencing epilepsy gene panel. Our method integrates association information from multiple genomic sources and aims at highlighting the set of common and rare variants capable of triggering the occurrence of a complex disease. When compared to other approaches, our method performs better in classifying patients affected by epilepsy.


BMC Evolutionary Biology | 2014

Improvement of Dscam homophilic binding affinity throughout Drosophila evolution.

Guang-Zhong Wang; Simone Marini; Xinyun Ma; Qiang Yang; Xuegong Zhang; Yan Zhu

Background: Drosophila Dscam1 is a cell-surface protein that plays important roles in neural development and axon tiling of neurons. It is known that thousands of isoforms bind to themselves through specific homophilic interactions, a process that provides the basis for cellular self-recognition. Detailed biochemical studies of specific isoforms strongly suggest that homophilic binding, i.e. the formation of homodimers by identical Dscam1 isoforms, is of great importance for the self-avoidance of neurons. Due to experimental limitations, it is currently impossible to measure the homophilic binding affinities for all 19,000 potential isoforms.

Results: Here we reconstructed the DNA sequences of an ancestral Dscam form (which likely existed approximately 40 to 50 million years ago) using a comparative genomic approach. On the basis of this sequence, we used machine-learning methods to establish a working model that predicts the self-binding affinities of all isoforms in both the current and the ancestral genome. Detailed computational analysis was performed to compare the self-binding affinities of all isoforms present in these two genomes. Our results revealed that 1) isoforms containing newly derived variable domains exhibit higher self-binding affinities than those with conserved domains, and 2) current isoforms display higher self-binding affinities than their counterparts in the ancient genome. As thousands of Dscam isoforms are needed for the self-avoidance of the neuron, we propose that an increase in self-binding affinity provides the basis for the successful evolution of the arthropod brain.

Conclusions: The data presented here provide an excellent model for future experimental studies of the binding behavior of Dscam isoforms. The results of our analysis indicate that evolution favored the rise of novel variable domains thanks to their higher self-binding affinities, rather than selecting merely for a simple expansion of isoform diversity, as this particular selection process would have established the powerful mechanisms required for neuronal self-avoidance. Thus, we reveal here a new molecular mechanism for the successful evolution of arthropod brains.


Current Protein & Peptide Science | 2011

In Silico Protein-Protein Interaction Prediction with Sequence Alignment and Classifier Stacking

Simone Marini; Qian Xu; Qiang Yang

Protein-Protein Interaction (PPI) prediction is a well-known problem in bioinformatics, for which a large number of techniques have been proposed. However, prediction results have not been satisfactory enough to guide biologists in wet-lab experiments. One reason is that not all useful information, such as pairwise protein interaction information based on sequence alignment, has been integrated into PPI prediction. Alignment is a basic tool for measuring sequence similarity in proteomics and has been used in applications ranging from protein recognition to protein subcellular localization. In this article, we propose a novel integrated approach to predicting PPI based on sequence alignment, jointly using a k-Nearest Neighbor classifier (SA-kNN) and a Support Vector Machine (SVM). SVM is a machine learning technique used in a wide range of bioinformatics applications, thanks to its ability to alleviate overfitting. We demonstrate that in our approach the two methods, SA-kNN and SVM, are complementary and can be combined in an ensemble to overcome their respective limitations. While the SVM is trained on amino acid (AA) compositions and protein signatures mined from the literature, the SA-kNN exploits the alignment-based similarity between two protein pairs. Experimentally, our technique yields significant gains in accuracy, precision and sensitivity of about 5%, 16% and 10%, respectively.
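The two ingredients described above, alignment-based similarity between protein pairs (the SA-kNN side) and amino acid composition features (the SVM side), can be sketched as follows. The toy scoring scheme (match +1, mismatch -1, gap -2) replaces a substitution matrix such as BLOSUM62, the order-invariant pair similarity is an assumption, the fixed weighted average stands in for the trained stacking step, and the SVM itself is omitted. All function names and sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """20-dimensional amino acid frequency vector (the SVM-side feature type)."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS) or 1
    return [seq.count(a) / total for a in AMINO_ACIDS]


def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a toy linear scoring scheme."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def pair_similarity(p, q):
    """Alignment-based similarity of pairs p = (a, b) and q = (c, d), order-invariant."""
    (a, b), (c, d) = p, q
    f = global_alignment_score
    return max(f(a, c) + f(b, d), f(a, d) + f(b, c))


def sa_knn_score(query, train_pairs, train_labels, k=3):
    """Fraction of the k training pairs most similar to `query` that interact (label 1)."""
    ranked = sorted(
        zip(train_pairs, train_labels),
        key=lambda item: pair_similarity(query, item[0]),
        reverse=True,
    )
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def stacked_score(knn_prob, svm_prob, w=0.5):
    """Fixed weighted average standing in for a trained meta-classifier (w is hypothetical)."""
    return w * knn_prob + (1 - w) * svm_prob


train_pairs = [("AAAA", "CCCC"), ("AAAA", "CCCA"), ("WWWW", "YYYY")]
train_labels = [1, 1, 0]
knn_prob = sa_knn_score(("AAAA", "CCCC"), train_pairs, train_labels, k=2)
print(knn_prob, stacked_score(knn_prob, svm_prob=0.6))
```

In a real stacking setup, `stacked_score` would be replaced by a second-level classifier trained on out-of-fold predictions of both base models.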


Journal of Biomedical Informatics | 2018

Patient similarity for precision medicine: A systematic review

Enea Parimbelli; Simone Marini; Lucia Sacchi; Riccardo Bellazzi

Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is currently challenging the broad definitions of guideline-defined patient groups. Precision medicine leverages genetic, phenotypic, or psychosocial characteristics to provide precise identification of patient subsets for treatment targeting. Defining a patient similarity measure is thus an essential step toward stratifying patients into clinically meaningful subgroups. The present review investigates the use of patient similarity as a tool to enable precision medicine. A total of 279 articles were analyzed along four dimensions: data types considered, clinical domains of application, data analysis methods, and translational stage of findings. Cancer-related research employing molecular profiling and standard data analysis techniques such as clustering constitutes the majority of the retrieved studies. Chronic and psychiatric diseases follow as the second most represented clinical domains. Interestingly, almost one quarter of the studies analyzed presented a novel methodology, with the most advanced employing data integration strategies and being portable to different clinical domains. Integration of such techniques into decision support systems constitutes an interesting trend for future research.


Briefings in Bioinformatics | 2018

Toward more accurate prediction of caspase cleavage sites: a comprehensive review of current methods, tools and features

Yu Bao; Simone Marini; Takeyuki Tamura; Mayumi Kamada; Shingo Maegawa; Hiroshi Hosokawa; Jiangning Song; Tatsuya Akutsu

As one of the few irreversible protein post-translational modifications, proteolytic cleavage is involved in nearly all aspects of cellular activities, ranging from gene regulation to cell life-cycle regulation. Among the various protease-specific types of proteolytic cleavage, cleavages by caspases/granzyme B are considered essential in the initiation and execution of programmed cell death and inflammation processes. Although a number of substrates for both types of proteolytic cleavage have been experimentally identified, the complete repertoire of caspase and granzyme B substrates remains to be fully characterized. To tackle this issue and complement experimental efforts for substrate identification, systematic bioinformatics studies of known cleavage sites provide important insights into caspase/granzyme B substrate specificity and facilitate the discovery of novel substrates. In this article, we review and benchmark 12 state-of-the-art sequence-based bioinformatics approaches and tools for caspase/granzyme B cleavage prediction. We evaluate and compare these methods in terms of their input/output, algorithms used, prediction performance, validation methods, and software availability and utility. In addition, we construct independent data sets consisting of caspase/granzyme B substrates from different species and use them to assess the predictive power of these predictors for the identification of cleavage sites. We find that the prediction results are highly variable among different predictors. Furthermore, we experimentally validate the predictions of a case study by performing a caspase cleavage assay. We anticipate that this comprehensive review and survey analysis will provide an insightful resource for biologists and bioinformaticians who are interested in using and/or developing tools for caspase/granzyme B cleavage prediction.
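Sequence-based caspase predictors exploit the fact that caspases, like granzyme B, cleave after an aspartate (D) at position P1. The sketch below scans P4-P1 windows ending in D and scores the P4-P2 residues with a position-specific table. The weights are invented to loosely resemble a DEVD-like motif and the threshold is arbitrary, so this illustrates the scanning logic only, not any of the 12 reviewed tools; function names are hypothetical.

```python
# Hypothetical position-specific weights for P4, P3 and P2 (P1 must be D).
# They loosely mimic a DEVD-like motif and are NOT fitted to any data set.
TOY_WEIGHTS = [
    {"D": 2.0, "I": 1.0, "L": 1.0, "V": 1.0},  # P4
    {"E": 2.0, "Q": 0.5},                      # P3
    {"V": 1.0, "T": 1.0, "H": 1.0, "P": 1.0},  # P2
]


def candidate_cleavage_sites(seq, weights=TOY_WEIGHTS, threshold=3.0):
    """Return (p1_position, tetrapeptide, score) for scored P4-P1 windows ending in D.

    p1_position is the 1-based index of the P1 aspartate; cleavage would fall
    between P1 and P1', so a D at the last residue is skipped.
    """
    seq = seq.upper()
    hits = []
    for i in range(3, len(seq) - 1):
        if seq[i] != "D":
            continue
        window = seq[i - 3:i + 1]
        score = sum(w.get(aa, 0.0) for w, aa in zip(weights, window[:3]))
        if score >= threshold:
            hits.append((i + 1, window, score))
    return hits


def cleave(seq, p1_positions):
    """Split `seq` after each P1 position (1-based), returning the fragments."""
    fragments, start = [], 0
    for pos in sorted(p1_positions):
        fragments.append(seq[start:pos])
        start = pos
    fragments.append(seq[start:])
    return fragments


sites = candidate_cleavage_sites("MKDEVDGAAD")
print(sites)  # [(6, 'DEVD', 5.0)]
print(cleave("MKDEVDGAAD", [p for p, _, _ in sites]))  # ['MKDEVD', 'GAAD']
```

Real predictors replace the hand-set table with weights or models learned from experimentally verified cleavage sites and usually consider a wider window, including residues after P1.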


Bioinformatics | 2018

Protease target prediction via matrix factorization

Simone Marini; Francesca Vitali; Sara Rampazzi; Andrea Demartini; Tatsuya Akutsu

Motivation: Protein cleavage is an important cellular event, involved in a myriad of processes, from apoptosis to immune response. Bioinformatics provides in silico tools, such as machine learning-based models, to guide the discovery of targets for the proteases responsible for protein cleavage. State-of-the-art models have a scope limited to specific protease families (such as caspases), and do not explicitly include biological or medical knowledge (such as hierarchical protein domain similarity or gene-gene interactions). To fill this gap, we present a novel approach for protease target prediction based on data integration.

Results: By representing protease-protein target information in the form of relational matrices, we design a model that (i) is general and not limited to a single protease family, and (ii) leverages the available knowledge, managing extremely sparse data from heterogeneous data sources, including primary sequence, pathways, domains and interactions. When compared with other algorithms on test data, our approach performs better, even than models focusing specifically on a single protease family.

Availability and implementation: https://gitlab.com/smarini/MaDDA/ (MATLAB code and the data used).

Supplementary information: Supplementary data are available at Bioinformatics online.
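The relational-matrix idea can be illustrated with a single protease x protein matrix in which 1 marks a known target. The sketch below uses plain weighted low-rank matrix factorization fitted by full-batch gradient descent, treating unobserved pairs as 0 with a reduced weight (a common positive-unlabelled heuristic). This is a substitute technique: it does not integrate the additional sequence, pathway, domain or interaction matrices that the paper's data-fusion model uses, and all names, data and hyperparameters are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_mf(R, rank=2, w_unobserved=0.1, lam=0.01, lr=0.02, epochs=1000, seed=0):
    """Fit R ~ U V^T (proteases x proteins) by full-batch gradient descent.

    R[i][j] == 1 marks a known target. Unobserved pairs are treated as 0 but
    down-weighted by w_unobserved (an assumed positive-unlabelled heuristic).
    Returns U, V and the loss before and after training.
    """
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(m)]

    def weight(i, j):
        return 1.0 if R[i][j] == 1 else w_unobserved

    def loss():
        # Weighted squared error plus L2 regularization on both factor matrices.
        err = sum(weight(i, j) * (R[i][j] - dot(U[i], V[j])) ** 2
                  for i in range(n) for j in range(m))
        reg = lam * sum(x * x for row in U + V for x in row)
        return err + reg

    history = [loss()]
    for _ in range(epochs):
        gU = [[2 * lam * x for x in row] for row in U]
        gV = [[2 * lam * x for x in row] for row in V]
        for i in range(n):
            for j in range(m):
                e = weight(i, j) * (R[i][j] - dot(U[i], V[j]))
                for k in range(rank):
                    gU[i][k] -= 2 * e * V[j][k]
                    gV[j][k] -= 2 * e * U[i][k]
        U = [[u - lr * g for u, g in zip(ur, gr)] for ur, gr in zip(U, gU)]
        V = [[v - lr * g for v, g in zip(vr, gr)] for vr, gr in zip(V, gV)]
    history.append(loss())
    return U, V, history


def predict(U, V, i, j):
    """Reconstructed score for protease i and candidate target protein j."""
    return dot(U[i], V[j])


# Invented 3 proteases x 5 proteins matrix; 0 means "no known cleavage", not "never cleaved".
R = [[1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0],
     [0, 0, 0, 1, 1]]
U, V, history = weighted_mf(R)
print(history, round(predict(U, V, 0, 2), 3))
```

Unobserved pairs with high `predict` scores are the candidates such a model would rank as putative new targets; in the toy matrix, protease 0 with protein 2 is one unobserved pair worth inspecting.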

Collaboration


Top Co-Authors


Qiang Yang

Harbin Institute of Technology
