Francesca Martella
Sapienza University of Rome
Publication
Featured research published by Francesca Martella.
Aging Cell | 2013
Marian Beekman; Hélène Blanché; Markus Perola; Anti Hervonen; Vladyslav Bezrukov; Ewa Sikora; Friederike Flachsbart; Lene Christiansen; Anton J. M. de Craen; Thomas B. L. Kirkwood; Irene Maeve Rea; Michel Poulain; Jean-Marie Robine; Silvana Valensin; Maria Antonietta Stazi; Giuseppe Passarino; Luca Deiana; Efstathios S. Gonos; Lavinia Paternoster; Thorkild Ingvor Arrild Sørensen; Qihua Tan; Quinta Helmer; Erik B. van den Akker; Joris Deelen; Francesca Martella; Heather J. Cordell; Kristin L. Ayers; James W. Vaupel; Outi Törnwall; Thomas E. Johnson
Clear evidence exists for heritability of human longevity, and much interest is focused on identifying genes associated with longer lives. To identify such longevity alleles, we performed the largest genome‐wide linkage scan thus far reported. Linkage analyses included 2118 nonagenarian Caucasian sibling pairs that have been enrolled in 15 study centers of 11 European countries as part of the Genetics of Healthy Aging (GEHA) project. In the joint linkage analyses, we observed four regions that show linkage with longevity: chromosome 14q11.2 (LOD = 3.47), chromosome 17q12‐q22 (LOD = 2.95), chromosome 19p13.3‐p13.11 (LOD = 3.76), and chromosome 19q13.11‐q13.32 (LOD = 3.57). To fine map these regions linked to longevity, we performed association analysis using GWAS data in a subgroup of 1228 unrelated nonagenarians and 1907 geographically matched controls. Using a fixed‐effect meta‐analysis approach, rs4420638 at the TOMM40/APOE/APOC1 gene locus showed significant association with longevity (P‐value = 9.6 × 10⁻⁸). By combined modeling of linkage and association, we showed that association of longevity with APOEε4 and APOEε2 alleles explains the linkage at 19q13.11‐q13.32 with P‐value = 0.02 and P‐value = 1.0 × 10⁻⁵, respectively. In the largest linkage scan thus far performed for human familial longevity, we confirm that the APOE locus is a longevity gene and that additional longevity loci may be identified at 14q11.2, 17q12‐q22, and 19p13.3‐p13.11. As the latter linkage results are not explained by common variants, we suggest that rare variants play an important role in human familial longevity.
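The fixed-effect meta-analysis mentioned in this abstract is commonly computed by inverse-variance weighting of per-study effect estimates. The sketch below is a generic illustration of that standard technique, not the GEHA analysis pipeline; the function name `fixed_effect_meta` and the input values in the usage note are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis (generic sketch).

    betas: per-study effect estimates (e.g. log odds ratios).
    ses:   per-study standard errors, all strictly positive.
    Returns the pooled estimate, its standard error, the z statistic
    and a two-sided p-value under a normal approximation.
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value
```

For example, `fixed_effect_meta([0.2, 0.4], [0.1, 0.1])` pools two made-up studies with equal precision, so the pooled estimate is simply their average.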
The International Journal of Biostatistics | 2008
Francesca Martella; Marco Alfò; Maurizio Vichi
A challenge in microarray data analysis concerns discovering local structures composed of sets of genes that show homogeneous expression patterns across subsets of conditions. We present an extension of the mixture of factor analyzers model (MFA) allowing for simultaneous clustering of genes and conditions. The proposed model is rather flexible since it models the density of high-dimensional data assuming a mixture of Gaussian distributions with a particular component-specific covariance structure. Specifically, a binary and row stochastic matrix representing tissue membership is used to cluster tissues (experimental conditions), whereas the traditional mixture approach is used to define the gene clustering. An alternating expectation conditional maximization (AECM) algorithm is proposed for parameter estimation; experiments on simulated and real data show the efficiency of our method as a general approach to biclustering. The Matlab code of the algorithm is available upon request from the authors.
Folia Geobotanica | 2014
Fabio Attorre; F. Francesconi; Michele De Sanctis; Marco Alfò; Francesca Martella; Roberto Valenti; Marcello Vitale
This paper presents an application of a finite mixture model (FMM) to analyze spatially explicit data on forest composition and environmental variables to produce a high-resolution map of their current potential distribution. FMM provides a convenient yet formal setting for model-based clustering. Within this framework, forest data are assumed to come from an underlying FMM, where each mixture component corresponds to a cluster and each cluster is characterized by a different composition of tree species. An important extension of this model is based on including a set of covariates to predict class membership. These covariates can be climatic and topographical parameters as well as geographical coordinates and the class membership of neighbouring plots. FMM was applied to a national forest inventory of Italy consisting of 6,714 plots with a measure of abundance for 27 tree species. In this way, a map of potential forest types was produced. The limitations and usefulness of the proposed modelling approach were analyzed and discussed, comparing the results with an independently derived expert map.
Journal of Applied Statistics | 2012
Francesca Martella; Maurizio Vichi
Microarray technology allows the measurement of expression levels of thousands of genes simultaneously. The dimension and complexity of gene expression data obtained by microarrays create challenging data analysis and management problems ranging from the analysis of images produced by microarray experiments to biological interpretation of results. Therefore, statistical and computational approaches are beginning to assume a substantial position within the molecular biology area. We consider the problem of simultaneously clustering genes and tissue samples (in general, conditions) of a microarray data set. This can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. The need to find a subset of genes and tissue samples defining a homogeneous block has led to the application of double clustering techniques on gene expression data. Here, we focus on an extension of standard K-means to simultaneously cluster observations and features of a data matrix, namely double K-means introduced by Vichi (2000). We introduce this model in a probabilistic framework and discuss the advantages of using this approach. We also develop a coordinate ascent algorithm and test its performance via simulation studies and a real data set. Finally, we validate the results obtained on the real data set by building resampling confidence intervals for block centroids.
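To make the idea of double K-means concrete, here is a minimal least-squares sketch that alternates between updating block centroids, row labels and column labels. It is an assumption-laden illustration, not the probabilistic formulation or the coordinate ascent algorithm of the paper: the function names, the random initialisation and the handling of empty blocks (centroid set to zero) are all choices made for this example.

```python
import numpy as np


def block_centroids(X, rows, cols, n_row_clusters, n_col_clusters):
    """Mean of X within each (row cluster, column cluster) block."""
    U = np.eye(n_row_clusters)[rows]  # n x K row-membership matrix
    V = np.eye(n_col_clusters)[cols]  # p x Q column-membership matrix
    sums = U.T @ X @ V
    counts = np.outer(U.sum(axis=0), V.sum(axis=0))
    # Empty blocks get centroid 0 (a simplification for this sketch).
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)


def double_kmeans(X, n_row_clusters, n_col_clusters, n_iter=50, seed=0):
    """Alternating least-squares sketch of double K-means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rows = rng.integers(n_row_clusters, size=n)
    cols = rng.integers(n_col_clusters, size=p)
    for _ in range(n_iter):
        # Reassign each row to the row cluster with the smallest squared error.
        C = block_centroids(X, rows, cols, n_row_clusters, n_col_clusters)
        row_cost = ((X[:, None, :] - C[None, :, cols]) ** 2).sum(axis=2)
        new_rows = row_cost.argmin(axis=1)
        # Recompute centroids, then reassign each column in the same way.
        C = block_centroids(X, new_rows, cols, n_row_clusters, n_col_clusters)
        col_cost = ((X.T[:, None, :] - C.T[None, :, new_rows]) ** 2).sum(axis=2)
        new_cols = col_cost.argmin(axis=1)
        converged = np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols)
        rows, cols = new_rows, new_cols
        if converged:
            break
    C = block_centroids(X, rows, cols, n_row_clusters, n_col_clusters)
    return rows, cols, C
```

Like ordinary K-means, this procedure can stop at a local optimum, so in practice it would typically be run from several random starts and the partition with the lowest within-block sum of squares kept.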
Statistical Modelling | 2011
Francesca Martella; Marco Alfò; Maurizio Vichi
In the last few years, model-based clustering techniques have become widely used in the context of microarray data analysis. In this empirical context, a potential purpose for statistical approaches is the identification of clusters of genes that are co-expressed under subsets of experimental conditions. We discuss a hierarchical mixture model to combine advantages of allowing for dependence within gene clusters and for simultaneous clustering of genes and experimental conditions. Thanks to the adopted hierarchical structure, we may distinguish gene clusters from mixture components, where the latter may represent intra-cluster gene-specific extra-Gaussian departures. To cluster experimental conditions, instead, we suggest a suitable parameterization of component-specific means by using a binary row stochastic matrix representing condition membership. The performance of the proposed approach is discussed on both simulated and real datasets.
The Annals of Applied Statistics | 2017
Antonello Maruotti; Jan Bulla; Francesco Lagona; Marco Picone; Francesca Martella
The assessment of pollution exposure is based on the analysis of a multivariate time series that includes the concentrations of several pollutants as well as the measurements of multiple atmospheric variables. It typically requires methods of dimensionality reduction that are capable of identifying potentially dangerous combinations of pollutants and simultaneously segmenting exposure periods according to air quality conditions. When the data are high-dimensional, however, efficient methods of dimensionality reduction are challenging because of the formidable structure of cross-correlations that arise from the dynamic interaction between weather conditions and natural/anthropogenic pollution sources. In order to assess pollution exposure in an urban area while taking the above-mentioned difficulties into account, we have developed a class of parsimonious hidden Markov models. In a multivariate time series setting, this approach simultaneously allows for the performance of temporal segmentation and dimensionality reduction. We specifically approximate the distribution of multiple pollutant concentrations by mixtures of factor analysis models, whose parameters evolve according to a latent Markov chain. Covariates are included as predictors of the chain transition probabilities. Parameter constraints on the factorial component of the model are exploited to tune the flexibility of dimensionality reduction. In order to estimate the model parameters efficiently, we have proposed a novel three-step Alternating Expected Conditional Maximization (AECM) algorithm, which is also assessed in a simulation study. In the case study, the proposed methods could (1) describe the exposure to pollution in terms of a few latent regimes, (2) associate these regimes with specific combinations of pollutant concentration levels as well as distinct correlation structures between concentrations, and (3) capture the influence of weather conditions on transitions between regimes.
Paper available for download at: https://projecteuclid.org/euclid.aoas/1507168842. Wednesday, 31 January 2018, 11:30, Aula 1, Palazzo delle Scienze, Corso Italia 55, Catania.
Journal of Applied Statistics | 2015
Valentina Raponi; Francesca Martella; Antonello Maruotti
University evaluation is a topic of increasing concern in Italy as well as in other countries. In empirical analysis, university activities and performances are often measured by means of indicator variables. The available information is then summarized to respond to different aims. We argue that the evaluation process is a complex phenomenon that cannot be addressed by a simple descriptive approach. In this paper, we use a model-based approach to account for association between indicators and similarities among the observed universities. We examine faculty-level data collected from different sources, covering 55 Italian Economics faculties in the academic year 2009/2010. Making use of a clustering methodology, we introduce a biclustering model that accounts for both homogeneity/heterogeneity among faculties and correlations between indicators. Our results show that there are two substantially different performance profiles among universities, which can be strictly related to the nature of the institutions, namely the Private and Public profiles. Each of the two groups has its own peculiar features and its own group-specific list of priorities, strengths and weaknesses. Thus, we suggest that caution should be used in interpreting standard university rankings as they generally do not account for the complex structure of the data.
BMC Bioinformatics | 2015
Michele Pelosi; Marco Alfò; Francesca Martella; Elisa Pappalardo; Antonio Musarò
Background: This study addresses a recurrent biological problem, that is, to define a formal clustering structure for a set of tissues on the basis of the relative abundance of multiple alternatively spliced isoform mRNAs generated by the same gene. To this aim, we have used a model-based clustering approach, based on a finite mixture of multivariate Gaussian densities. However, given that we had several technical replicates from the same tissue for each quantitative measurement, we also employed a finite mixture of linear mixed models, with tissue-specific random effects.
Results: A panel of human tissues was analysed through quantitative real-time PCR methods, to quantify the relative amount of mRNA encoding different IGF-1 alternative splicing variants. After an appropriate, preliminary equalization of the quantitative data, we provided an estimate of the distribution of the observed concentrations for the different IGF-1 mRNA splice variants in the cohort of tissues by employing suitable kernel density estimators. We observed that the analysed IGF-1 mRNA splice variants were characterized by multimodal distributions, which could be interpreted as describing the presence of several sub-populations, i.e. potential tissue clusters. In this context, a formal clustering approach based on a finite mixture model (FMM) with Gaussian components is proposed. Due to the presence of potential dependence between the technical replicates (originating from repeated quantitative measurements of the same mRNA splice isoform in the same tissue), we have also employed the finite mixture of linear mixed models (FMLMM), which allowed us to take into account this kind of within-tissue dependence.
Conclusions: The FMM and the FMLMM provided a convenient yet formal setting for a model-based clustering of the human tissues in sub-populations, characterized by homogeneous values of concentrations of the mRNAs for one or multiple IGF-1 alternative splicing isoforms. The proposed approaches can be applied to any cohort of tissues expressing several alternatively spliced mRNAs generated by the same gene, and can overcome the limitations of clustering methods based on simple comparisons between splice isoform expression levels.
Statistics in Medicine | 2011
Francesca Martella; Jeroen K. Vermunt; Marian Beekman; Rudi G. J. Westendorp; P.E. Slagboom; Jeanine J. Houwing-Duistermaat
In healthy aging research, typically multiple health outcomes are measured, representing health status. The aim of this paper was to develop a model-based clustering approach to identify homogeneous sibling pairs according to their health status. Model-based clustering approaches will be considered on the basis of a linear mixed-effects model for the mixture components. Class memberships of siblings within pairs are allowed to be correlated, and within a class the correlation between siblings is modeled using random sibling pair effects. We propose an expectation-maximization algorithm for maximum likelihood estimation. Model performance is evaluated via simulations in terms of estimating the correct parameters, degree of agreement, and the ability to detect the correct number of clusters. The performance of our model is compared with the performance of standard model-based clustering approaches. The methods are used to classify sibling pairs from the Leiden Longevity Study according to their health status. Our results suggest that homogeneous healthy sibling pairs are associated with a longer life span. Software is available for fitting the new models.
Statistics and Computing | 2015
Francesca Martella; Donatella Vicari; Maurizio Vichi
We present a Multivariate Regression Model Based on the Optimal Partition of Predictors (MRBOP), which is useful in applications with strongly correlated predictors. Such classes of predictors are synthesized by latent factors, which are obtained through an appropriate linear combination of the original variables and are forced to be weakly correlated. Specifically, the proposed model assumes that the latent factors are determined by subsets of predictors characterizing only one latent factor. MRBOP is formalized in a least squares framework optimizing a penalized quadratic objective function through an alternating least-squares (ALS) algorithm. The performance of the methodology is evaluated on simulated and real data sets.