Publication


Featured research published by Davide Ballabio.


Journal of Chemical Information and Modeling | 2009

Comments on the Definition of the Q2 Parameter for QSAR Validation

Viviana Consonni; Davide Ballabio; Roberto Todeschini

This paper deals with the problem of evaluating the predictive ability of QSAR models and continues the discussion about proper estimates of the predictive ability from an external evaluation set reported in Schüürmann G., Ebert R.-U., et al. External Validation and Prediction Employing the Predictive Squared Correlation Coefficient--Test Set Activity Mean vs Training Set Activity Mean. J. Chem. Inf. Model. 2008, 48, 2140-2145. Schüürmann et al. discussed two formulas for calculating the predictive squared correlation coefficient Q2: one, adopted by the current OECD guidelines on QSAR validation, is based on the sum of squares (SS) of the external test set referred to the training set response mean; the other is based on the SS of the external test set referred to the test set response mean. In addition to these two formulas, a third formula is evaluated here, based on the SS of the mean deviations of observed values from the training set mean, taken over the training set instead of the external evaluation set.
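The three formulas described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function name `q2_variants` and the labels `F1`, `F2`, `F3` are chosen here for convenience, and the third formula is written under the assumption that "mean deviations" means each sum of squares is divided by its own number of objects.

```python
from statistics import fmean

def q2_variants(y_train, y_ext, y_ext_pred):
    """Return three external Q2 estimates (labels F1-F3 are assumed names)."""
    mean_train = fmean(y_train)
    mean_ext = fmean(y_ext)
    # Prediction error sum of squares over the external set.
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))
    # External-set SS referred to the training set mean.
    ss_ext_vs_train = sum((y - mean_train) ** 2 for y in y_ext)
    # External-set SS referred to the external set mean.
    ss_ext_vs_ext = sum((y - mean_ext) ** 2 for y in y_ext)
    # Training-set SS referred to the training set mean.
    ss_train = sum((y - mean_train) ** 2 for y in y_train)
    return {
        "F1": 1 - press / ss_ext_vs_train,
        "F2": 1 - press / ss_ext_vs_ext,
        # Assumption: both terms are averaged over their own set size.
        "F3": 1 - (press / len(y_ext)) / (ss_train / len(y_train)),
    }
```

Only the third variant has a denominator that does not depend on how the external responses are distributed, which is the distinction the abstract draws.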


Molecules | 2012

Comparison of different approaches to define the applicability domain of QSAR models.

Faizan Sahigara; Kamel Mansouri; Davide Ballabio; A. Mauri; Consonni; Roberto Todeschini

One of the OECD principles for model validation requires defining the Applicability Domain (AD) for QSAR models. This is important since reliable predictions are generally limited to query chemicals structurally similar to the training compounds used to build the model. Characterization of the interpolation space is therefore significant in defining the AD. In this study, some existing descriptor-based approaches performing this task are discussed and compared by implementing them on existing validated datasets from the literature. Algorithms adopted by different approaches allow defining the interpolation space in several ways, while defined thresholds contribute significantly to the extrapolations. For each dataset and approach implemented for this study, the comparison analysis was carried out by considering the model statistics and the relative position of the test set with respect to the training space.
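To make the idea of an interpolation space concrete, here is a minimal sketch of one simple descriptor-based option, a range-based bounding box. It is chosen purely for illustration: the abstract does not list which approaches were compared, and the function names `fit_bounding_box` and `in_domain` are hypothetical.

```python
def fit_bounding_box(train):
    """Return (min, max) per descriptor from training rows of equal length."""
    columns = list(zip(*train))
    return [(min(col), max(col)) for col in columns]

def in_domain(box, sample):
    """A query is inside the AD if every descriptor lies within the training range."""
    return all(lo <= value <= hi for (lo, hi), value in zip(box, sample))

# Hypothetical two-descriptor training set.
box = fit_bounding_box([(0.0, 10.0), (2.0, 12.0), (1.0, 11.0)])
```

Here the per-descriptor range plays the role of the threshold: tightening or widening it changes which test compounds count as extrapolations.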


Journal of Chemical Information and Modeling | 2013

Quantitative Structure–Activity Relationship Models for Ready Biodegradability of Chemicals

Kamel Mansouri; Tine Ringsted; Davide Ballabio; Roberto Todeschini; Viviana Consonni

The European REACH regulation requires information on ready biodegradation, which is a screening test to assess the biodegradability of chemicals. At the same time, REACH encourages the use of alternatives to animal testing, including predictions from quantitative structure-activity relationship (QSAR) models. The aim of this study was to build QSAR models to predict ready biodegradation of chemicals by using different modeling methods and types of molecular descriptors. Particular attention was given to data screening and validation procedures in order to build predictive models. Experimental values of 1055 chemicals were collected from the webpage of the National Institute of Technology and Evaluation of Japan (NITE): 837 and 218 molecules were used for calibration and testing purposes, respectively. In addition, models were further evaluated using an external validation set consisting of 670 molecules. Classification models were produced in order to discriminate biodegradable and nonbiodegradable chemicals by means of different mathematical methods: k nearest neighbors, partial least squares discriminant analysis, and support vector machines, as well as their consensus models. The proposed models and the derived consensus analysis demonstrated good classification performances with respect to already published QSAR models on biodegradation. Relationships between the molecular descriptors selected in each QSAR model and biodegradability were evaluated.
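A consensus over several classifiers can be sketched as a vote across their predicted labels. The abstract does not say how the consensus models were built, so the voting rule, the function name `consensus`, the `min_agreement` parameter and the labels `"RB"` and `"NRB"` are all assumptions made for this example.

```python
from collections import Counter

def consensus(predictions, min_agreement=None):
    """Combine one label per model; return None when agreement is too low.

    By default a strict majority is required. Pass min_agreement equal to
    the number of models to accept only unanimous predictions.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    needed = min_agreement if min_agreement is not None else len(predictions) // 2 + 1
    return label if votes >= needed else None

# Hypothetical outputs of three models for one chemical.
majority = consensus(["RB", "NRB", "RB"])
unanimous = consensus(["RB", "NRB", "RB"], min_agreement=3)
```

Requiring unanimity leaves some chemicals unclassified, which trades coverage for confidence in the predictions that remain.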


Analytica Chimica Acta | 2011

Monitoring of alcoholic fermentation using near infrared and mid infrared spectroscopies combined with electronic nose and electronic tongue

Susanna Buratti; Davide Ballabio; G. Giovanelli; C.M. Zuluanga Dominguez; A. Moles; Simona Benedetti; Nicoletta Sinelli

Effective fermentation monitoring is a growing need due to the rapid pace of change in the wine industry, which calls for fast methods providing real-time information in order to assure the quality of the final product. The objective of this work is to investigate the potential of non-destructive techniques, combined with chemometric data analysis, to monitor time-related changes that occur during red wine fermentation. Eight micro-fermentation trials, conducted in the Valtellina region (Northern Italy) during the 2009 vintage, were monitored by an FT-NIR and an FT-IR spectrometer and by an electronic nose and tongue. The spectroscopic techniques were used to investigate molecular changes, while the electronic nose and electronic tongue evaluated the evolution of the aroma and taste profile during the must-wine fermentation. Must-wine samples were also analysed by traditional chemical methods in order to determine sugar (glucose and fructose) consumption and alcohol (ethanol and glycerol) production. Principal Component Analysis was applied to spectral, electronic nose and electronic tongue data as an exploratory tool to uncover molecular, aroma and taste modifications during the fermentation process. Furthermore, the chemical data and the PC1 scores from spectral, electronic nose and electronic tongue data were modelled as a function of time to identify critical points during fermentation. The results showed that NIR and MIR spectroscopies are useful to investigate molecular changes involved in wine fermentation, while the electronic nose and electronic tongue can be applied to detect the evolution of the taste and aroma profile. Moreover, as demonstrated through the modeling of NIR, MIR, electronic nose and electronic tongue data, these non-destructive methods are suitable for monitoring must-wine fermentation, giving crucial information about the quality of the final product in agreement with chemical parameters. Although in this study the measurements were carried out in off-line mode, in the future these non-destructive techniques could be valid and simple tools able to provide timely information about the fermentation process and to assure the quality of the wine.


Environmental Pollution | 2013

Particle size, chemical composition, seasons of the year and urban, rural or remote site origins as determinants of biological effects of particulate matter on pulmonary cells

Maria Grazia Perrone; Maurizio Gualtieri; Viviana Consonni; L. Ferrero; G Sangiorgi; Eleonora Longhin; Davide Ballabio; Ezio Bolzacchini; Marina Camatini

Particulate matter (PM), a complex mix of chemical compounds, is associated with various health effects. However, there is still a lack of information on the impact of its different components. PM2.5 and PM1 samples, collected during the different seasons at an urban, a rural and a remote site, were chemically characterized, and the biological effects induced on A549 cells were assessed. A Partial Least Squares Discriminant Analysis was performed to relate PM chemical composition to the toxic effects observed. Results show that PM-induced biological effects changed with the seasons and sites, and such variations may be explained by chemical constituents of PM derived from both primary and secondary sources. The biological responses induced by PM from a remote high-altitude site, reported here for the first time, were associated with the high concentrations of metals and secondary species typical of the free tropospheric aerosol, influenced by long-range transport and aging.


Infrared Spectroscopy for Food Quality Analysis and Control | 2009

Multivariate Classification for Qualitative Analysis

Davide Ballabio; Roberto Todeschini

Classification methods are fundamental chemometric techniques designed to find mathematical models able to recognize the membership of each object to its proper class on the basis of a set of measurements. Classification techniques can be probabilistic, if they are based on estimates of probability distributions. Among probabilistic techniques, parametric and nonparametric methods can be distinguished; methods are parametric when the probability distributions are characterized by location and dispersion parameters such as mean, variance, and covariance. Classification methods can also be defined as distance-based, if they require the calculation of distances between objects or between objects and models. Several parameters can be used for the quality estimation of classification models, both for fitting and validation purposes. These parameters are related to the presence of errors in the results, even if errors can be considered with different weights on the basis of the classification aims. One of the simplest classification methods is the nearest mean classifier (NMC), which is a parametric, unbiased, and probabilistic method. Among traditional classifiers, discriminant analysis is probably the best-known method and can be considered the first multivariate classification technique. Artificial neural networks (ANNs) are increasingly used in several chemical applications and nowadays can be considered one of the most important emerging tools in chemometrics.
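The nearest mean classifier mentioned above is simple enough to sketch directly: each class is summarised by the mean of its training objects, and a new object is assigned to the class with the closest mean. This is an illustrative sketch that assumes Euclidean distance; the function names `fit_nmc` and `predict_nmc` are hypothetical.

```python
from math import dist
from statistics import fmean

def fit_nmc(samples, labels):
    """Return a dict mapping each class label to its centroid (tuple of means)."""
    by_class = {}
    for x, label in zip(samples, labels):
        by_class.setdefault(label, []).append(x)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_class.items()
    }

def predict_nmc(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance assumed)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))

# Hypothetical two-class, two-variable training set.
centroids = fit_nmc([(0, 0), (1, 1), (10, 10), (11, 11)], ["a", "a", "b", "b"])
```

Because the only fitted quantities are the class means, the model is parametric in the sense used in the abstract.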


Journal of Cheminformatics | 2013

Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions

Faizan Sahigara; Davide Ballabio; Roberto Todeschini; Viviana Consonni

Background: With the growing popularity of using QSAR predictions for regulatory purposes, such predictive models are now required to be strictly validated, an essential feature of which is to have the model’s Applicability Domain (AD) defined clearly. Although in recent years several different approaches have been proposed to address this goal, no optimal approach to define the model’s AD has yet been recognized. Results: This study proposes a novel descriptor-based AD method which accounts for the data distribution and exploits the k-Nearest Neighbours (kNN) principle to derive a heuristic decision rule. The proposed method is a three-stage procedure to address several key aspects relevant in judging the reliability of QSAR predictions. Inspired by the adaptive kernel method for probability density function estimation, the first stage of the approach defines a pattern of thresholds corresponding to the various training samples, and these thresholds are later used to derive the decision rule. The criterion deciding whether a given test sample is retained within the AD is defined in the second stage of the approach. Finally, the last stage assesses the reliability of the derived results, taking model statistics and prediction error into account. Conclusions: The proposed approach is a novel strategy that integrates the kNN principle to define the AD of QSAR models. Relevant features that characterize the proposed AD approach include: a) adaptability to local density of samples, useful when the underlying multivariate distribution is asymmetric, with wide regions of low data density; b) unlike several kernel density estimators (KDE), effectiveness also in high-dimensional spaces; c) low sensitivity to the smoothing parameter k; and d) versatility to implement various distance measures. The results derived on a case study provided a clear understanding of how the approach works and defines the model’s AD for reliable predictions.
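The first two stages can be illustrated with a deliberately simplified sketch. It is not the published procedure: here each training sample's threshold is assumed to be the mean distance to its k nearest training neighbours, a query is kept if it falls within the threshold of at least one training sample, and the third stage (reliability assessment) is omitted. The function names are hypothetical and Euclidean distance is assumed.

```python
from math import dist

def knn_thresholds(train, k):
    """One threshold per training sample: mean distance to its k nearest neighbours.

    Assumption for illustration only; the paper's threshold definition may differ.
    """
    thresholds = []
    for i, x in enumerate(train):
        distances = sorted(dist(x, other) for j, other in enumerate(train) if j != i)
        thresholds.append(sum(distances[:k]) / k)
    return thresholds

def inside_ad(train, thresholds, query):
    """Keep the query if it lies within the threshold of at least one training sample."""
    return any(dist(query, x) <= t for x, t in zip(train, thresholds))

# Hypothetical training set: the four corners of a unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
thresholds = knn_thresholds(train, k=2)
```

Because each training sample carries its own threshold, dense regions get tight limits and sparse regions get looser ones, which is the adaptability to local density that the abstract lists as feature a).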


Analytica Chimica Acta | 2013

Locally centred Mahalanobis distance: A new distance measure with salient features towards outlier detection

Roberto Todeschini; Davide Ballabio; Viviana Consonni; Faizan Sahigara; Peter Filzmoser

Outlier detection is a prerequisite to identify the presence of aberrant samples in a given set of data. The identification of such diverse data samples is significant particularly for multivariate data analysis, where increasing data dimensionality can easily hinder data exploration and such outliers often go undetected. This paper introduces a novel Mahalanobis distance measure (namely, a pseudo-distance) termed the locally centred Mahalanobis distance, derived by centering the covariance matrix at each data sample rather than at the data centroid as in the classical covariance matrix. Two parameters, called Remoteness and Isolation degree, were derived from the resulting pairwise distance matrix, and their salient features facilitated a better identification of atypical samples isolated from the rest of the data, thus reflecting their potential application in outlier detection. The Isolation degree proved able to detect a new kind of outlier, that is, isolated samples within the data domain, thus providing a useful diagnostic tool to evaluate the reliability of predictions obtained by local models (e.g. k-NN models). To better understand the role of Remoteness and Isolation degree in the identification of such aberrant data samples, some simulated and published data sets from the literature were considered as case studies, and the results were compared with those obtained by using Euclidean distance and classical Mahalanobis distance.
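In symbols, the construction described in this abstract might be written as follows. This is a sketch from the abstract's wording only: the notation, the 1/(n-1) normalisation, and the choice of centring the covariance at the second sample of each pair are assumptions to be checked against the paper.

```latex
% Classical covariance matrix, centred at the data centroid \bar{x}:
S = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})(x_k - \bar{x})^{\top}

% Locally centred covariance matrix, centred at sample x_j (normalisation assumed):
S_j = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - x_j)(x_k - x_j)^{\top}

% Resulting pairwise measure from x_i to x_j:
d(i, j) = \sqrt{(x_i - x_j)^{\top} S_j^{-1} (x_i - x_j)}
```

Since S_j changes with the sample used for centring, d(i, j) and d(j, i) generally differ, which is consistent with the abstract calling the measure a pseudo-distance.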


Analytica Chimica Acta | 2008

Multiblock variance partitioning: A new approach for comparing variation in multiple data blocks

Thomas Skov; Davide Ballabio; Rasmus Bro

More than one multi-informative analytical technique is often applied when describing the condition of a set of samples. Often a part of the information found in these data blocks is redundant and can be extracted from more than one block. This study puts forward a method (multiblock variance partitioning, MVP) to compare the information/variation in different data blocks using simple quantitative measures. These measures are the unique part of the variation found in only one data block and the common part that can be found in several data blocks. These different parts are found using PLS models between predictor blocks and a common response. MVP provides a different view of the information in different blocks than normal multiblock analysis. It will be shown that this has many applications in very diverse fields such as process control, assessor performance in sensory analysis, efficiency of preprocessing methods, and as complementary information to an interval PLS analysis. Here the ideas of the MVP approach are presented in detail using a study of red wines from different regions measured with GC-MS and FT-IR instruments providing different kinds of data representations.


Environmental Pollution | 2010

Development of models for predicting toxicity from sediment chemistry by partial least squares-discriminant analysis and counter-propagation artificial neural networks.

Manuel Alvarez-Guerra; Davide Ballabio; José Manuel Amigo; Rasmus Bro; Javier R. Viguri

There is strong interest in developing tools to link chemical concentrations of contaminants to the potential for observing sediment toxicity that can be used in initial screening-level sediment quality assessments. This paper presents new approaches for predicting toxicity in sediments, based on 10-day survival tests with marine amphipods, from sediment chemistry, by means of the application of Partial Least Squares-Discriminant Analysis (PLS-DA) and Counter-propagation Artificial Neural Networks (CP-ANNs) to large historical databases of chemical and toxicity data. The exploration of the internal structure of the developed models revealed inherent limitations of predicting toxicity from common chemical analyses of bulk contaminant concentrations. However, the results obtained in the validation of these models combined relevant values of non-error classification rate, sensitivity and specificity of, respectively, 76, 87 and 73% with PLS-DA and 92, 75 and 97% with CP-ANNs, outperforming the results reported for previous approaches.

Collaboration

Top Co-Authors

Roberto Todeschini, University of Milano-Bicocca
Viviana Consonni, University of Milano-Bicocca
A. Mauri, University of Zurich
Consonni, University of Milano-Bicocca
Francesca Grisoni, University of Milano-Bicocca
A. Manganaro, University of Milano-Bicocca
Matteo Cassotti, University of Milano-Bicocca
Faizan Sahigara, University of Milano-Bicocca