Comput. Phys. Commun. | 2021

Implementation of a Bayesian secondary structure estimation method for the SESCA circular dichroism analysis package

 
 

Abstract


Circular dichroism spectroscopy is a structural biology technique frequently applied to determine the secondary structure composition of soluble proteins. Our recently introduced computational analysis package SESCA aids the interpretation of protein circular dichroism spectra and enables the validation of proposed corresponding structural models. To further these aims, we present the implementation and characterization of a new Bayesian secondary structure estimation method in SESCA, termed SESCA_bayes. SESCA_bayes samples possible secondary structures using a Monte Carlo scheme, driven by the likelihood of estimated scaling errors and non-secondary-structure contributions of the measured spectrum. SESCA_bayes provides an estimated secondary structure composition and separate uncertainties on the fraction of residues in each secondary structure class. It also assists efficient model validation by providing a posterior secondary structure probability distribution based on the measured spectrum. Our study indicates that SESCA_bayes estimates the secondary structure composition with a significantly smaller uncertainty than its predecessor, SESCA_deconv, which is based on spectrum deconvolution. Further, the mean accuracy of the two methods in our analysis is comparable, but SESCA_bayes provides more accurate estimates for circular dichroism spectra that contain considerable non-secondary-structure contributions.

Program summary
Program Title: SESCA_bayes
CPC Library link to program files: https://doi.org/10.17632/5nnsbn6ync.1
Developer's repository link: https://www.mpibpc.mpg.de/sesca
Licensing provisions: GPLv3
Programming language: Python
Nature of problem: The circular dichroism spectrum of a protein is strongly correlated with its secondary structure composition. However, determining the secondary structure from a spectrum is hindered by non-secondary-structure contributions and by scaling errors due to the uncertainty of the protein concentration. If not properly taken into account, these experimental factors can cause considerable errors when conventional secondary structure estimation methods are used. Because these errors combine with errors of the proposed structural model in a non-additive fashion, it is difficult to assess how much uncertainty the experimental factors introduce into model validation approaches based on circular dichroism spectra.
Solution method: For a given measured circular dichroism spectrum, the SESCA_bayes algorithm applies Bayesian statistics to account for scaling errors and non-secondary-structure contributions and to determine the conditional secondary structure probability distribution. This approach relies on fast spectrum predictions based on empirical basis spectrum sets and joint probability distribution maps for scaling factors and non-secondary structure distributions. Because SESCA_bayes estimates the most probable secondary structure composition based on a probability-weighted sample distribution, it avoids the typical fitting errors that occur in conventional spectrum deconvolution methods. It also estimates the uncertainty of circular dichroism-based model validation more accurately than previous methods in the SESCA analysis package.
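
The sampling idea described under Solution method can be illustrated with a short Python sketch. It is not the SESCA_bayes implementation: the function names are hypothetical, the basis spectra are made-up numbers, and a simple Gaussian likelihood with a fixed noise level `sigma` stands in for the joint probability maps of scaling factors and non-secondary-structure contributions. The sketch only shows how Monte Carlo sampling of secondary structure fractions, combined with fast spectrum prediction from basis spectra, yields a posterior mean and a separate uncertainty for each class.

```python
import math
import random


def predict_spectrum(fractions, basis):
    """Predicted spectrum: basis spectra weighted by the class fractions."""
    n_points = len(basis[0])
    return [sum(f * b[i] for f, b in zip(fractions, basis))
            for i in range(n_points)]


def log_likelihood(measured, predicted, sigma):
    """Gaussian log-likelihood of the residual (illustrative stand-in only)."""
    sq_dev = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return -0.5 * sq_dev / sigma ** 2


def sample_fractions(measured, basis, sigma=1.0, n_steps=20000,
                     burn_in=2000, step=0.05, seed=0):
    """Metropolis sampling of class fractions that always sum to one."""
    rng = random.Random(seed)
    k = len(basis)
    current = [1.0 / k] * k  # start from equal fractions
    current_ll = log_likelihood(
        measured, predict_spectrum(current, basis), sigma)
    samples = []
    for n in range(n_steps):
        # Symmetric proposal: shift weight between two random classes.
        i, j = rng.sample(range(k), 2)
        delta = rng.uniform(-step, step)
        trial = list(current)
        trial[i] -= delta
        trial[j] += delta
        if trial[i] >= 0.0 and trial[j] >= 0.0:  # reject negative fractions
            trial_ll = log_likelihood(
                measured, predict_spectrum(trial, basis), sigma)
            diff = trial_ll - current_ll
            if diff >= 0.0 or rng.random() < math.exp(diff):
                current, current_ll = trial, trial_ll
        if n >= burn_in:
            samples.append(current)
    return samples


def summarize(samples):
    """Per-class posterior mean and standard deviation."""
    n = len(samples)
    k = len(samples[0])
    means = [sum(s[c] for s in samples) / n for c in range(k)]
    stds = [math.sqrt(sum((s[c] - means[c]) ** 2 for s in samples) / n)
            for c in range(k)]
    return means, stds


# Made-up basis spectra (helix, sheet, coil) at four wavelengths.
BASIS = [
    [70.0, -35.0, -33.0, -5.0],
    [30.0, -5.0, -18.0, -10.0],
    [-30.0, -15.0, -3.0, 2.0],
]
TRUE_FRACTIONS = [0.6, 0.3, 0.1]

# Synthetic "measured" spectrum generated from the assumed true fractions.
measured = predict_spectrum(TRUE_FRACTIONS, BASIS)
means, stds = summarize(sample_fractions(measured, BASIS))
```

Here `means` plays the role of the estimated secondary structure composition and `stds` the per-class uncertainties. Because the estimate is an average over sampled compositions rather than the result of a single fit, it reflects the full range of compositions compatible with the spectrum under the assumed likelihood.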

Volume 266
Pages 108022
DOI 10.1016/J.CPC.2021.108022
Language English
Journal Comput. Phys. Commun.
