Featured Research

Quantitative Methods

Alchemical Transformations for Concerted Hydration Free Energy Estimation with Explicit Solvation

We present a family of alchemical perturbation potentials that enable the calculation of hydration free energies of small to medium-sized molecules in a single concerted alchemical coupling step, instead of the commonly used sequence of two distinct coupling steps for Lennard-Jones and electrostatic interactions. The perturbation potentials are based on the softplus function of the solute-solvent interaction energy and are designed to focus sampling near entropic bottlenecks along the alchemical pathway. We present a general framework to optimize the parameters of alchemical perturbation potentials of this kind. The optimization procedure, which aims to avoid multi-modal distributions of the coupling energy along the alchemical path, is based on the λ-function formalism and on the maximum-likelihood parameter estimation procedure we developed earlier. We also present a novel soft-core function, critical for this result, that is applied to the overall solute-solvent interaction energy rather than to individual interatomic pair potentials. Because it does not require modifications of core force and energy routines, this soft-core formulation can be easily deployed in molecular dynamics simulation codes. We illustrate the method by estimating the hydration free energies, in water droplets, of compounds of varying size and complexity. In each case, we show that the hydration free energy converges rapidly. This work paves the way for the ongoing development of more streamlined algorithms to estimate free energies of molecular binding with explicit solvation.
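
To make the idea of a softplus perturbation potential concrete, here is a minimal Python sketch. It assumes the functional form W(u) = ((λ2 − λ1)/α) ln[1 + e^(−α(u − u0))] + λ2 u + w0, where u is the solute-solvent interaction energy and lambda1, lambda2, alpha, u0 and w0 are hypothetical names for the per-state parameters. The optimized parameter schedules and the soft-core transformation described above are not reproduced. The derivative dW/du behaves like an effective coupling that switches from λ1 (for u well below u0) to λ2 (for u well above u0).

```python
import numpy as np

def softplus_potential(u, lambda1, lambda2, alpha, u0, w0=0.0):
    """Softplus-shaped alchemical perturbation potential W(u) (assumed form).

    u is the solute-solvent interaction energy; the remaining arguments are
    hypothetical names for the parameters of one alchemical state.
    """
    u = np.asarray(u, dtype=float)
    # np.logaddexp(0, x) evaluates ln(1 + e^x) without overflow.
    softplus = np.logaddexp(0.0, -alpha * (u - u0))
    return (lambda2 - lambda1) / alpha * softplus + lambda2 * u + w0

def effective_lambda(u, lambda1, lambda2, alpha, u0):
    """dW/du: tends to lambda1 for u << u0 and to lambda2 for u >> u0."""
    u = np.asarray(u, dtype=float)
    # Numerically stable logistic switch 1 / (1 + e^{alpha (u - u0)}).
    switch = np.exp(-np.logaddexp(0.0, alpha * (u - u0)))
    return lambda2 - (lambda2 - lambda1) * switch

# Example: effective coupling rising from 0.1 to 0.5 as u crosses u0
# (arbitrary illustrative energy units).
u = np.linspace(-50.0, 50.0, 5)
print(softplus_potential(u, lambda1=0.1, lambda2=0.5, alpha=0.2, u0=0.0))
print(effective_lambda(u, lambda1=0.1, lambda2=0.5, alpha=0.2, u0=0.0))
```

With λ1 = λ2 the softplus term vanishes and W reduces to the linear coupling λu + w0, which is a convenient sanity check on any implementation of this kind.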

Align-gram : Rethinking the Skip-gram Model for Protein Sequence Analysis

Background: The advent of next-generation sequencing technologies has exponentially increased the volume of biological sequence data. Protein sequences, often described as the 'language of life', have been analyzed for a multitude of applications and inferences. Motivation: Owing to the rapid development of deep learning, there have been a number of breakthroughs in Natural Language Processing in recent years. Because these methods can perform different tasks when trained with a sufficient amount of data, off-the-shelf models are used for various biological applications. In this study, we investigated the applicability of the popular Skip-gram model to protein sequence analysis and attempted to incorporate some biological insights into it. Results: We propose a novel k-mer embedding scheme, Align-gram, which maps similar k-mers close to each other in a vector space. Furthermore, we experiment with other sequence-based protein representations and observe that the embeddings derived from Align-gram lead to better modeling and training of deep learning models. Our experiments with a simple baseline LSTM model and the much more complex CNN model of DeepGoPlus show the potential of Align-gram for different types of deep learning applications in protein sequence analysis.
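
Skip-gram embeddings are learned from (target, context) pairs drawn from a sliding window over a token sequence. The sketch below shows only that standard data preparation for protein sequences, assuming overlapping k-mers with stride 1 as the "words" and a fixed window size (both assumptions); the alignment-based modification that distinguishes Align-gram is not reproduced here.

```python
from typing import List, Tuple

def kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate the (target, context) pairs used to train a Skip-gram model."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = kmers("MKTAYIAK", k=3)
print(tokens)                               # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
print(skipgram_pairs(tokens, window=1)[:3]) # [('MKT', 'KTA'), ('KTA', 'MKT'), ('KTA', 'TAY')]
```

A plain Skip-gram model only sees co-occurrence, so two k-mers that differ by a conservative substitution get unrelated vectors unless they share contexts; this is the gap that a biologically informed scheme such as Align-gram targets.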

Aligning biological sequences by exploiting residue conservation and coevolution

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is sequence alignment, i.e. arranging sequences from different organisms so as to identify similar regions, detect evolutionary relationships between sequences, and predict biomolecular structure and function. This is typically addressed through profile models, which capture position-specific features such as conservation but assume that different positions evolve independently. In recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy. DCAlign overcomes the limitations of profile models, includes coevolution among positions in a general way, and is therefore universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
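
The difference between a profile model and a coevolution-aware model can be seen in how each scores an aligned sequence. Below is a minimal sketch, assuming the pairwise Potts model commonly used in direct coupling analysis (DCA) with the sign convention E = −Σ h − Σ J (conventions vary); DCAlign's message-passing alignment itself is not reproduced, and the toy sizes and random parameters are illustrative only.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy of an aligned sequence under a pairwise Potts (DCA-style) model.

    seq: integer-encoded states, shape (L,)
    h:   fields, shape (L, q), position-specific conservation
    J:   couplings, shape (L, L, q, q), pairwise coevolution (only i < j used)
    Lower energy means a sequence more compatible with the family.
    """
    L = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return energy

def profile_energy(seq, h):
    """Independent-site (profile) model: the same energy with all J = 0."""
    return -sum(h[i, s] for i, s in enumerate(seq))

rng = np.random.default_rng(0)
L, q = 5, 4                      # toy length and alphabet size (assumptions)
h = rng.normal(size=(L, q))
J = rng.normal(scale=0.1, size=(L, L, q, q))
seq = np.array([0, 2, 1, 3, 0])
print(potts_energy(seq, h, J), profile_energy(seq, h))
```

Setting every coupling to zero recovers the profile score exactly, which makes explicit that the profile model is the independent-site special case of the Potts model.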

An Ansatz for undecidable computation in RNA-world automata

In this Ansatz we consider theoretical constructions of RNA polymers into automata, a form of computational structure. The transitions in our automata are based on plausible RNA-world enzymes that may perform ligation or cleavage. Limited to these operations, we construct RNA automata of increasing complexity: from the Finite Automaton (RNA-FA) to the Turing-machine-equivalent 2-stack PDA (RNA-2PDA) and the universal RNA-UPDA. For each automaton we show how the enzymatic reactions match the logical operations of the RNA automaton, and describe how the efficient arrangement of RNA polymers into a computational structure facilitates biological exploration of the corresponding evolutionary space. A critical theme of the Ansatz is self-reference in RNA automata configurations, which exploits the program-data duality but results in undecidable computation. We describe how undecidable computation is exemplified by the self-referential Liar paradox, which places a boundary on a logical system and, by construction, on any RNA automaton. We argue that an expansion of the evolutionary space for RNA-2PDA automata can be interpreted as a hierarchical resolution of the undecidable computation by a meta-system (akin to Turing's oracle), in a continual process analogous to Turing's ordinal logics and Post's extensible recursively generated logics. On this basis, we put forward the hypothesis that the resolution of undecidable configurations in RNA-world automata represents a mechanism for novelty generation in the evolutionary space, and we propose avenues for future investigation of biological automata.
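
The claim that a 2-stack PDA is Turing-machine equivalent rests on a classical construction: a Turing-machine tape can be stored in two stacks, one holding the cells to the left of the head and one holding the current cell and everything to its right. The sketch below shows that generic construction in Python; the class name and blank symbol are illustrative assumptions, and the RNA-specific ligation and cleavage operations are not modelled.

```python
class TwoStackTape:
    """A Turing-machine tape represented by two stacks.

    `left` holds the cells left of the head (top = nearest cell);
    `right` holds the current cell and everything to its right
    (top = the cell under the head). Empty cells read as BLANK.
    """
    BLANK = "_"

    def __init__(self, contents=""):
        self.left = []
        self.right = list(reversed(contents))  # top of stack = first symbol

    def read(self):
        return self.right[-1] if self.right else self.BLANK

    def write(self, symbol):
        if self.right:
            self.right[-1] = symbol
        else:
            self.right.append(symbol)

    def move_right(self):
        # Pop the current cell from `right` and push it onto `left`.
        self.left.append(self.right.pop() if self.right else self.BLANK)

    def move_left(self):
        # Pop the nearest left cell (or a blank) and push it onto `right`.
        self.right.append(self.left.pop() if self.left else self.BLANK)

    def contents(self):
        return "".join(self.left) + "".join(reversed(self.right))

tape = TwoStackTape("ACGU")
tape.move_right()
tape.write("X")       # overwrite the second symbol
tape.move_left()
print(tape.read(), tape.contents())   # A AXGU
```

Every head movement is one pop from one stack followed by one push onto the other, so a machine that can only push and pop on two stacks can emulate an unbounded read/write tape.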

An Approach for Clustering Subjects According to Similarities in Cell Distributions within Biopsies

In this paper, we introduce a novel and interpretable methodology to cluster subjects suffering from cancer, based on features extracted from their biopsies. Contrary to existing approaches, we propose to capture complex patterns in the distributions of their cells using histograms, and to compare subjects on the basis of these distributions. We describe our complete workflow, including creation of the database, cell segmentation and phenotyping, computation of complex features, choice of a distance function between features, clustering of subjects using that distance, and survival analysis of the resulting clusters. We illustrate our approach on a database of hematoxylin and eosin (H&E)-stained tissues from subjects with Stage I lung adenocarcinoma, where our results match existing knowledge in prognosis estimation with high confidence.
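
The histogram-then-distance-then-cluster part of the workflow can be sketched as follows. This assumes one scalar feature per cell (for example a hypothetical nearest-neighbour distance), histograms on shared 1-D bins, the earth mover's distance as the distance function, and average-linkage hierarchical clustering; all of these are illustrative choices, not necessarily the ones made in the paper, and the synthetic subjects are toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def normalized_histogram(values, bins):
    """Histogram of a per-cell feature, normalized to sum to 1."""
    counts, _ = np.histogram(values, bins=bins)
    return counts / counts.sum()

def emd_1d(p, q, bin_width):
    """Earth mover's distance between two histograms on the same 1-D bins."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum() * bin_width

def cluster_subjects(per_subject_values, bins, n_clusters):
    hists = [normalized_histogram(v, bins) for v in per_subject_values]
    width = bins[1] - bins[0]
    n = len(hists)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = emd_1d(hists[i], hists[j], width)
    # Average-linkage hierarchical clustering on the precomputed distances.
    Z = linkage(squareform(dist), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy data: two pairs of subjects with clearly different feature distributions.
rng = np.random.default_rng(1)
subjects = [rng.normal(loc, 1.0, size=500) for loc in (0, 0.2, 5, 5.3)]
bins = np.linspace(-5, 10, 61)
print(cluster_subjects(subjects, bins, n_clusters=2))
```

The resulting cluster labels are what a subsequent survival analysis (for example Kaplan-Meier curves per cluster) would compare.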

An Automated, Cost-Effective Optical System for Accelerated Anti-microbial Susceptibility Testing (AST) using Deep Learning

Antimicrobial susceptibility testing (AST) is a standard clinical procedure used to quantify antimicrobial resistance (AMR). Currently, the gold-standard method requires incubation for 18-24 h and subsequent inspection for growth by a trained medical technologist. We demonstrate an automated, cost-effective optical system that delivers early AST results, minimizing incubation time and eliminating human error, while remaining compatible with standard phenotypic assay workflows. The system is built from cost-effective components and eliminates the need for optomechanical scanning. A neural network processes the optical intensity information captured by an array of fiber optic cables to determine whether bacterial growth has occurred in each well of a 96-well microplate. When the system was blindly tested on isolates from 33 patients with Staphylococcus aureus infections, our neural network correctly identified 95.03% of all wells containing growth, with an average incubation time of 5.72 h required to identify growth. Overall, 90% of all wells (growth and no-growth) were correctly classified after 7 h, and 95% after 10.5 h. Our deep learning-based optical system met the FDA-defined criteria for essential and categorical agreement for all 14 antibiotics tested after an average of 6.13 h and 6.98 h, respectively. Furthermore, our system met the FDA criteria for major and very major error rates for 11 of 12 possible drugs after an average of 4.02 h, and for 9 of 13 possible drugs after an average of 9.39 h, respectively. This system could enable faster, inexpensive, automated AST, especially in resource-limited settings, helping to mitigate the rise of global AMR.
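
The agreement and error figures above are computed by comparing each device's MIC with a reference MIC. The sketch below uses the conventional definitions: essential agreement when MICs are within one doubling dilution, categorical agreement when the S/I/R category matches, major errors (reference S, test R) over susceptible isolates, and very major errors (reference R, test S) over resistant isolates. It assumes the MIC is the lowest concentration whose well shows no growth; the breakpoints are hypothetical, and the FDA acceptance thresholds themselves are not encoded.

```python
import math
from typing import Optional, Sequence

def mic_from_growth(concentrations: Sequence[float], growth: Sequence[bool]) -> Optional[float]:
    """MIC = lowest concentration whose well shows no growth.

    Assumes ascending concentrations and monotone growth; returns None
    if every well grew (MIC above the tested range).
    """
    for conc, grew in zip(concentrations, growth):
        if not grew:
            return conc
    return None

def interpret(mic, s_breakpoint, r_breakpoint):
    """Categorize an MIC as S, I or R using hypothetical breakpoints."""
    if mic <= s_breakpoint:
        return "S"
    if mic >= r_breakpoint:
        return "R"
    return "I"

def agreement_metrics(test_mics, ref_mics, s_bp, r_bp):
    n = len(ref_mics)
    # Essential agreement: within +/-1 doubling dilution (|log2 ratio| <= 1).
    ea = sum(abs(math.log2(t) - math.log2(r)) <= 1 + 1e-9
             for t, r in zip(test_mics, ref_mics)) / n
    test_cat = [interpret(m, s_bp, r_bp) for m in test_mics]
    ref_cat = [interpret(m, s_bp, r_bp) for m in ref_mics]
    ca = sum(t == r for t, r in zip(test_cat, ref_cat)) / n
    n_s, n_r = ref_cat.count("S"), ref_cat.count("R")
    pairs = list(zip(test_cat, ref_cat))
    # Major error: reference S, test R (denominator: susceptible isolates).
    me = sum(r == "S" and t == "R" for t, r in pairs) / n_s if n_s else float("nan")
    # Very major error: reference R, test S (denominator: resistant isolates).
    vme = sum(r == "R" and t == "S" for t, r in pairs) / n_r if n_r else float("nan")
    return {"EA": ea, "CA": ca, "ME": me, "VME": vme}

concs = [0.25, 0.5, 1, 2, 4, 8]
print(mic_from_growth(concs, [True, True, False, False, False, False]))  # 1
print(agreement_metrics([1, 1, 2, 8], [0.5, 1, 4, 8], s_bp=1, r_bp=4))
# {'EA': 1.0, 'CA': 0.75, 'ME': 0.0, 'VME': 0.0}
```

In the example the third isolate is a minor error (test I, reference R): it lowers categorical agreement but counts as neither a major nor a very major error.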

An early warning tool for predicting mortality risk of COVID-19 patients using machine learning

The COVID-19 pandemic has placed extreme pressure on global healthcare services. Fast, reliable and early clinical assessment of disease severity can help allocate and prioritize resources to reduce mortality. To study the important blood biomarkers for predicting disease mortality, a retrospective study was conducted on 375 COVID-19-positive patients admitted to Tongji Hospital (China) from January 10 to February 18, 2020. Demographic and clinical characteristics and patient outcomes were investigated using machine learning tools to identify key biomarkers for predicting the mortality of individual patients. A nomogram was developed for predicting the mortality risk among COVID-19 patients. Lactate dehydrogenase, neutrophils (%), lymphocytes (%), high-sensitivity C-reactive protein, and age, all acquired at hospital admission, were identified as key predictors of death by a multi-tree XGBoost model. The area under the curve (AUC) of the nomogram was 0.961 for the derivation cohort and 0.991 for the validation cohort. An integrated score (LNLCA) was calculated together with the corresponding death probability. COVID-19 patients were divided into three subgroups (low-, moderate- and high-risk) using LNLCA cut-off values of 10.4 and 12.65, corresponding to death probabilities of less than 5%, 5% to 50%, and above 50%, respectively. The prognostic model, nomogram and LNLCA score can help in the early detection of COVID-19 patients at high mortality risk, which will help doctors improve patient stratification and management.
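
Once an LNLCA score is available, the stratification step uses only the two reported cut-offs. The sketch below takes a precomputed score (the score formula is not given here and is not reproduced); whether a score exactly at a cut-off falls in the lower or higher group is an assumption.

```python
def lnlca_risk_group(score: float, low_cutoff: float = 10.4, high_cutoff: float = 12.65) -> str:
    """Map a precomputed LNLCA score to a risk group using the reported cut-offs.

    Boundary handling (< versus <=) is an assumption; only the cut-off
    values themselves come from the study summary.
    """
    if score < low_cutoff:
        return "low"        # reported death probability below 5%
    if score <= high_cutoff:
        return "moderate"   # reported death probability 5% to 50%
    return "high"           # reported death probability above 50%

for s in (9.8, 11.0, 13.2):
    print(s, lnlca_risk_group(s))
```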

An evaluation of machine learning techniques to predict the outcome of children treated for Hodgkin-Lymphoma on the AHOD0031 trial: A report from the Children's Oncology Group

In this manuscript we analyze a data set containing information on children with Hodgkin Lymphoma (HL) enrolled in a clinical trial. Treatments received and survival status were collected together with other covariates such as demographics and clinical measurements. Our main task is to explore the potential of machine learning (ML) algorithms in a survival analysis context to improve upon the Cox Proportional Hazards (CoxPH) model. We discuss the weaknesses of the CoxPH model that we would like to address and then introduce multiple algorithms, from well-established ones to state-of-the-art models, that address these issues. We then compare all models using the concordance index and the Brier score. Finally, based on our experience, we offer a series of recommendations for practitioners who would like to benefit from recent advances in artificial intelligence.
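
The concordance index used to compare the models can be computed directly from follow-up times, event indicators and predicted risks. Below is a minimal sketch of Harrell's C-index with simplified tie handling (tied times are skipped, tied risks count as half); this tie handling is an assumption, and library implementations such as those in lifelines or scikit-survival may treat ties and the direction of the score differently.

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher = higher predicted risk
    A pair is comparable when the shorter time ends in an observed event.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplified: tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject censored: true ordering unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    return concordant / comparable

times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.4, 0.1]
print(concordance_index(times, events, risk))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk; unlike the Brier score, the C-index says nothing about calibration of the predicted probabilities.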

An overview of continuous and discrete phasor analysis of binned or time-gated periodic decays

Time-resolved analysis of periodically excited luminescence decays by the phasor method in the presence of time-gating or binning is revisited. Analytical expressions for discrete configurations of square gates are derived, and the locus of the phasors of such modified periodic single-exponential decays is compared to the canonical universal semicircle. The effects of instrument response function (IRF) offset, decay truncation and gate shape are also discussed. Finally, modified expressions for the phase and modulus lifetimes are provided for some simple cases, and a modified phasor calibration approach is discussed.
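
To illustrate how gating moves the phasor off the universal semicircle, the sketch below compares the continuous first-harmonic phasor (ω = 2π/T) of a periodic single-exponential decay with the phasor obtained from N contiguous square gates covering one period. It assumes an ideal (delta-function) IRF at t = 0, no gate offset or gaps, and assigns each gate's signal to the gate centre; it is a numerical illustration, not the analytical expressions mentioned above.

```python
import cmath
import math

def continuous_phasor(tau, period):
    """First-harmonic phasor g + i s of a periodic single-exponential decay.

    Lies on the universal semicircle: g = 1/(1+(w tau)^2), s = w tau/(1+(w tau)^2).
    """
    wt = 2 * math.pi / period * tau
    return complex(1.0, wt) / (1.0 + wt * wt)

def gated_phasor(tau, period, n_gates):
    """Phasor from n_gates contiguous square gates covering one period.

    Each gate integrates exp(-t/tau) exactly; the periodic pile-up factor
    1/(1 - exp(-T/tau)) cancels in the normalization. The gate signal is
    assigned to the gate centre in the discrete Fourier sum (an assumption).
    """
    omega = 2 * math.pi / period
    width = period / n_gates
    num, den = 0j, 0.0
    for k in range(n_gates):
        start = k * width
        signal = tau * math.exp(-start / tau) * (1.0 - math.exp(-width / tau))
        num += signal * cmath.exp(1j * omega * (start + width / 2))
        den += signal
    return num / den

def phase_and_modulus_lifetimes(phasor, period):
    """Standard phase and modulus lifetimes computed from a phasor (g, s)."""
    omega = 2 * math.pi / period
    g, s = phasor.real, phasor.imag
    tau_phi = s / (omega * g)
    tau_m = math.sqrt(1.0 / (g * g + s * s) - 1.0) / omega
    return tau_phi, tau_m

T, tau = 12.5, 2.5   # e.g. ns; illustrative values
for n in (4, 16, 256):
    z = gated_phasor(tau, T, n)
    print(n, round(z.real, 4), round(z.imag, 4), phase_and_modulus_lifetimes(z, T))
print("continuous", continuous_phasor(tau, T))
```

With only a few gates the phase and modulus lifetimes computed from the standard formulas disagree with τ, which is why modified expressions are needed; as the number of gates grows, the gated phasor converges to the semicircle value.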

An unbiased spatiotemporal risk model for COVID-19 with epidemiologically meaningful dynamics (A systematic framework for spatiotemporal modelling of COVID-19 disease)

Spatiotemporal modelling of infectious diseases such as COVID-19 involves a variety of epidemiological metrics, such as the regional proportion of cases or regional positivity rates. Although observing their changes over time is critical for estimating the regional disease burden, the dynamical properties of these measures and their cross-relationships have not been systematically explained. Here we provide a spatiotemporal framework composed of six commonly used and newly constructed epidemiological metrics and conduct a case-study evaluation. We introduce a refined risk model that is biased neither by differences in population size nor by the spatial heterogeneity of testing. In particular, the proposed methodology is useful for the unbiased identification of time periods with elevated COVID-19 risk, without sensitivity to spatial heterogeneity in either population or testing. Our results also provide insights into the regional prioritization of testing and the consequences of potential synchronization of epidemics between regions.
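
The population and testing biases mentioned above are easy to see in the raw metrics. The sketch below computes three common regional metrics in their standard forms (positivity rate, incidence per 100,000, and proportion of all cases); the refined risk model is not reproduced, and the record fields and toy numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RegionWeek:
    region: str
    population: int
    tests: int
    positives: int

def basic_metrics(rows: List[RegionWeek]) -> Dict[str, Dict[str, float]]:
    """Per-region positivity rate, incidence per 100k and share of all cases."""
    total_cases = sum(r.positives for r in rows)
    out = {}
    for r in rows:
        out[r.region] = {
            "positivity_rate": r.positives / r.tests if r.tests else float("nan"),
            "incidence_per_100k": 1e5 * r.positives / r.population,
            "proportion_of_cases": r.positives / total_cases if total_cases else float("nan"),
        }
    return out

rows = [RegionWeek("A", 1_000_000, 20_000, 1_000),
        RegionWeek("B", 100_000, 500, 100)]
for region, m in basic_metrics(rows).items():
    print(region, m)
```

In this toy example both regions have the same incidence per 100,000, yet region A accounts for most of the cases simply because it is larger, while region B shows four times the positivity because it tests less; each metric on its own therefore tells a different story about regional risk.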

Read more

Ready to get started?

Join us today