Featured Research

Quantitative Methods

Lamotrigine Therapy for Bipolar Depression: Analysis of Self-Reported Patient Data

Background: Depression in people with bipolar disorder is a major cause of long-term disability and may lead to early mortality; currently, few safe and effective therapies exist. A double-blind, randomized, placebo-controlled trial (the CEQUEL study) was conducted to evaluate the efficacy of lamotrigine plus quetiapine versus quetiapine monotherapy in patients with bipolar type I or type II disorder. Objective: The objective of our study was to reanalyze the CEQUEL data and determine an unbiased classification accuracy for active lamotrigine versus placebo. We also wanted to establish the time it took for the drug to provide statistically significant outcomes. Methods: Between October 21, 2008 and April 27, 2012, 202 participants from 27 sites in the United Kingdom were randomly assigned to two treatments: lamotrigine (n=101) or placebo (n=101). The primary variable used for estimating depressive symptoms was based on the Quick Inventory of Depressive Symptomatology-self report version 16 (QIDS-SR16). We analyzed the data using feature engineering and simple classifiers. Results: From weeks 10 to 14, the mean difference in QIDS-SR16 ratings between the groups was -1.6317 (P=.09; sample size=81, 77; 95% CI -0.2403 to 3.5036). From weeks 48 to 52, the mean difference was -2.0032 (P=.09; sample size=54, 48; 95% CI -0.3433 to 4.3497). The coefficient of variation and the detrended fluctuation analysis (DFA) exponent alpha had the greatest explanatory power. The out-of-sample classification accuracy for the 138 participants who reported more than 10 times after week 12 was 62%. A classification accuracy consistently higher than the no-information benchmark was obtained in week 44. Conclusions: Lamotrigine plus quetiapine treatment decreased depressive symptoms in patients, but with substantial temporal instability. A trial of at least 44 weeks was required to achieve consistent results.

Quantitative Methods

Learning Curves for Drug Response Prediction in Cancer Cell Lines

Motivated by the size of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the proposed predictors can further improve the generalization performance with more training data. We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these predictors. The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, suggesting that the shape of these curves depends on the unique model-dataset pair. The multi-input NN (mNN), in which gene expressions and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training sizes for two of the datasets, whereas the mNN performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate predictors, providing a broader perspective on the overall data scaling characteristics. The fitted power law curves provide a forward-looking performance metric and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments.
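As a rough illustration of how a learning curve can be fitted to a power law, the sketch below regresses log(error) on log(training size) in plain Python. It assumes the simplest form, error = a * m^b, with no offset term for irreducible error; the abstract does not specify the exact parameterization the authors use. The function names and the data points are hypothetical and are not taken from the paper.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error = a * size**b by least squares in log-log space; return (a, b)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = power-law exponent
    a = math.exp(y_mean - b * x_mean)  # intercept = log of the prefactor
    return a, b

def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training-set size."""
    return a * size ** b

# Hypothetical learning-curve points (illustrative only, not from the paper).
sizes = [1000, 2000, 4000, 8000, 16000]
errors = [2.0 * m ** -0.3 for m in sizes]

a, b = fit_power_law(sizes, errors)
extrapolated = predict_error(a, b, 32000)
print(f"a={a:.3f}, b={b:.3f}, predicted error at 32000 samples={extrapolated:.4f}")
```

A negative exponent b means the error is still decreasing with more data, which is the kind of forward-looking reading the abstract attributes to the fitted curves.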

Quantitative Methods

Learning Deep Models from Synthetic Data for Extracting Dolphin Whistle Contours

We present a learning-based method for extracting whistles of toothed whales (Odontoceti) in hydrophone recordings. Our method represents audio signals as time-frequency spectrograms and decomposes each spectrogram into a set of time-frequency patches. A deep neural network learns archetypical patterns (e.g., crossings, frequency modulated sweeps) from the spectrogram patches and predicts time-frequency peaks that are associated with whistles. We also developed a comprehensive method to synthesize training samples from background environments and train the network with minimal human annotation effort. We applied the proposed learn-from-synthesis method to a subset of the public Detection, Classification, Localization, and Density Estimation (DCLDE) 2011 workshop data to extract whistle confidence maps, which we then processed with an existing contour extractor to produce whistle annotations. The F1-score of our best synthesis method was 0.158 greater than our baseline whistle extraction algorithm (~25% improvement) when applied to common dolphin (Delphinus spp.) and bottlenose dolphin (Tursiops truncatus) whistles.

Quantitative Methods

Learning Equations from Biological Data with Limited Time Samples

Equation learning methods present a promising tool to aid scientists in the modeling process for biological data. Previous equation learning studies have demonstrated that these methods can infer models from rich datasets; however, the performance of these methods in the presence of common challenges from biological data has not been thoroughly explored. We present an equation learning methodology composed of data denoising, equation learning, model selection, and post-processing steps that infers a dynamical systems model from noisy spatiotemporal data. The performance of this methodology is thoroughly investigated in the face of several common challenges presented by biological data, namely sparse data sampling, large noise levels, and heterogeneity between datasets. We find that this methodology can accurately infer the correct underlying equation and predict unobserved system dynamics from a small number of time samples when the data are sampled over a time interval exhibiting both linear and nonlinear dynamics. Our findings suggest that equation learning methods can be used for model discovery and selection in many areas of biology when an informative dataset is used. We focus on glioblastoma multiforme modeling as a case study in this work to highlight how these results are informative for data-driven, modeling-based tumor invasion predictions.

Quantitative Methods

Learning Heat Diffusion for Network Alignment

Networks are abundant in the life sciences. Outstanding challenges include how to characterize similarities between networks and, by extension, how to integrate information across networks. Yet network alignment remains a core algorithmic problem. Here, we present a novel learning algorithm called evolutionary heat diffusion-based network alignment (EDNA) to address this challenge. EDNA uses the diffusion signal as a proxy for computing node similarities between networks. Comparing EDNA with state-of-the-art algorithms on a popular protein-protein interaction network dataset, using four different evaluation metrics, we achieve (i) the most accurate alignments, (ii) increased robustness against noise, and (iii) superior scaling capacity. The EDNA algorithm is versatile in that other available network alignments or embeddings can be used as an initial baseline alignment; EDNA then works as a wrapper around them by running the evolutionary diffusion on top of them. In conclusion, EDNA outperforms state-of-the-art methods for network alignment, thus setting the stage for large-scale comparison and integration of networks.

Quantitative Methods

Learning to swim in potential flow

Fish swim by undulating their bodies. These propulsive motions require coordinated shape changes of a body that interacts with its fluid environment, but the specific shape coordination that leads to robust turning and swimming motions remains unclear. To address the problem of underwater motion planning, we propose a simple model of a three-link fish swimming in a potential flow environment and we use model-free reinforcement learning for shape control. We arrive at optimal shape changes for two swimming tasks: swimming in a desired direction and swimming towards a known target. This fish model belongs to a class of problems in geometric mechanics, known as driftless dynamical systems, which allow us to analyze the swimming behavior in terms of geometric phases over the shape space of the fish. These geometric methods are less intuitive in the presence of drift. Here, we use the shape space analysis as a tool for assessing, visualizing, and interpreting the control policies obtained via reinforcement learning in the absence of drift. We then examine the robustness of these policies to drift-related perturbations. Although the fish has no direct control over the drift itself, it learns to take advantage of the presence of moderate drift to reach its target.

Quantitative Methods

Leveraging Structured Biological Knowledge for Counterfactual Inference: a Case Study of Viral Pathogenesis

Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in the form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This manuscript proposes a general approach for querying a causal biological knowledge graph and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy, and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.

Quantitative Methods

Life Cycle Assessment of high rate algal ponds for wastewater treatment and resource recovery

The aim of this study was to assess the potential environmental impacts associated with high rate algal pond (HRAP) systems for wastewater treatment and resource recovery in small communities. To this end, a Life Cycle Assessment (LCA) and an economic assessment were carried out to evaluate two alternatives: i) an HRAP system for wastewater treatment where microalgal biomass is valorised for energy recovery (biogas production); ii) an HRAP system for wastewater treatment where microalgal biomass is reused for nutrient recovery (biofertiliser production). Additionally, both alternatives were compared to a typical small-sized activated sludge system. The results showed that the HRAP system coupled with biogas production appeared to be more environmentally friendly than the HRAP system coupled with biofertiliser production in the climate change, ozone layer depletion, photochemical oxidant formation, and fossil depletion impact categories. Climatic conditions strongly influenced the results obtained in the eutrophication and metal depletion impact categories, with the HRAP system located where warm temperatures and high solar radiation are predominant showing lower impact. In terms of costs, HRAP systems seemed to be more economically feasible when combined with biofertiliser production instead of biogas production.

Quantitative Methods

Limitations of ROC on Imbalanced Data: Evaluation of LVAD Mortality Risk Scores

Objective: This study illustrates the ambiguity of ROC in evaluating two classifiers of 90-day LVAD mortality. This paper also introduces the precision-recall curve (PRC) as a supplemental metric that is more representative of LVAD classifiers' performance in predicting the minority class. Background: In the LVAD domain, the receiver operating characteristic (ROC) is a commonly applied metric of classifier performance. However, ROC can provide a distorted view of a classifier's ability to predict short-term mortality because of the overwhelmingly greater proportion of patients who survive, i.e., imbalanced data. Methods: This study compared the ROC and PRC for the outcome of two classifiers for 90-day LVAD mortality for 800 patients (test group) recorded in INTERMACS who received a continuous-flow LVAD between 2006 and 2016 (mean age of 59 years; 146 females vs. 654 males), in which the mortality rate is only 8% at 90 days (imbalanced data). The two classifiers were the HeartMate Risk Score (HMRS) and a Random Forest (RF). Results: The ROC indicates fairly good performance of the RF and HMRS classifiers, with areas under the curve (AUC) of 0.77 vs. 0.63, respectively. This is in contrast with their PRC, with AUC of 0.43 vs. 0.16 for RF and HMRS, respectively. The PRC for HMRS showed that precision rapidly dropped to only 10% with slightly increasing sensitivity. Conclusion: The ROC can portray an overly optimistic performance of a classifier or risk score when applied to imbalanced data. The PRC provides better insight into the performance of a classifier by focusing on the minority class.
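To make the contrast between the two metrics concrete, the following minimal sketch computes a ROC AUC and a precision-recall summary for one set of scores on synthetic data with 8% positives. The scores are drawn from two Gaussians chosen purely for illustration; they are not INTERMACS data and do not reproduce either classifier in the study. Average precision is used here as one common estimator of the area under the PRC, which may differ from the authors' calculation.

```python
import random

def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision measured at each true positive, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Synthetic cohort: 800 cases, 8% positives (assumed score distributions).
random.seed(0)
labels = [1] * 64 + [0] * 736
scores = [random.gauss(1.0, 1.0) if y == 1 else random.gauss(0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC AUC = {auc:.2f}, average precision = {ap:.2f}, prevalence = {64 / 800:.2f}")
```

The useful comparison is against each metric's chance level: 0.5 for ROC AUC, but only the prevalence (0.08 here) for average precision, so the same ranking can look strong on one scale and weak on the other.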

Quantitative Methods

Local Causal Structure Learning and its Discovery Between Type 2 Diabetes and Bone Mineral Density

Type 2 diabetes (T2DM), one of the most prevalent chronic diseases, affects the glucose metabolism of the human body, which decreases the quality of life and places a heavy burden on social medical care. Patients with T2DM are more likely to suffer bone fragility fractures because diabetes affects bone mineral density (BMD). However, discovering the determinant factors of BMD through medical studies is expensive and time-consuming. In this paper, we propose a novel algorithm, Prior-Knowledge-driven local Causal structure Learning (PKCL), to discover the underlying causal mechanism between BMD and its factors from clinical data. Because medical data are limited but prior knowledge is redundant, PKCL makes adequate use of the prior knowledge to mine the local causal structure for the target relationship. By combining medical prior knowledge with the discovered causal relationships, PKCL can achieve more reliable results without long-standing medical statistical experiments. Extensive experiments are conducted on a newly provided clinical data set. The experimental results of PKCL on these data are shown to correspond closely with existing medical knowledge, which demonstrates the superiority and effectiveness of PKCL. To illustrate the importance of prior knowledge, the result of the algorithm without prior knowledge is also investigated.
