Featured Research

Quantitative Methods

Autonomous discovery in the chemical sciences part I: Progress

This two-part review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this first part, we describe a classification for discoveries of physical matter (molecules, materials, devices), processes, and models and how they are unified as search problems. We then introduce a set of questions and considerations relevant to assessing the extent of autonomy. Finally, we describe many case studies of discoveries accelerated by or resulting from computer assistance and automation from the domains of synthetic chemistry, drug discovery, inorganic chemistry, and materials science. These illustrate how rapid advancements in hardware automation and machine learning continue to transform the nature of experimentation and modelling. Part two reflects on these case studies and identifies a set of open challenges for the field.

Read more
Quantitative Methods

Autonomous discovery in the chemical sciences part II: Outlook

This two-part review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this second part, we reflect on a selection of exemplary studies. It is increasingly important to articulate what the role of automation and computation has been in the scientific process and how that has or has not accelerated discovery. One can argue that even the best automated systems have yet to "discover" despite being incredibly useful as laboratory assistants. We must carefully consider how they have been and can be applied to future problems of chemical discovery in order to effectively design and interact with future autonomous platforms. The majority of this article defines a large set of open research directions, including improving our ability to work with complex data, build empirical models, automate both physical and computational experiments for validation, select experiments, and evaluate whether we are making progress toward the ultimate goal of autonomous discovery. Addressing these practical and methodological challenges will greatly advance the extent to which autonomous systems can make meaningful discoveries.

Read more
Quantitative Methods

Basis Function Based Data Driven Learning for the Inverse Problem of Electrocardiography

Objective: This paper proposes a neural network approach for predicting heart surface potentials (HSPs) from body surface potentials (BSPs), which reframes the traditional inverse problem of electrocardiography into a regression problem through the use of Gaussian 3D (G3D) basis function decomposition. Methods: HSPs were generated using G3D basis functions and passed through a boundary element forward model to obtain corresponding BSPs. The generated BSPs (input) and HSPs (output) were used to train a neural network, which was then used to predict a variety of synthesized and decomposed real-world HSPs. Results: Fitted G3D basis function parameters can accurately reconstruct the real-world left ventricular paced recording with a percent root mean squared error (RMSE) of 1.34±1.30%. The basis-data-trained neural network was able to predict G3D basis function synthesized data with an RMSE of 8.46±1.55%, and the G3D representation of real-world data with an RMSE of 18.5±5.25%. The activation map produced from the predicted time series had an RMSE of 17.0% and a mean absolute difference of 10.3±10.8 ms when compared to that produced from the actual left ventricular paced recording. Conclusion: A Gaussian basis function based data-driven model for reframing the inverse problem of electrocardiography as a regression problem is successful and produces promising time series and activation map predictions of real-world recordings, even when trained only on Gaussian data. Significance: The HSPs predicted by the neural network can be used to create activation maps to identify cardiac dysfunctions during clinical assessment.
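As a rough illustration of the pipeline described above, the sketch below synthesizes HSPs from Gaussian basis functions, maps them to BSPs through a stand-in linear forward model, and trains a neural network to invert the map. The node counts, the random forward matrix, and the scikit-learn MLPRegressor are illustrative assumptions, not the authors' geometry or architecture.

```python
# Hypothetical sketch of the pipeline: synthesize heart-surface potentials
# (HSPs) from 3D Gaussian basis functions, push them through a linear forward
# model to get body-surface potentials (BSPs), then train a neural network to
# invert the map. All shapes and the random forward matrix are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_heart, n_body, n_samples = 100, 60, 2000

# Heart-surface node coordinates (stand-in for a real cardiac mesh).
nodes = rng.uniform(-1, 1, size=(n_heart, 3))

def g3d_hsp(centers, widths, amps):
    """HSP as a sum of 3D Gaussian basis functions evaluated at the nodes."""
    hsp = np.zeros(n_heart)
    for c, w, a in zip(centers, widths, amps):
        hsp += a * np.exp(-np.sum((nodes - c) ** 2, axis=1) / (2 * w ** 2))
    return hsp

# Stand-in for the boundary element forward model (BSP = A @ HSP).
A = rng.normal(scale=1.0 / n_heart, size=(n_body, n_heart))

X, Y = [], []
for _ in range(n_samples):
    k = rng.integers(1, 4)                        # 1-3 Gaussian sources
    hsp = g3d_hsp(rng.uniform(-1, 1, (k, 3)),
                  rng.uniform(0.2, 0.5, k),
                  rng.uniform(-1, 1, k))
    X.append(A @ hsp)                             # BSP (network input)
    Y.append(hsp)                                 # HSP (network output)

net = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
net.fit(np.array(X), np.array(Y))

# Percent RMSE on a held-out synthetic example, as reported in the abstract.
hsp_true = g3d_hsp(rng.uniform(-1, 1, (2, 3)), [0.3, 0.4], [1.0, -0.5])
hsp_pred = net.predict((A @ hsp_true)[None, :])[0]
rmse = 100 * np.linalg.norm(hsp_pred - hsp_true) / np.linalg.norm(hsp_true)
print(f"percent RMSE: {rmse:.1f}%")
```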

Read more
Quantitative Methods

Bayesian information-theoretic calibration of patient-specific radiotherapy sensitivity parameters for informing effective scanning protocols in cancer

With new advancements in technology, it is now possible to collect data for a variety of different metrics describing tumor growth, including tumor volume, composition, and vascularity, among others. For any proposed model of tumor growth and treatment, we observe large variability among individual patients' parameter values, particularly those relating to treatment response; thus, exploiting these various metrics for model calibration can help infer such patient-specific parameters both accurately and early, so that treatment protocols can be adjusted mid-course for maximum efficacy. However, taking measurements can be costly and invasive, limiting clinicians to a sparse collection schedule. As such, determining the optimal times and metrics at which to collect data in order to best inform proper treatment protocols could be of great assistance to clinicians. In this investigation, we employ a Bayesian information-theoretic calibration protocol for experimental design in order to identify the optimal times at which to collect data for informing treatment parameters. Within this procedure, data collection times are chosen sequentially to maximize the reduction in parameter uncertainty with each added measurement, ensuring that a budget of n high-fidelity experimental measurements results in maximum information gain about the low-fidelity model parameter values. In addition to investigating the optimal temporal pattern for data collection, we also develop a framework for deciding which metrics should be utilized at each data collection point. We illustrate this framework with a variety of toy examples, each utilizing a radiotherapy treatment regimen. For each scenario, we analyze the dependence of the predictive power of the low-fidelity model upon the measurement budget.
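A minimal sketch of the sequential design idea, assuming a logistic tumor-growth stand-in for the low-fidelity model: at each step, the candidate measurement time with the largest expected information gain (estimated by nested Monte Carlo over a particle posterior) is selected, and the posterior over a patient-specific growth-rate parameter is updated. The model, prior, and noise level are illustrative assumptions, not the paper's.

```python
# Sequential Bayesian optimal design sketch: pick the measurement time with
# the largest expected information gain (EIG) about a patient-specific
# growth rate, then update a particle posterior with the new observation.
import numpy as np

rng = np.random.default_rng(1)
K, V0, sigma = 1.0, 0.1, 0.02          # carrying capacity, initial volume, noise

def volume(t, r):
    """Logistic tumor-volume model (assumed stand-in for the low-fidelity model)."""
    return K / (1 + (K / V0 - 1) * np.exp(-r * t))

def loglik(y, t, r):
    return -0.5 * ((y - volume(t, r)) / sigma) ** 2

def expected_info_gain(t, particles, n_outer=200):
    """Nested Monte Carlo EIG estimate for a candidate measurement time t."""
    theta = rng.choice(particles, size=n_outer)
    y = volume(t, theta) + sigma * rng.normal(size=n_outer)
    ll = loglik(y, t, theta)                       # log p(y | theta)
    marg = np.array([np.log(np.mean(np.exp(loglik(yi, t, particles))))
                     for yi in y])                 # log marginal p(y)
    return np.mean(ll - marg)

r_true = 0.35
particles = rng.uniform(0.1, 0.6, size=2000)       # prior samples for r
candidates = np.linspace(1, 30, 30)                # candidate measurement days

for step in range(3):                              # budget of n = 3 measurements
    eig = [expected_info_gain(t, particles) for t in candidates]
    t_star = candidates[int(np.argmax(eig))]
    y_obs = volume(t_star, r_true) + sigma * rng.normal()
    # Importance-resample the particles against the new observation.
    w = np.exp(loglik(y_obs, t_star, particles))
    particles = rng.choice(particles, size=particles.size, p=w / w.sum())
    print(f"measure at day {t_star:.0f}; posterior r = "
          f"{particles.mean():.3f} +/- {particles.std():.3f}")
```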

Read more
Quantitative Methods

Bayesian uncertainty quantification for data-driven equation learning

Equation learning aims to infer differential equation models from data. While a number of studies have shown that differential equation models can be successfully identified when the data are sufficiently detailed and corrupted with relatively small amounts of noise, the relationship between observation noise and uncertainty in the learned differential equation models remains unexplored. We demonstrate that for noisy data sets there is great variation in both the structure of the learned differential equation models and the parameter values. We explore how to combine data sets to quantify uncertainty in the learned models and, at the same time, draw mechanistic conclusions about the target differential equations. We generate noisy data using a stochastic agent-based model and combine equation learning methods with approximate Bayesian computation (ABC) to show that the correct differential equation model can be successfully learned from data, while a quantification of uncertainty is given by a posterior distribution in parameter space.
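The sketch below illustrates the ABC component under simplifying assumptions: a noisy logistic curve stands in for the agent-based data, parameters are drawn from a uniform prior, and draws are accepted when the simulated trajectory lies within a tolerance of the observations. The accepted draws approximate the posterior that quantifies parameter uncertainty.

```python
# Illustrative ABC-rejection sketch: treat the learned differential equation's
# parameters as unknowns, simulate candidate models, and keep parameter draws
# whose output lies close to the noisy data. The logistic equation
# dx/dt = r x (1 - x/K) and the tolerance are stand-ins, not the paper's setup.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 50)

def logistic(t, r, K, x0=0.05):
    return K / (1 + (K / x0 - 1) * np.exp(-r * t))

# "Observed" data: true model plus observation noise (stand-in for ABM output).
data = logistic(t, r=1.0, K=0.8) + 0.03 * rng.normal(size=t.size)

accepted = []
eps = 0.05                                 # ABC tolerance on the RMS error
for _ in range(20000):
    r, K = rng.uniform(0.2, 2.0), rng.uniform(0.3, 1.5)   # prior draws
    dist = np.sqrt(np.mean((logistic(t, r, K) - data) ** 2))
    if dist < eps:
        accepted.append((r, K))

post = np.array(accepted)                  # samples from the ABC posterior
print(f"accepted {len(post)} draws; "
      f"r = {post[:,0].mean():.2f} +/- {post[:,0].std():.2f}, "
      f"K = {post[:,1].mean():.2f} +/- {post[:,1].std():.2f}")
```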

Read more
Quantitative Methods

Bee Cluster 3D: A system to monitor the temperature in a hive over time

We present Bee Cluster 3D, a new system for studying the time evolution of the 3D temperature distribution in a beehive. The system can be used to evaluate the cluster size and the location of the queen during winter; in summer, it can quantify the size of the brood nest and the breeding activity of the queen. It does not disturb the activity of the colony and can be used on any hive. The electronics were developed to be non-intrusive, miniaturized, and energy autonomous.

Read more
Quantitative Methods

Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass

The goal of this paper is to review and critically assess different methods to monitor key process variables for ethanol production from lignocellulosic biomass. Because cellulose-based biofuels cannot yet compete with non-cellulosic biofuels, process control and optimization are important for lowering production costs. This study reviews different monitoring schemes to indicate the added value of real-time monitoring for process control. Furthermore, a comparison is made of different monitoring techniques for measuring the off-gas, the concentrations of dissolved components in the inlet to the process, the concentrations of dissolved components in the reactor, and the biomass concentration. Finally, soft sensor techniques and available models are discussed to give an overview of modeling techniques that analyze data, with the aim of coupling soft sensor predictions to the control and optimization of cellulose-to-ethanol fermentation. The paper ends with a discussion of future needs and developments.
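As a toy illustration of a soft sensor of the kind surveyed here, the sketch below estimates broth ethanol concentration by integrating a simulated CO2 off-gas rate and applying the fermentation stoichiometry of roughly one mole of CO2 per mole of ethanol (glucose -> 2 ethanol + 2 CO2). The flow profile, broth volume, and noise level are invented values.

```python
# Minimal soft-sensor sketch: infer ethanol in the broth from the CO2
# evolution rate measured in the off-gas, using the ~1:1 molar ratio of CO2
# to ethanol in glucose fermentation. All numbers are assumed for illustration.
import numpy as np

rng = np.random.default_rng(3)
dt = 0.1                                   # h, sampling interval
t = np.arange(0, 48, dt)

# Simulated CO2 evolution rate (mol/h), e.g. from an off-gas analyzer:
# a bell-shaped production phase plus measurement noise.
cer_true = 0.8 * np.exp(-((t - 20) / 8) ** 2)
cer_meas = cer_true + 0.02 * rng.normal(size=t.size)

V = 10.0                                   # L, broth volume (assumed)
M_ETOH = 46.07                             # g/mol, ethanol molar mass

# Soft sensor: cumulative CO2 (trapezoidal integration) -> ethanol estimate.
co2_cum = np.concatenate(
    [[0.0], np.cumsum((cer_meas[1:] + cer_meas[:-1]) / 2 * dt)])
ethanol_gL = co2_cum * M_ETOH / V          # 1 mol CO2 ~ 1 mol ethanol

print(f"estimated ethanol after 48 h: {ethanol_gL[-1]:.1f} g/L")
```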

Read more
Quantitative Methods

Beyond Chemical 1D knowledge using Transformers

In the present paper, we evaluate the efficiency of recent Transformer-CNN models in predicting target properties from augmented stereochemical SMILES. We selected the well-known Cliff activity dataset as well as a Dipole moment dataset and compared the effect of three representations of R/S stereochemistry in SMILES. The considered representations were SMILES without stereochemistry (noChiSMI), a classical relative stereochemistry encoding (RelChiSMI), and an alternative version with absolute stereochemistry encoding (AbsChiSMI). Including R/S information in the SMILES representation simplified the assignment of the respective stereochemical information, but did not always show advantages on regression or classification tasks. Interestingly, we did not see degradation in the performance of Transformer-CNN models when the stereochemical information was absent from the SMILES. Moreover, these models showed higher or similar performance compared to descriptor-based models built on 3D structures. These observations are an important step in NLP modeling of 3D chemical tasks. An open challenge remains whether Transformer-CNN can efficiently embed 3D knowledge from SMILES input and whether a better representation could further increase the accuracy of this approach.
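For readers who want to reproduce the representation variants, the sketch below uses RDKit to generate SMILES with and without stereochemistry, plus random SMILES of the kind used for augmentation. The authors' specific relative/absolute (RelChiSMI/AbsChiSMI) encoding schemes are their own and are not reproduced here.

```python
# RDKit sketch of the input representations compared in the paper: SMILES
# without stereochemistry (noChiSMI) versus SMILES keeping the R/S tags, plus
# random-SMILES augmentation. The custom encodings are not reproduced.
from rdkit import Chem

smi = "C[C@H](N)C(=O)O"                    # L-alanine, one chiral center
mol = Chem.MolFromSmiles(smi)

# noChiSMI: canonical SMILES with stereochemistry stripped.
no_chi = Chem.MolToSmiles(mol, isomericSmiles=False)

# SMILES retaining the chiral tag, and the CIP (R/S) label of the center.
with_chi = Chem.MolToSmiles(mol, isomericSmiles=True)
centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)

# Augmentation: non-canonical random SMILES for the same molecule.
augmented = [Chem.MolToSmiles(mol, doRandom=True) for _ in range(3)]

print("noChiSMI:  ", no_chi)               # e.g. CC(N)C(=O)O
print("with R/S:  ", with_chi)             # canonical isomeric SMILES
print("centers:   ", centers)              # e.g. [(1, 'S')]
print("augmented: ", augmented)
```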

Read more
Quantitative Methods

Beyond Rescorla-Wagner: the ups and downs of learning

We check the robustness of a recently proposed dynamical model of associative Pavlovian learning that extends the Rescorla-Wagner (RW) model in a natural way and predicts progressively damped oscillations in the response of the subjects. Using the data of two experiments, we compare the dynamical oscillatory model (DOM) with an oscillatory model made of the superposition of the RW learning curve and oscillations. Not only do the data clearly show an oscillatory pattern, but they also favor the DOM over the added-oscillation model, indicating that these oscillations are the manifestation of an associative process. We interpret this as subjects making predictions about trial outcomes that extend further in time than in the RW model, though with greater uncertainty.
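To make the comparison concrete, the sketch below contrasts the monotone RW learning curve with a generic second-order (damped harmonic) extension that oscillates toward the asymptote, and with an "RW plus added oscillation" comparator. The specific DOM equations are the authors', so the second-order form and its coupling constant here are only assumptions about the general shape.

```python
# Sketch of the learning curves being compared: the classical Rescorla-Wagner
# (RW) update, a generic second-order extension producing emergent damped
# oscillations (stand-in for the DOM), and an RW curve with an oscillation
# bolted on (the comparator model). Parameter values are assumed.
import numpy as np

alpha_beta, lam, n_trials = 0.15, 1.0, 60

# Rescorla-Wagner: V_{n+1} = V_n + alpha*beta*(lam - V_n), monotone approach.
v_rw = np.zeros(n_trials)
for n in range(1, n_trials):
    v_rw[n] = v_rw[n-1] + alpha_beta * (lam - v_rw[n-1])

# Second-order extension: adding an inertia term k*(V_{n-1} - V_{n-2}) to the
# RW update yields damped oscillations around lam instead of a monotone curve.
k = 0.55                                   # momentum-like coupling (assumed)
v_dom = np.zeros(n_trials)
for n in range(1, n_trials):
    dv = (v_dom[n-1] - v_dom[n-2]) if n > 1 else 0.0
    v_dom[n] = v_dom[n-1] + alpha_beta * (lam - v_dom[n-1]) + k * dv

# RW + added oscillation comparator: the oscillation is imposed, not emergent.
trials = np.arange(n_trials)
v_add = (lam * (1 - (1 - alpha_beta) ** trials)
         + 0.15 * np.exp(-0.08 * trials) * np.cos(0.6 * trials))

print("final values:", v_rw[-1].round(3), v_dom[-1].round(3), v_add[-1].round(3))
```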

Read more
Quantitative Methods

BioMETA: A multiple specification parameter estimation system for stochastic biochemical models

The inherent behavioral variability exhibited by stochastic biochemical systems makes it challenging for human experts to analyze them manually. Computational modeling of such systems helps in investigating and predicting the behaviors of the underlying biochemical processes, but at the same time introduces several unknown parameters. A key challenge in this scenario is to determine the values of these unknown parameters against known behavioral specifications. The solutions presented so far estimate the parameters of a given model against a single specification, whereas a correct model is expected to satisfy all the behavioral specifications when instantiated with a single set of parameter values. We present a new method, BioMETA, to address this problem such that a single set of parameter values causes a parameterized stochastic biochemical model to satisfy all the given probabilistic temporal logic behavioral specifications simultaneously. Our method combines a statistical model checking technique based on multiple hypothesis testing with simulated annealing search to find a single set of parameter values for which the given parameterized model satisfies multiple probabilistic behavioral specifications. We study two stochastic rule-based models of biochemical receptors, namely FcεRI and the T-cell receptor, as benchmarks to evaluate the usefulness of the presented method. Our experimental results successfully estimate 26 parameters of the FcεRI model and 29 parameters of the T-cell receptor model against three probabilistic temporal logic behavioral specifications each.
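A toy sketch of the search loop, with an invented random-walk model and two invented specifications standing in for the rule-based receptor models and temporal-logic properties: simulated annealing minimizes the pooled shortfall of Monte Carlo probability estimates across all specs, so a single parameter value must satisfy every specification at once.

```python
# Toy sketch of the BioMETA idea: simulated annealing over a parameter, with a
# fitness that pools Monte Carlo estimates of *all* probabilistic
# specifications, so one parameter set must satisfy every spec simultaneously.
import numpy as np

rng = np.random.default_rng(4)

def simulate(p, steps=50):
    """Toy stochastic trajectory: +/-1 random walk with up-probability p."""
    return np.cumsum(np.where(rng.random(steps) < p, 1, -1))

def spec_probs(theta, n_runs=200):
    """Monte Carlo estimates of P(spec_i holds) for both toy specifications."""
    runs = [simulate(theta) for _ in range(n_runs)]
    p1 = np.mean([traj[-1] >= 20 for traj in runs])   # ends at level >= 20
    p2 = np.mean([traj.min() >= -5 for traj in runs]) # never drops below -5
    return p1, p2

thresholds = (0.8, 0.9)            # required probabilities for spec 1 and 2

def shortfall(theta):
    """Zero iff every spec meets its threshold; annealing minimizes this."""
    return sum(max(0.0, th - p) for p, th in zip(spec_probs(theta), thresholds))

theta, T = 0.5, 1.0
cost = shortfall(theta)
for it in range(300):                       # simulated annealing loop
    cand = np.clip(theta + rng.normal(scale=0.05), 0.0, 1.0)
    c = shortfall(cand)
    if c < cost or rng.random() < np.exp(-(c - cost) / T):
        theta, cost = cand, c
    T *= 0.99                               # geometric cooling schedule

print(f"best p = {theta:.3f}, residual spec shortfall = {cost:.3f}")
```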

Read more
