Michael P. Krein
Rensselaer Polytechnic Institute
Publications
Featured research published by Michael P. Krein.
Journal of Physical Chemistry A | 2011
Michael P. Krein; N. Sukumar
Discontinuous changes in molecular structure (resulting from continuous transformations of molecular coordinates) lead to changes in chemical properties and biological activities that chemists attempt to describe through structure-activity or structure-property relationships (QSAR/QSPR). Such relationships are commonly envisioned in a continuous high-dimensional space of numerical descriptors, referred to as chemistry space. The choice of descriptors defining coordinates within chemistry space and the choice of similarity metrics thus influence the partitioning of this space into regions corresponding to local structural similarity. These are the regions (known as domains of applicability) most likely to be successfully modeled by a structure-activity relationship. In this work, the network topology and scaling relationships of chemistry spaces are first investigated independent of a specific biological activity. The chemistry spaces studied include the ZINC data set, a qHTS PubChem bioassay, and the space of protein binding sites from the PDB. The characteristics of these networks are compared and contrasted with those of the bioassay SALI subnetwork, which maps discontinuities or cliffs in the structure-activity landscape. Mapping the locations of activity cliffs and comparing the global characteristics of SALI subnetworks with those of the underlying chemistry space networks generated using different representations can guide the choice of a better representation. A higher local density of SALI edges with a particular representation indicates a more challenging structure-activity relationship using that fingerprint in that region of chemistry space.
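The SALI measure underlying this landscape analysis can be sketched in a few lines. The sketch below uses the standard definition SALI = |ΔA| / (1 − similarity) with Tanimoto similarity on fingerprints; the fingerprints and activity values are toy data, not from the paper:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def sali(act_a, act_b, sim):
    """Structure-Activity Landscape Index: large values flag activity cliffs."""
    if sim >= 1.0:                 # identical structures: SALI is unbounded
        return float("inf")
    return abs(act_a - act_b) / (1.0 - sim)

# toy data: fingerprints as sets of on-bit indices, activities as pIC50-like values
mols = {
    "A": ({1, 2, 3, 4, 5}, 6.0),
    "B": ({1, 2, 3, 4, 6}, 8.5),   # similar to A but far more active -> cliff
    "C": ({7, 8, 9},       6.1),   # dissimilar to A, similar activity -> no cliff
}

edges = {}
names = sorted(mols)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = tanimoto(mols[a][0], mols[b][0])
        edges[(a, b)] = sali(mols[a][1], mols[b][1], sim)
```

Thresholding `edges` on a SALI cutoff yields the SALI subnetwork; the density of surviving edges in a neighborhood indicates how rugged the landscape is under that fingerprint.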
Journal of Materials Science | 2012
N. Sukumar; Michael P. Krein; Qiong Luo; Curt M. Breneman
We demonstrate applications of quantitative structure–property relationship (QSPR) modeling to supplement first-principles computations in materials design. We have here focused on the design of polymers with specific electronic properties. We first show that common materials properties such as the glass transition temperature (Tg) can be effectively modeled by QSPR to generate highly predictive models that relate polymer repeat unit structure to Tg. Next, QSPR modeling is shown to supplement and guide first-principles density functional theory (DFT) computations in the design of polymers with specific dielectric properties, thereby leveraging the power of first-principles computations by providing high-throughput capability. Our approach consists of multiple rounds of validated QSPR modeling and DFT computations to optimize the polymer skeleton as well as functional group substitutions thereof. Rigorous model validation protocols ensure that the statistical models are able to make valid predictions on molecules outside the training set. Future work with inverse QSPR has the potential to further reduce the time to optimize materials properties.
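The high-throughput role QSPR plays here (fit once, then rank many candidates cheaply) can be illustrated with a minimal one-descriptor least-squares model. The descriptor, Tg values, and candidate repeat-unit names below are invented for illustration only:

```python
def fit_simple_qspr(x, y):
    """Least-squares fit of property ~ a + b * descriptor (one descriptor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b           # intercept, slope

# hypothetical training data: a stiffness-like descriptor vs. Tg in kelvin
desc = [1.0, 2.0, 3.0, 4.0]
tg = [280.0, 310.0, 340.0, 370.0]   # exactly Tg = 250 + 30 * desc
a, b = fit_simple_qspr(desc, tg)

# high-throughput use: rank unseen candidate repeat units by predicted Tg
candidates = {"unit_X": 2.5, "unit_Y": 5.0, "unit_Z": 0.5}
ranked = sorted(candidates, key=lambda m: a + b * candidates[m], reverse=True)
```

In the paper's workflow, only the top-ranked candidates from such a screen would be passed on to the much more expensive DFT stage.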
Bioinformatics | 2010
Sourav Das; Michael P. Krein; Curt M. Breneman
Summary: Structure-based approaches complement ligand-based approaches for lead discovery and cross-reactivity prediction. We present to the scientific community a web server for comparing the surface of a ligand-bound site of a protein against a database of 106,796 ligand-bound site surfaces. The web server implements the property-encoded shape distributions (PESD) algorithm for surface comparison. A typical virtual screen takes 5 min to complete. The output provides a list of sites ranked by site similarity, hyperlinked to the corresponding entries in the PDB and PDBeChem databases.
Availability: The server is freely accessible at http://reccr.chem.rpi.edu/Software/pesdserv/
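The shape-distribution idea behind PESD can be sketched without the property encoding: sample pairwise distances between surface points, histogram them, and rank database sites by histogram distance to the query. The point clouds below are toy stand-ins for site surfaces, and this simplification omits the surface-property labels that PESD actually encodes:

```python
import math
import random

def shape_signature(points, bins=8, dmax=10.0, samples=2000, seed=0):
    """Normalized histogram of sampled inter-point distances: a simplified,
    property-free stand-in for a PESD-style surface signature."""
    rng = random.Random(seed)
    hist = [0] * bins
    for _ in range(samples):
        p, q = rng.sample(points, 2)
        k = min(int(math.dist(p, q) / dmax * bins), bins - 1)
        hist[k] += 1
    return [h / samples for h in hist]

def l1_distance(h1, h2):
    """Smaller value = more similar shape distributions."""
    return sum(abs(u - v) for u, v in zip(h1, h2))

# toy 'binding sites': two compact point clouds and one elongated one
rng = random.Random(42)
site_a = [(rng.random(), rng.random(), rng.random()) for _ in range(40)]
site_b = [(rng.random(), rng.random(), rng.random()) for _ in range(40)]
site_c = [(8.0 * rng.random(), 0.0, 0.0) for _ in range(40)]

query = shape_signature(site_a)
db = {"site_b": site_b, "site_c": site_c}
ranked = sorted(db, key=lambda name: l1_distance(query, shape_signature(db[name])))
```

As in the server's output, `ranked` is a similarity-ordered hit list; the compact cloud ranks above the elongated one because its distance histogram matches the query's.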
Journal of Chemical Information and Modeling | 2011
Charles Bergeron; Gregory M. Moore; Michael P. Krein; Curt M. Breneman; Kristin P. Bennett
Least-squares fitting of the Hill equation to quantitative high-throughput screening (qHTS) assays results in frequent unsatisfactory fits. We learn and exploit prior knowledge to improve the Hill fitting in a nonlinear regression method called domain knowledge fitter (DK-fitter). This paper formulates and solves DK-fitter for 44 public qHTS data sets. This new Hill parameter estimation technique is validated using three unbiased approaches, including a novel method that involves generating simulated samples. This paper fosters the extraction of higher quality information from screens for improved potency evaluation.
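The Hill equation at issue can be fit in a few lines. The sketch below is a plain grid-search least-squares fitter over a synthetic curve, not the paper's DK-fitter: DK-fitter additionally biases the fit with priors learned from other screens. The grid, fixed asymptotes, and data are assumptions for illustration:

```python
def hill(c, bottom, top, ac50, n):
    """Four-parameter Hill equation for a concentration-response curve."""
    return bottom + (top - bottom) / (1.0 + (ac50 / c) ** n)

def fit_hill_grid(concs, resps, bottom=0.0, top=100.0):
    """Crude least-squares fit by grid search over AC50 and the Hill slope n,
    with the bottom and top asymptotes held fixed."""
    best = None
    ac50_grid = [10.0 ** (e / 4.0) for e in range(-24, 9)]   # 1e-6 .. 1e2
    for ac50 in ac50_grid:
        for n in (0.5, 1.0, 1.5, 2.0, 3.0):
            sse = sum((hill(c, bottom, top, ac50, n) - r) ** 2
                      for c, r in zip(concs, resps))
            if best is None or sse < best[0]:
                best = (sse, ac50, n)
    return best[1], best[2]

# synthetic 5-point qHTS-style curve with a known AC50 on the grid
true_ac50 = 10.0 ** (-8 / 4.0)          # 1e-2
concs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
resps = [hill(c, 0.0, 100.0, true_ac50, 1.0) for c in concs]
est_ac50, est_n = fit_hill_grid(concs, resps)
```

On noiseless data this recovers the generating parameters exactly; the paper's contribution is precisely about what to do when real qHTS curves are too noisy or truncated for an unconstrained fit like this one to behave well.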
Methods of Molecular Biology | 2012
N. Sukumar; Michael P. Krein; Mark J. Embrechts
The vast amounts of chemical and biological data available through robotic high-throughput assays and microarray technologies require computational techniques for visualization, analysis, and predictive modeling. Predictive cheminformatics and bioinformatics employ statistical methods to mine these data for hidden correlations and to retrieve molecules or genes with desirable biological activity from large databases, for the purpose of drug development. While many statistical methods are commonly employed and widely accessible, their proper use involves due consideration of data representation and preprocessing, model validation and domain of applicability estimation, similarity assessment, the nature of the structure-activity landscape, and model interpretation. This chapter seeks to review these considerations in light of the current state of the art in statistical modeling and to summarize the best practices in predictive cheminformatics.
international conference on foundations of augmented cognition | 2016
Matthias Ziegler; Amanda E. Kraft; Michael P. Krein; Li-Chuan Lo; Bradley D. Hatfield; William D. Casebeer; Bartlett A. H. Russell
Workload assessment models are an important tool for understanding an individual's limitations. Identifying periods of excess workload can help prevent an individual from continuing work that may result in human performance issues, such as an increase in errors or reaction time. Currently, workload assessments are created on a task-by-task basis, varying drastically depending on the sensors and task goals. Developing independent models for specific tasks is time consuming and not practical when applied to real-world situations. In this experiment we collected physiological signals including electroencephalogram (EEG), heart rate and heart rate variability (HR/HRV), and eye tracking. Subjects performed two independent tasks, each at two distinct levels of difficulty: an easy level and a difficult level. We then developed and compared the performance of multiple models using deep and shallow learning techniques to determine the best methods for increasing the generalization of the models across tasks.
Sar and Qsar in Environmental Research | 2016
A. R. Kennicutt; Lisa Morkowchuk; Michael P. Krein; Curt M. Breneman; J. E. Kilduff
A quantitative structure–activity relationship was developed to predict the efficacy of carbon adsorption as a control technology for endocrine-disrupting compounds, pharmaceuticals, and components of personal care products, as a tool for water quality professionals to protect public health. Here, we expand previous work to investigate a broad spectrum of molecular descriptors including subdivided surface areas, adjacency and distance matrix descriptors, electrostatic partial charges, potential energy descriptors, conformation-dependent charge descriptors, and Transferable Atom Equivalent (TAE) descriptors that characterize the regional electronic properties of molecules. We compare the efficacy of linear (Partial Least Squares) and non-linear (Support Vector Machine) machine learning methods to describe a broad chemical space and produce a user-friendly model. We employ cross-validation, y-scrambling, and external validation for quality control. The recommended Support Vector Machine model trained on 95 compounds having 23 descriptors offered a good balance among strong performance statistics, low error, and a low probability of over-fitting while describing a wide range of chemical features. The cross-validated model using a log-uptake (qe) response calculated at an aqueous equilibrium concentration (Ce) of 1 μM described the training dataset with an r2 of 0.932, had a cross-validated r2 of 0.833, and an average residual of 0.14 log units.
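The y-scrambling check used here for quality control is simple to sketch: refit the model many times on permuted responses and confirm the real model outperforms nearly every scrambled refit. The one-descriptor model and data below are toy stand-ins for the paper's SVM and descriptor set:

```python
import random

def r2_linear(x, y):
    """Training r2 of a one-descriptor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def y_scramble(x, y, trials=200, seed=1):
    """y-scrambling: refit on permuted responses and report what fraction of
    scrambled models score worse than the real one."""
    rng = random.Random(seed)
    real = r2_linear(x, y)
    yy = list(y)
    worse = 0
    for _ in range(trials):
        rng.shuffle(yy)
        if r2_linear(x, yy) < real:
            worse += 1
    return real, worse / trials

x = list(range(10))
y = [2.0 * i + 0.3 * (-1) ** i for i in range(10)]   # strong signal, mild noise
r2_real, frac_worse = y_scramble(x, y)
```

If the real model did not beat the vast majority of scrambled refits, its apparent fit would be attributable to chance correlation rather than a genuine structure-activity signal.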
Molecular Informatics | 2011
Charles Bergeron; Michael P. Krein; Gregory M. Moore; Curt M. Breneman; Kristin P. Bennett
Making suitable modeling choices is crucial for successful in silico drug design, and among the most important of these are the proper extraction and curation of data from qHTS screens and the use of optimized statistical learning methods to obtain valid models. More specifically, we aim to learn the top‐1 % most potent compounds against a variety of targets in a procedure we call virtual screening hit identification (VISHID). To do so, we exploit quantitative high‐throughput screens (qHTS) obtained from PubChem, descriptors derived from molecular structures, and support vector machines (SVM) for model generation. Our results illustrate how an appreciation of subtle issues underlying qHTS data extraction and the resulting SVM models created using these data can enhance the effectiveness of solutions and, in doing so, accelerate drug discovery.
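The top-1 % evaluation implicit in this kind of virtual screen is usually summarized as an enrichment factor: the hit rate among the top-scoring fraction divided by the overall hit rate. The scores and labels below are synthetic, with the "model score" constructed to separate actives cleanly:

```python
import random

def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment of known actives among the top-scoring fraction of a
    ranked virtual screen (EF = hit rate in top fraction / overall hit rate)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = max(1, int(len(scores) * fraction))
    hit_rate_top = sum(labels[i] for i in order[:k]) / k
    hit_rate_all = sum(labels) / len(labels)
    return hit_rate_top / hit_rate_all

# toy screen: 1000 compounds, 10 actives; scores cleanly separate the classes
rng = random.Random(7)
labels = [1] * 10 + [0] * 990
scores = [0.9 + 0.1 * rng.random() if lab else 0.8 * rng.random() for lab in labels]
ef_1pct = enrichment_factor(scores, labels, fraction=0.01)
```

A perfect ranker on this toy screen achieves the maximum possible EF of 100 at the 1 % cutoff (10 actives in 1000 compounds); real SVM screens fall somewhere between that ceiling and the random-ranking baseline of 1.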
Drug Development Research | 2014
N. Sukumar; Michael P. Krein; Ganesh Prabhu; Sudeepto Bhattacharya; Subhabrata Sen
Preclinical Research
2017 IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA) | 2017
Amanda E. Kraft; Jon C. Russo; Michael P. Krein; Bartlett A. H. Russell; William D. Casebeer; Matthias Ziegler
Performance measurements using human sensing and assessment capabilities are limited by an inability to account for the multitude of variables that regulate performance state. Monitoring behavior alone is not adequate for predicting future performance on a given task, and no single physiological measurement can provide a complete assessment of the factors that influence performance. Here we investigate how to analyze a range of physiological measurements in near real time using state-of-the-art signal processing methods to predict performance. We developed multiple predictive computational models to assess when physiological markers that coincide with workload levels are approaching a point at which performance decreases or increases may occur in the near future. Traditionally, models vary significantly between studies (due to the diversity of tasks being tested, the number and type of sensors, and differing analysis techniques), leading to specialized models that do not transfer between tasks and individuals. When a model is so specialized that it is predictive only for a specific task and cannot accommodate inter- or intra-individual differences without complete system retraining, it is impractical in applications outside of controlled experiments. To bring computational models into practical use in real-world environments, it is important to examine which types of physiological data can be both reliably processed and analyzed in near real time and highly predictive over time. It is also necessary to minimize the number of sensors deployed in the real world, so a sensor and signal sensitivity analysis needs to be performed. We identified and collected physiological signals linked to workload, including electroencephalogram (EEG), heart rate and heart rate variability (HR/HRV), and eye tracking, while subjects performed multiple tasks at varying difficulty levels.
We tested a variety of preprocessing methods and computational models, including radial basis function kernel support vector machines and neural networks, to determine predictive power as well as computational time for each type of model. The models were tested using each signal independently as well as combinations of all the signals.
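The per-signal versus combined-signal comparison described above can be sketched with a deliberately simple classifier. The nearest-centroid model and the three "channels" (eeg, hr, gaze) below are toy assumptions standing in for the paper's SVM and neural-network models and real sensor features:

```python
def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid classifier restricted to the
    feature columns in `cols` (a crude stand-in for per-signal model tests)."""
    def sub(row):
        return [row[c] for c in cols]
    cents = {}
    for lab in set(y):
        rows = [sub(r) for r, l in zip(X, y) if l == lab]
        cents[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    correct = sum(
        1 for r, l in zip(X, y)
        if min(cents, key=lambda lab: dist2(sub(r), cents[lab])) == l
    )
    return correct / len(y)

# toy feature rows [eeg, hr, gaze]; eeg separates workload classes, hr does not
X = [
    [0.1, 0.9, 0.2], [0.2, 0.1, 0.3], [0.0, 0.5, 0.1], [0.3, 0.7, 0.4],  # low
    [0.9, 0.2, 0.6], [1.0, 0.8, 0.7], [0.8, 0.4, 0.5], [1.1, 0.6, 0.8],  # high
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

acc = {name: centroid_accuracy(X, y, cols)
       for name, cols in {"eeg": [0], "hr": [1], "all": [0, 1, 2]}.items()}
```

Running each signal alone and then the full combination, as in the paper's sensitivity analysis, makes it obvious which sensors carry the workload signal and which could be dropped from a fielded system.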