Sebastian Raschka | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sebastian Raschka is active.

Explore More

Publication

Featured researches published by Sebastian Raschka.

Journal of Social Structure | 2017

BioPandas: Working with molecular structures in pandas DataFrames

Sebastian Raschka

Furthermore, useful small-molecule related functions are provided for reading and parsing millions of small molecule structures (from multi-MOL2 files (Tripos 2007)) fast and efficiently in virtual screening applications. Inbuilt functions for filtering molecules by the presence of functional groups and their pair-wise distances to each other make BioPandas a particularly attractive utility library for virtual screening and protein-ligand docking applications.

Proteins | 2016

Detecting the native ligand orientation by interfacial rigidity: SiteInterlock

Sebastian Raschka; Joseph Bemister-Buffington; Leslie A. Kuhn

Understanding the physical attributes of protein‐ligand interfaces, the source of most biological activity, is a fundamental problem in biophysics. Knowing the characteristic features of interfaces also enables the design of molecules with potent and selective interactions. Prediction of native protein‐ligand interactions has traditionally focused on the development of physics‐based potential energy functions, empirical scoring functions that are fit to binding data, and knowledge‐based potentials that assess the likelihood of pairwise interactions. Here we explore a new approach, testing the hypothesis that protein‐ligand binding results in computationally detectable rigidification of the protein‐ligand interface. Our SiteInterlock approach uses rigidity theory to efficiently measure the relative interfacial rigidity of a series of small‐molecule ligand orientations and conformations for a number of protein complexes. In the majority of cases, SiteInterlock detects a near‐native binding mode as being the most rigid, with particularly robust performance relative to other methods when the ligand‐free conformation of the protein is provided. The interfacial rigidification of both the protein and ligand prove to be important characteristics of the native binding mode. This measure of rigidity is also sensitive to the spatial coupling of interactions and bond‐rotational degrees of freedom in the interface. While the predictive performance of SiteInterlock is competitive with the best of the five other scoring functions tested, its measure of rigidity encompasses cooperative rather than just additive binding interactions, providing novel information for detecting native‐like complexes. SiteInterlock shows special strength in enhancing the prediction of native complexes by ruling out inaccurate poses. Proteins 2016; 84:1888–1901.

Journal of Computer-aided Molecular Design | 2018

Protein–ligand interfaces are polarized: discovery of a strong trend for intermolecular hydrogen bonds to favor donors on the protein side with implications for predicting and designing ligand complexes

Sebastian Raschka; Alex J. Wolf; Joseph Bemister-Buffington; Leslie A. Kuhn

Understanding how proteins encode ligand specificity is fascinating and similar in importance to deciphering the genetic code. For protein–ligand recognition, the combination of an almost infinite variety of interfacial shapes and patterns of chemical groups makes the problem especially challenging. Here we analyze data across non-homologous proteins in complex with small biological ligands to address observations made in our inhibitor discovery projects: that proteins favor donating H-bonds to ligands and avoid using groups with both H-bond donor and acceptor capacity. The resulting clear and significant chemical group matching preferences elucidate the code for protein-native ligand binding, similar to the dominant patterns found in nucleic acid base-pairing. On average, 90% of the keto and carboxylate oxygens occurring in the biological ligands formed direct H-bonds to the protein. A two-fold preference was found for protein atoms to act as H-bond donors and ligand atoms to act as acceptors, and 76% of all intermolecular H-bonds involved an amine donor. Together, the tight chemical and geometric constraints associated with satisfying donor groups generate a hydrogen-bonding lock that can be matched only by ligands bearing the right acceptor-rich key. Measuring an index of H-bond preference based on the observed chemical trends proved sufficient to predict other protein–ligand complexes and can be used to guide molecular design. The resulting Hbind and Protein Recognition Index software packages are being made available for rigorously defining intermolecular H-bonds and measuring the extent to which H-bonding patterns in a given complex match the preference key.

Archive | 2018

Automated Inference of Chemical Discriminants of Biological Activity

Sebastian Raschka; Anne M. Scott; Mar Huertas; Weiming Li; Leslie A. Kuhn

Ligand-based virtual screening has become a standard technique for the efficient discovery of bioactive small molecules. Following assays to determine the activity of compounds selected by virtual screening, or other approaches in which dozens to thousands of molecules have been tested, machine learning techniques make it straightforward to discover the patterns of chemical groups that correlate with the desired biological activity. Defining the chemical features that generate activity can be used to guide the selection of molecules for subsequent rounds of screening and assaying, as well as help design new, more active molecules for organic synthesis.The quantitative structure-activity relationship machine learning protocols we describe here, using decision trees, random forests, and sequential feature selection, take as input the chemical structure of a single, known active small molecule (e.g., an inhibitor, agonist, or substrate) for comparison with the structure of each tested molecule. Knowledge of the atomic structure of the protein target and its interactions with the active compound are not required. These protocols can be modified and applied to any data set that consists of a series of measured structural, chemical, or other features for each tested molecule, along with the experimentally measured value of the response variable you would like to predict or optimize for your project, for instance, inhibitory activity in a biological assay or ΔGbinding. To illustrate the use of different machine learning algorithms, we step through the analysis of a dataset of inhibitor candidates from virtual screening that were tested recently for their ability to inhibit GPCR-mediated signaling in a vertebrate.

Journal of Social Structure | 2018

MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack

Sebastian Raschka

MLxtend is a library that implements a variety of core algorithms and utilities for machine learning and data mining. The primary goal of MLxtend is to make commonly used tools accessible to researchers in academia and data scientists in industries focussing on userfriendly and intuitive APIs and compatibility to existing machine learning libraries, such as scikit-learn, when appropriate. While MLxtend implements a large variety of functions, highlights include sequential feature selection algorithms (Pudil, Novovičová, and Kittler 1994), implementations of stacked generalization (Wolpert 1992) for classification and regression, and algorithms for frequent pattern mining (Agrawal and Ramakrishnan 1994). The sequential feature selection algorithms cover forward, backward, forward floating, and backward floating selection and leverage scikit-learn’s cross-validation API (Pedregosa et al. 2011) to ensure satisfactory generalization performance upon constructing and selecting feature subsets. Besides, visualization functions are provided that allow users to inspect the estimated predictive performance, including performance intervals, for different feature subsets. The ensemble methods in MLxtend cover majority voting, stacking, and stacked generalization, all of which are compatible with scikit-learn estimators and other libraries as XGBoost (Chen and Guestrin 2016). In addition to feature selection, classification, and regression algorithms, MLxtend implements model evaluation techniques for comparing the performance of two different models via McNemar’s test and multiple models via Cochran’s Q test. An implementation of the 5x2 cross-validated paired t-test (Dietterich 1998) allows users to compare the performance of machine learning algorithms to each other. Furthermore, different flavors of the Bootstrap method (Efron and Tibshirani 1994), such as the .632 Bootstrap method (Efron 1983) are implemented to compute confidence intervals of performance estimates. All in all, MLxtend provides a large variety of different utilities that build upon and extend the capabilities of Python’s scientific computing stack.

Archive | 2015