Mohammad M. Sultan | Researchain

Publication

Featured researches published by Mohammad M. Sultan.

Biophysical Journal | 2017

MSMBuilder: Statistical Models for Biomolecular Dynamics

Matthew P. Harrigan; Mohammad M. Sultan; Carlos X. Hernández; Brooke E. Husic; Peter Eastman; Christian R. Schwantes; Kyle A. Beauchamp; Robert T. McGibbon; Vijay S. Pande

MSMBuilder is a software package for building statistical models of high-dimensional time-series data. It is designed with a particular focus on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change. MSMBuilder is named for its ability to construct Markov state models (MSMs), a class of models that has gained favor among computational biophysicists. In addition to both well-established and newer MSM methods, the package includes complementary algorithms for understanding time-series data such as hidden Markov models and time-structure based independent component analysis. MSMBuilder boasts an easy-to-use command-line interface, as well as clear and consistent abstractions through its Python application programming interface. MSMBuilder was developed with careful consideration for compatibility with the broader machine learning community by following the design of scikit-learn. The package is used primarily by practitioners of molecular dynamics, but is just as applicable to other computational or experimental time-series measurements.
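As a rough illustration of what constructing a Markov state model involves, the sketch below counts transitions in a discrete state trajectory and row-normalizes them into a transition matrix. It uses plain NumPy and is not MSMBuilder's API; the function names `estimate_msm` and `implied_timescales` and the toy trajectory are assumptions made for this example, and the estimator is a simple count-based one without any reversibility constraint.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1):
    """Count-based transition matrix from a discrete trajectory (hypothetical helper)."""
    counts = np.zeros((n_states, n_states))
    # Count transitions i -> j separated by `lag` frames.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states that were never left stay zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def implied_timescales(T, lag=1):
    """Relaxation timescales (in frames) from the non-stationary eigenvalues of T."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    slow = eigvals[1:]  # skip the stationary eigenvalue (1.0)
    slow = slow[(slow > 0) & (slow < 1)]
    return -lag / np.log(slow)


# Toy two-state trajectory, invented for illustration only.
dtraj = np.array([0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1])
T = estimate_msm(dtraj, n_states=2)
timescales = implied_timescales(T)
print(T)
print(timescales)
```

In practice the discrete trajectory would come from clustering featurized simulation frames, and the lag time would be chosen by checking that the implied timescales have converged.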

Journal of Chemical Physics | 2016

Optimized parameter selection reveals trends in Markov state models for protein folding

Brooke E. Husic; Robert T. McGibbon; Mohammad M. Sultan; Vijay S. Pande

As molecular dynamics simulations access increasingly longer time scales, complementary advances in the analysis of biomolecular time-series data are necessary. Markov state models offer a powerful framework for this analysis by describing a system's states and the transitions between them. A recently established variational theorem for Markov state models now enables modelers to systematically determine the best way to describe a system's dynamics. In the context of the variational theorem, we analyze ultra-long folding simulations for a canonical set of twelve proteins [K. Lindorff-Larsen et al., Science 334, 517 (2011)] by creating and evaluating many types of Markov state models. We present a set of guidelines for constructing Markov state models of protein folding; namely, we recommend the use of cross-validation and a kinetically motivated dimensionality reduction step for improved descriptions of folding dynamics. We also warn that precise kinetics predictions rely on the features chosen to describe the system and pose the description of kinetic uncertainty across ensembles of models as an open issue.

Journal of Chemical Theory and Computation | 2017

tICA-Metadynamics: Accelerating Metadynamics by Using Kinetically Selected Collective Variables

Mohammad M. Sultan; Vijay S. Pande

Metadynamics is a powerful enhanced molecular dynamics sampling method that accelerates simulations by adding history-dependent multidimensional Gaussians along selective collective variables (CVs). In practice, choosing a small number of slow CVs remains challenging due to the inherent high dimensionality of biophysical systems. Here we show that time-structure based independent component analysis (tICA), a recent advance in Markov state model literature, can be used to identify a set of variationally optimal slow coordinates for use as CVs for Metadynamics. We show that linear and nonlinear tICA-Metadynamics can complement existing MD studies by explicitly sampling the system's slowest modes and can drive transitions along those modes even when no such transitions are observed in unbiased simulations.
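To make the idea of a tICA-derived collective variable concrete, here is a minimal sketch of linear tICA on synthetic data: it solves the generalized eigenvalue problem between a symmetrized time-lagged covariance matrix and the instantaneous covariance matrix, then projects the data onto the slowest component. The function `tica`, the synthetic two-feature signal, and the lag time are assumptions for illustration; this is not the authors' code and it does not perform metadynamics itself.

```python
import numpy as np
from scipy.linalg import eigh


def tica(X, lag, n_components=1):
    """Linear tICA via a generalized symmetric eigenproblem (illustrative sketch)."""
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    # Instantaneous covariance, averaged over both halves of each lagged pair.
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    # Time-lagged covariance, symmetrized so that eigenvalues are real.
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    eigvals, eigvecs = eigh(Ct, C0)  # solves Ct v = lambda C0 v
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


# Synthetic data: one slow autoregressive signal and one fast noise signal,
# linearly mixed into two observed features (all values are made up).
rng = np.random.default_rng(0)
n_frames = 5000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.95 * slow[t - 1] + rng.normal()
fast = rng.normal(size=n_frames)
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
X = np.column_stack([slow, fast]) @ mixing.T

eigvals, components = tica(X, lag=5)
# Projection onto the slowest component; a candidate collective variable.
cv = (X - X.mean(axis=0)) @ components[:, 0]
```

In a biasing workflow, the coefficients in `components[:, 0]` would define the CV as a linear combination of input features (for example dihedral sines and cosines), which an enhanced sampling engine could then evaluate on the fly.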

Scientific Reports | 2017

Millisecond dynamics of BTK reveal kinome-wide conformational plasticity within the apo kinase domain

Mohammad M. Sultan; Rajiah Aldrin Denny; Ray Unwalla; Frank Lovering; Vijay S. Pande

Bruton tyrosine kinase (BTK) is a key enzyme in B-cell development whose improper regulation causes severe immunodeficiency diseases. Design of selective BTK therapeutics would benefit from improved, in-silico structural modeling of the kinase’s solution ensemble. However, this remains challenging due to the immense computational cost of sampling events on biological timescales. In this work, we combine multi-millisecond molecular dynamics (MD) simulations with Markov state models (MSMs) to report on the thermodynamics, kinetics, and accessible states of BTK’s kinase domain. Our conformational landscape links the active state to several inactive states, connected via a structurally diverse intermediate. Our calculations predict a kinome-wide conformational plasticity, and indicate the presence of several new potentially druggable BTK states. We further find that the population of these states and the kinetics of their inter-conversion are modulated by protonation of an aspartate residue, establishing the power of MD & MSMs in predicting effects of chemical perturbations.

Journal of Open Source Software | 2016

Osprey: Hyperparameter Optimization for Machine Learning

Robert T. McGibbon; Carlos X. Hernández; Matthew P. Harrigan; Steven Kearnes; Mohammad M. Sultan; Stanislaw Jastrzebski; Brooke E. Husic; Vijay S. Pande

Osprey is a tool for hyperparameter optimization of machine learning algorithms in Python. Hyperparameter optimization can often be an onerous process for researchers, due to time-consuming experimental replicates, non-convex objective functions, and constant tension between exploration of global parameter space and local optimization (Jones, Schonlau, and Welch 1998). We’ve designed Osprey to provide scientists with a practical, easy-to-use way of finding optimal model parameters. The software works seamlessly with scikit-learn estimators (Pedregosa et al. 2011) and supports many different search strategies for choosing the next set of parameters with which to evaluate a given model, including Gaussian processes (GPy 2012), tree-structured Parzen estimators (Yamins, Tax, and Bergstra 2013), as well as random and grid search. As hyperparameter optimization is an embarrassingly parallel problem, Osprey can easily scale to hundreds of concurrent processes by executing a simple command-line program multiple times. This makes it easy to exploit large resources available in high-performance computing environments.
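The simplest of the search strategies mentioned above, random search, can be sketched in a few lines of standard-library Python. This is not Osprey's interface: the objective `toy_validation_loss`, the parameter names, and their ranges are hypothetical stand-ins for a real cross-validated model score.

```python
import random


def toy_validation_loss(params):
    """Hypothetical stand-in for a cross-validated loss; lower is better."""
    lr_term = (params["learning_rate"] - 0.1) ** 2
    cluster_term = (params["n_clusters"] - 50) ** 2 / 1e4
    return lr_term + cluster_term


def random_search(objective, n_trials, seed=0):
    """Sample parameter sets independently and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform over [1e-3, 1], a common choice for rate-like parameters.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Uniform over the integers 10..200 inclusive.
            "n_clusters": rng.randint(10, 200),
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(toy_validation_loss, n_trials=200)
print(best_params, best_loss)
```

Because each trial is independent of the others, the loop body could be distributed across many workers without coordination, which is the sense in which the problem is embarrassingly parallel.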

Journal of Chemical Theory and Computation | 2014

Automatic Selection of Order Parameters in the Analysis of Large Scale Molecular Dynamics Simulations

Mohammad M. Sultan; Gert Kiss; Diwakar Shukla; Vijay S. Pande

Given the large number of crystal structures and NMR ensembles that have been solved to date, classical molecular dynamics (MD) simulations have become powerful tools in the atomistic study of the kinetics and thermodynamics of biomolecular systems on ever increasing time scales. By virtue of the high-dimensional conformational state space that is explored, the interpretation of large-scale simulations faces difficulties not unlike those in the big data community. We address this challenge by introducing a method called clustering based feature selection (CB-FS) that employs a posterior analysis approach. It combines supervised machine learning (SML) and feature selection with Markov state models to automatically identify the relevant degrees of freedom that separate conformational states. We highlight the utility of the method in the evaluation of large-scale simulations and show that it can be used for the rapid and automated identification of relevant order parameters involved in the functional transitions of two exemplary cell-signaling proteins central to human disease states.

Journal of Chemical Theory and Computation | 2018

Transferable Neural Networks for Enhanced Sampling of Protein Dynamics

Mohammad M. Sultan; Hannah K. Wayment-Steele; Vijay S. Pande

Variational autoencoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single nonlinear embedding. In this work, we illustrate how this nonlinear latent embedding can be used as a collective variable for enhanced sampling and present a simple modification that allows us to rapidly perform sampling in multiple related systems. We first demonstrate our method is able to describe the effects of force field changes in capped alanine dipeptide after learning about a model using AMBER99. We further provide a simple extension to variational dynamics encoders that allows the model to be trained in a more efficient manner on larger systems by encoding the outputs of a linear transformation using time-structure based independent component analysis (tICA). Using this technique, we show how such a model trained for one protein, the WW domain, can efficiently be transferred to perform enhanced sampling on a related mutant protein, the GTT mutation. This method shows promise for its ability to rapidly sample related systems using a single transferable collective variable, enabling us to probe the effects of variation in increasingly large systems of biophysical interest.

Journal of Open Source Software | 2017

MSMExplorer: Data Visualizations for Biomolecular Dynamics

Carlos X. Hernández; Matthew P. Harrigan; Mohammad M. Sultan; Vijay S. Pande

MSMExplorer is a Python package for visualizing data generated from biomolecular dynamics. While molecular visualizations have been a large focus of the molecular dynamics (MD) community (Humphrey, Dalke, and Schulten 1996; Schrödinger, LLC 2015), data visualizations for the analyses of MD trajectories have been less developed. MSMExplorer seeks to fill this niche by providing publication-quality statistical plots with an easy-to-use Python API that works seamlessly with commonly used Python libraries, such as numpy and scikit-learn (Walt, Colbert, and Varoquaux 2011; Pedregosa et al. 2011). Additionally, plots are generated using already established plotting libraries, like seaborn, to provide a consistent aesthetic (Waskom et al. 2016; Hunter 2007; Hagberg, Schult, and Swart 2008; Foreman-Mackey 2016).

Journal of Physical Chemistry B | 2017

Transfer Learning from Markov Models Leads to Efficient Sampling of Related Systems

Mohammad M. Sultan; Vijay S. Pande

We recently showed that the time-structure-based independent component analysis method from Markov state model literature provided a set of variationally optimal slow collective variables for metadynamics (tICA-metadynamics). In this paper, we extend the methodology toward efficient sampling of related mutants by borrowing ideas from transfer learning methods in machine learning. Our method explicitly assumes that a similar set of slow modes and metastable states is found in both the wild type (baseline) and its mutants. Under this assumption, we describe a few simple techniques using sequence mapping for transferring the slow modes and structural information contained in the wild type simulation to a mutant model for performing enhanced sampling. The resulting simulations can then be reweighted onto the full-phase space using the multistate Bennett acceptance ratio, allowing for thermodynamic comparison against the wild type. We first benchmark our methodology by recapturing alanine dipeptide dynamics across a range of different atomistic force fields, including the polarizable Amoeba force field, after learning a set of slow modes using Amber ff99sb-ILDN. We next extend the method by including structural data from the wild type simulation and apply the technique to recapturing the effects of the GTT mutation on the FIP35 WW domain.

Journal of Chemical Theory and Computation | 2017

A Minimum Variance Clustering Approach Produces Robust and Interpretable Coarse-Grained Models

Brooke E. Husic; Keri A. McKiernan; Hannah K. Wayment-Steele; Mohammad M. Sultan; Vijay S. Pande

Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics data sets, such as protein folding simulations, because of their straightforward construction and statistical rigor. The coarse-graining of MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here we present the minimum variance clustering approach (MVCA) for the coarse-graining of MSMs into macrostate models. The method utilizes agglomerative clustering with Ward's minimum variance objective function, and the similarity of the microstate dynamics is determined using the Jensen-Shannon divergence between the corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system and is robust toward long-duration statistical artifacts. MVCA is then applied to two protein folding simulations of the same protein in different force fields to demonstrate that a different number of macrostates is appropriate for each model, revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can be used to analyze a data set containing many MSMs from simulations in different force fields by aggregating them into groups and quantifying their dynamical similarity in the context of force field parameter choices. The minimum variance clustering approach with the Jensen-Shannon divergence provides a powerful tool to group dynamics by similarity, both among model states and among dynamical models themselves.
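The core of the coarse-graining step described above can be sketched with SciPy: compute pairwise Jensen-Shannon distances between rows of a transition matrix, then apply agglomerative clustering with Ward linkage. The four-state matrix below is invented for illustration, and the details are assumptions rather than the published implementation. In particular, SciPy's `jensenshannon` metric returns the square root of the divergence, and Ward linkage on precomputed non-Euclidean distances is used here as an approximation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy microstate transition matrix with two obvious groups of states:
# states 0 and 1 interconvert quickly, as do states 2 and 3.
T = np.array([
    [0.50, 0.48, 0.01, 0.01],
    [0.48, 0.50, 0.01, 0.01],
    [0.01, 0.01, 0.50, 0.48],
    [0.01, 0.01, 0.48, 0.50],
])

# Condensed vector of Jensen-Shannon distances between every pair of rows.
distances = pdist(T, metric="jensenshannon")

# Agglomerative clustering using Ward's minimum variance criterion.
tree = linkage(distances, method="ward")

# Cut the tree into two macrostates; labels[i] is the macrostate of microstate i.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Microstates whose outgoing transition probabilities look alike end up in the same macrostate, and cutting the same tree at different heights yields coarser or finer macrostate models without re-running the clustering.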