Shiwei Lan | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Shiwei Lan is active.

Explore More

Publication

Featured researches published by Shiwei Lan.

Statistics and Computing | 2014

Split Hamiltonian Monte Carlo

Babak Shahbaba; Shiwei Lan; Wesley O. Johnson; Radford M. Neal

We show how the Hamiltonian Monte Carlo algorithm can sometimes be speeded up by “splitting” the Hamiltonian in a way that allows much of the movement around the state space to be done at low computational cost. One context where this is possible is when the log density of the distribution of interest (the potential energy function) can be written as the log of a Gaussian density, which is a quadratic function, plus a slowly-varying function. Hamiltonian dynamics for quadratic energy functions can be analytically solved. With the splitting technique, only the slowly-varying part of the energy needs to be handled numerically, and this can be done with a larger stepsize (and hence fewer steps) than would be necessary with a direct simulation of the dynamics. Another context where splitting helps is when the most important terms of the potential energy function and its gradient can be evaluated quickly, with only a slowly-varying part requiring costly computations. With splitting, the quick portion can be handled with a small stepsize, while the costly portion uses a larger stepsize. We show that both of these splitting approaches can reduce the computational cost of sampling from the posterior distribution for a logistic regression model, using either a Gaussian approximation centered on the posterior mode, or a Hamiltonian split into a term that depends on only a small number of critical cases, and another term that involves the larger number of cases whose influence on the posterior distribution is small.

Journal of Computational and Graphical Statistics | 2015

Markov Chain Monte Carlo from Lagrangian Dynamics.

Shiwei Lan; Vassilios Stathopoulos; Babak Shahbaba; Mark A. Girolami

Hamiltonian Monte Carlo (HMC) improves the computational efficiency of the Metropolis–Hastings algorithm by reducing its random walk behavior. Riemannian HMC (RHMC) further improves the performance of HMC by exploiting the geometric properties of the parameter space. However, the geometric integrator used for RHMC involves implicit equations that require fixed-point iterations. In some cases, the computational overhead for solving implicit equations undermines RHMC’s benefits. In an attempt to circumvent this problem, we propose an explicit integrator that replaces the momentum variable in RHMC by velocity. We show that the resulting transformation is equivalent to transforming Riemannian Hamiltonian dynamics to Lagrangian dynamics. Experimental results suggest that our method improves RHMC’s overall computational efficiency in the cases considered. All computer programs and datasets are available online (http://www.ics.uci.edu/babaks/Site/Codes.html) to allow replication of the results reported in this article.Hamiltonian Monte Carlo (HMC) improves the computational efficiency of the Metropolis algorithm by reducing its random walk behavior. Riemannian Manifold HMC (RMHMC) further improves HMCs performance by exploiting the geometric properties of the parameter space. However, the geometric integrator used for RMHMC involves implicit equations that require costly numerical analysis (e.g., fixed-point iteration). In some cases, the computational overhead for solving implicit equations undermines RMHMCs benefits. To avoid this problem, we propose an explicit geometric integrator that replaces the momentum variable in RMHMC by velocity. We show that the resulting transformation is equivalent to transforming Riemannian Hamilton dynamics to Lagrangian dynamics. Experimental results show that our method improves RMHMCs overall computational efficiency. All computer programs and data sets are available online (http://www.ics.uci.edu/~babaks/Site/Codes.html) in order to allow replications of the results reported in this paper.

Molecular Ecology Resources | 2017

phylodyn : an R package for phylodynamic simulation and inference

Michael D. Karcher; Julia A. Palacios; Shiwei Lan; Vladimir N. Minin

We introduce phylodyn, an r package for phylodynamic analysis based on gene genealogies. The packages main functionality is Bayesian nonparametric estimation of effective population size fluctuations over time. Our implementation includes several Markov chain Monte Carlo‐based methods and an integrated nested Laplace approximation‐based approach for phylodynamic inference that have been developed in recent years. Genealogical data describe the timed ancestral relationships of individuals sampled from a population of interest. Here, individuals are assumed to be sampled at the same point in time (isochronous sampling) or at different points in time (heterochronous sampling); in addition, sampling events can be modelled with preferential sampling, which means that the intensity of sampling events is allowed to depend on the effective population size trajectory. We assume the coalescent and the sequentially Markov coalescent processes as generative models of genealogies. We include several coalescent simulation functions that are useful for testing our phylodynamics methods via simulation studies. We compare the performance and outputs of various methods implemented in phylodyn and outline their strengths and weaknesses. r package phylodyn is available at https://github.com/mdkarcher/phylodyn.

Neural Computation | 2014

A Semiparametric Bayesian Model for Detecting Synchrony Among Multiple Neurons

Babak Shahbaba; Bo Zhou; Shiwei Lan; Hernando Ombao; David E. Moorman; Sam Behseta

We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their cofiring (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1s (spike) and 0s (silence) for each neuron is modeled using the logistic function of a continuous latent variable with a gaussian process prior. For multiple neurons, the corresponding marginal distributions are coupled to their joint probability distribution using a parametric copula model. The advantages of our approach are as follows. The nonparametric component (i.e., the gaussian process model) provides a flexible framework for modeling the underlying firing rates, and the parametric component (i.e., the copula model) allows us to make inferences regarding both contemporaneous and lagged relationships among neurons. Using the copula model, we construct multivariate probabilistic models by separating the modeling of univariate marginal distributions from the modeling of a dependence structure among variables. Our method is easy to implement using a computationally efficient sampling algorithm that can be easily extended to high-dimensional problems. Using simulated data, we show that our approach could correctly capture temporal dependencies in firing rates and identify synchronous neurons. We also apply our model to spike train data obtained from prefrontal cortical areas.

Bioinformatics | 2015

An efficient Bayesian inference framework for coalescent-based nonparametric phylodynamics.

Shiwei Lan; Julia A. Palacios; Michael D. Karcher; Vladimir N. Minin; Babak Shahbaba

MOTIVATION The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been extensively used in many areas of biology but is particularly useful for studying the spread of quickly evolving infectious diseases agents, e.g. influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. Although this approach is quite powerful, large datasets collected during infectious disease surveillance challenge the state-of-the-art of Bayesian phylodynamics and demand inferential methods with relatively low computational cost. RESULTS To satisfy this demand, we provide a computationally efficient Bayesian inference framework based on Hamiltonian Monte Carlo for coalescent process models. Moreover, we show that by splitting the Hamiltonian function, we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on elliptical slice sampler and Metropolis-adjusted Langevin algorithm. AVAILABILITY AND IMPLEMENTATION The R code for all simulation studies and real data analysis conducted in this article are publicly available at http://www.ics.uci.edu/∼slan/lanzi/CODES.html and in the R package phylodyn available at https://github.com/mdkarcher/phylodyn. CONTACT [email protected] or [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

Journal of Statistical Computation and Simulation | 2018

Geodesic Lagrangian Monte Carlo over the space of positive definite matrices: with application to Bayesian spectral density estimation

Andrew Holbrook; Shiwei Lan; Alexander Vandenberg-Rodes; Babak Shahbaba

ABSTRACT We present geodesic Lagrangian Monte Carlo, an extension of Hamiltonian Monte Carlo for sampling from posterior distributions defined on general Riemannian manifolds. We apply this new algorithm to Bayesian inference on symmetric or Hermitian positive definite (PD) matrices. To do so, we exploit the Riemannian structure induced by Cartans canonical metric. The geodesics that correspond to this metric are available in closed-form and – within the context of Lagrangian Monte Carlo – provide a principled way to travel around the space of PD matrices. Our method improves Bayesian inference on such matrices by allowing for a broad range of priors, so we are not limited to conjugate priors only. In the context of spectral density estimation, we use the (non-conjugate) complex reference prior as an example modelling option made available by the algorithm. Results based on simulated and real-world multivariate time series are presented in this context, and future directions are outlined.

arXiv: Computation | 2016

Sampling Constrained Probability Distributions Using Spherical Augmentation

Shiwei Lan; Babak Shahbaba

Statistical models with constrained probability distributions are abundant in machine learning. Some examples include regression models with norm constraints (e.g., Lasso), probit, many copula models, and latent Dirichlet allocation (LDA). Bayesian inference involving probability distributions confined to constrained domains could be quite challenging for commonly used sampling algorithms. In this work, we propose a novel augmentation technique that handles a wide range of constraints by mapping the constrained domain to a sphere in the augmented space. By moving freely on the surface of this sphere, sampling algorithms handle constraints implicitly and generate proposals that remain within boundaries when mapped back to the original space. Our proposed method, called Spherical Augmentation, provides a mathematically natural and computationally efficient framework for sampling from constrained probability distributions . We show the advantages of our method over state-of-the-art sampling algorithms, such as exact Hamiltonian Monte Carlo, using several examples including truncated Gaussian distributions, Bayesian Lasso, Bayesian bridge regression, reconstruction of quantized stationary Gaussian process, and LDA for topic modeling.

international conference on machine learning | 2014