Publications


Featured research published by Thang D. Bui.


Neural Information Processing Systems | 2015

Learning stationary time series using Gaussian processes with nonparametric kernels

Felipe A. Tobar; Thang D. Bui; Richard E. Turner

We introduce the Gaussian Process Convolution Model (GPCM), a two-stage non-parametric generative procedure to model stationary signals as the convolution between a continuous-time white-noise process and a continuous-time linear filter drawn from a Gaussian process. The GPCM is a continuous-time nonparametric-window moving average process and, conditionally, is itself a Gaussian process with a nonparametric kernel defined in a probabilistic fashion. The generative model can be equivalently considered in the frequency domain, where the power spectral density of the signal is specified using a Gaussian process. One of the main contributions of the paper is to develop a novel variational free-energy approach based on inter-domain inducing variables that efficiently learns the continuous-time linear filter and infers the driving white-noise process. In turn, this scheme provides closed-form probabilistic estimates of the covariance kernel and the noise-free signal in both denoising and prediction scenarios. Additionally, the variational inference procedure provides closed-form expressions for the approximate posterior of the spectral density given the observed data, leading to new Bayesian nonparametric approaches to spectrum estimation. The proposed GPCM is validated using synthetic and real-world signals.
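
Schematically, the generative model described in the abstract can be written as follows; the notation here is illustrative rather than the paper's, with w the continuous-time white-noise process and h the GP-distributed filter:

    \[
    h(\cdot) \sim \mathcal{GP}(0, k_h), \qquad
    f(t) = \int h(t - \tau)\, w(\tau)\, \mathrm{d}\tau ,
    \]

so that, conditioned on the filter h, the signal f is itself a Gaussian process whose kernel is induced by h.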


arXiv: Machine Learning | 2018

Training Deep Gaussian Processes using Stochastic Expectation Propagation and Probabilistic Backpropagation

Thang D. Bui; José Miguel Hernández-Lobato; Yingzhen Li; Daniel Hernández-Lobato; Richard E. Turner

Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are probabilistic and non-parametric and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. The focus of this paper is scalable approximate Bayesian learning of these networks. The paper develops a novel and efficient extension of probabilistic backpropagation, a state-of-the-art method for training Bayesian neural networks, that can be used to train DGPs. The new method leverages a recently proposed method for scaling Expectation Propagation, called stochastic Expectation Propagation. The method is able to automatically discover useful input warping, expansion or compression, and is therefore a flexible form of Bayesian kernel design. We demonstrate the success of the new method for supervised learning on several real-world datasets, showing that it typically outperforms GP regression and is never much worse.
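
As a sketch of the model class (illustrative notation, not taken from the paper), an L-layer DGP composes GP-distributed mappings between layers:

    \[
    \mathbf{h}_0 = \mathbf{x}, \qquad
    \mathbf{h}_\ell = f_\ell(\mathbf{h}_{\ell-1}), \quad
    f_\ell \sim \mathcal{GP}(0, k_\ell), \quad \ell = 1, \dots, L,
    \]

with a likelihood (for example Gaussian noise in regression) placed on the top-layer output h_L; the approximate Bayesian learning scheme described above targets the posterior over all layer functions jointly.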


Neural Information Processing Systems | 2017

Streaming sparse Gaussian process approximations

Thang D. Bui; Cuong V. Nguyen; Richard E. Turner

Sparse pseudo-point approximations for Gaussian process (GP) models provide a suite of methods that support deployment of GPs in the large data regime and enable analytic intractabilities to be sidestepped. However, the field lacks a principled method to handle streaming data in which both the posterior distribution over function values and the hyperparameter estimates are updated in an online fashion. The small number of existing approaches either use suboptimal hand-crafted heuristics for hyperparameter learning, or suffer from catastrophic forgetting or slow updating when new data arrive. This paper develops a new principled framework for deploying Gaussian process probabilistic models in the streaming setting, providing methods for learning hyperparameters and optimising pseudo-input locations. The proposed framework is assessed using synthetic and real-world datasets.
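
The streaming problem can be summarised by the exact sequential update that any online scheme has to approximate (a schematic, not the paper's derivation): when a new batch y_new arrives,

    \[
    p(f \mid \mathbf{y}_{\text{old}}, \mathbf{y}_{\text{new}})
      \;\propto\; p(\mathbf{y}_{\text{new}} \mid f)\, p(f \mid \mathbf{y}_{\text{old}}),
    \]

and the framework works with a sparse pseudo-point approximation to p(f | y_old) in place of the exact old posterior, refreshing both the pseudo-input locations and the hyperparameters as data arrive.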


International Conference on Machine Learning | 2016

Black-box α-divergence minimization

José Miguel Hernández-Lobato; Yingzhen Li; Mark Rowland; Daniel Hernández-Lobato; Thang D. Bui; Richard E. Turner

Black-box alpha (BB-α) is a new approximate inference method based on the minimization of α-divergences. BB-α scales to large datasets because it can be implemented using stochastic gradient descent. BB-α can be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter α, the method is able to interpolate between variational Bayes (VB) (α → 0) and an algorithm similar to expectation propagation (EP) (α = 1). Experiments on probit regression and neural network regression and classification problems show that BB-α with non-standard settings of α, such as α = 0.5, usually produces better predictions than with α → 0 (VB) or α = 1 (EP).
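
For reference, the α-divergence in question is the standard one (this definition is general, not specific to the paper):

    \[
    D_\alpha[p \,\|\, q] \;=\; \frac{1}{\alpha(1-\alpha)}
      \left( 1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, \mathrm{d}\theta \right),
    \]

which tends to KL(q || p), the variational Bayes objective, as α → 0, and to KL(p || q), the divergence minimised locally by EP, as α → 1.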


Web Search and Data Mining | 2018

Neural Graph Learning: Training Neural Networks Using Graphs

Thang D. Bui; Sujith Ravi; Vivek Ramavajjala

Label propagation is a powerful and flexible semi-supervised learning technique on graphs. Neural networks, on the other hand, have proven track records in many supervised learning tasks. In this work, we propose a training framework with a graph-regularised objective, namely Neural Graph Machines, that can combine the power of neural networks and label propagation. This work generalises previous literature on graph-augmented training of neural networks, enabling it to be applied to multiple neural architectures (Feed-forward NNs, CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the neural networks to harness both labeled and unlabeled data by: (a) allowing the network to train using labeled data as in the supervised setting, (b) biasing the network to learn similar hidden representations for neighboring nodes on a graph, in the same vein as label propagation. Such architectures with the proposed objective can be trained efficiently using stochastic gradient descent and scaled to large graphs, with a runtime that is linear in the number of edges. The proposed joint training approach convincingly outperforms many existing methods on a wide range of tasks (multi-label classification on social graphs, news categorization, document classification and semantic intent classification), with multiple forms of graph inputs (including graphs with and without node-level features) and using different types of neural networks.
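
The graph-regularised objective can be made concrete with a short sketch. The PyTorch code below is an illustration under stated assumptions, not the paper's implementation: it uses a small feed-forward classifier, a single regularisation coefficient alpha, an L2 distance between hidden representations, and toy random data; all names and values are chosen here for the example.

    # A minimal sketch of a graph-regularised objective in the spirit of Neural
    # Graph Machines (PyTorch; toy data, illustrative names and hyperparameters).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Toy setting: 100 nodes with 20-dimensional features, 3 classes,
    # only the first 30 nodes are labelled.
    num_nodes, num_features, num_classes = 100, 20, 3
    x = torch.randn(num_nodes, num_features)
    labels = torch.randint(num_classes, (30,))
    labelled_idx = torch.arange(30)

    # Toy graph: 200 random edges (u, v) with positive weights.
    edges = torch.randint(num_nodes, (200, 2))
    edge_weights = torch.rand(200)

    class MLP(nn.Module):
        """Small feed-forward network; hidden() is the representation the
        graph term regularises."""
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(num_features, 32)
            self.fc2 = nn.Linear(32, num_classes)

        def hidden(self, x):
            return torch.relu(self.fc1(x))

        def forward(self, x):
            return self.fc2(self.hidden(x))

    model = MLP()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    alpha = 0.1  # strength of the graph regulariser (illustrative value)

    for step in range(200):
        optimizer.zero_grad()
        logits = model(x)
        # (a) supervised cross-entropy on the labelled nodes.
        supervised = F.cross_entropy(logits[labelled_idx], labels)
        # (b) graph term: neighbouring nodes are encouraged to have similar
        # hidden representations, weighted by the edge weight.
        h = model.hidden(x)
        diff = h[edges[:, 0]] - h[edges[:, 1]]
        graph_term = (edge_weights * diff.pow(2).sum(dim=1)).mean()
        loss = supervised + alpha * graph_term
        loss.backward()
        optimizer.step()

The loop trains with ordinary stochastic gradient methods, and the graph term costs time linear in the number of edges, matching the scaling property highlighted in the abstract.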


arXiv: Machine Learning | 2018

Stochastic Expectation Propagation for Large Scale Gaussian Process Classification

Daniel Hernández-Lobato; José Miguel Hernández-Lobato; Yingzhen Li; Thang D. Bui; Richard E. Turner

A method for large scale Gaussian process classification has been recently proposed based on expectation propagation (EP). Such a method allows Gaussian process classifiers to be trained on very large datasets that were out of the reach of previous deployments of EP and has been shown to be competitive with related techniques based on stochastic variational inference. Nevertheless, the memory resources required scale linearly with the dataset size, unlike in variational methods. This is a severe limitation when the number of instances is very large. Here we show that this problem is avoided when stochastic EP is used to train the model.
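
The memory argument can be stated schematically (this is the general form of stochastic EP, not the paper's specific parameterisation): EP maintains one approximate site factor per data point, whereas stochastic EP ties them to a single average factor,

    \[
    q_{\text{EP}}(f) \;\propto\; p(f) \prod_{n=1}^{N} \tilde{t}_n(f)
    \qquad\longrightarrow\qquad
    q_{\text{SEP}}(f) \;\propto\; p(f)\, \tilde{g}(f)^{N},
    \]

so the storage for site parameters no longer grows with the number of instances N.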


Archive | 2018

Efficient deterministic approximate Bayesian inference for Gaussian process models

Thang D. Bui

Gaussian processes are powerful nonparametric distributions over continuous functions that have become a standard tool in modern probabilistic machine learning. However, the applicability of Gaussian processes in the large-data regime and in hierarchical probabilistic models is severely limited by analytic and computational intractabilities. It is, therefore, important to develop practical approximate inference and learning algorithms that can address these challenges. To this end, this dissertation provides a comprehensive and unifying perspective of pseudo-point based deterministic approximate Bayesian learning for a wide variety of Gaussian process models, which connects previously disparate strands of the literature, greatly extends them, and allows new state-of-the-art approximations to emerge. We start by building a posterior approximation framework based on Power-Expectation Propagation for Gaussian process regression and classification. This framework relies on a structured approximate Gaussian process posterior based on a small number of pseudo-points, which are judiciously chosen to summarise the actual data and enable tractable and efficient inference and hyperparameter learning. Many existing sparse approximations are recovered as special cases of this framework, and can now be understood as performing approximate posterior inference using a common approximate posterior. Critically, extensive empirical evidence suggests that new approximation methods arising from this unifying perspective outperform existing approaches in many real-world regression and classification tasks. We explore extensions of this framework to Gaussian process state space models, Gaussian process latent variable models and deep Gaussian processes, which also unify many recently developed approximation schemes for these models. Several mean-field and structured approximate posterior families for the hidden variables in these models are studied. We also discuss several methods for approximate uncertainty propagation in recurrent and deep architectures based on Gaussian projection, linearisation, and simple Monte Carlo. The benefits of the unified inference and learning frameworks for these models are illustrated in a variety of real-world state-space modelling and regression tasks.
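
As one concrete illustration of how existing sparse approximations appear as special cases, the Power-EP divergence parameter α interpolates between familiar methods for sparse GP regression (a correspondence reported in this line of work, stated here as a sketch rather than a derivation):

    \[
    \alpha \to 0 \;\Rightarrow\; \text{the variational free-energy (VFE) approximation}, \qquad
    \alpha = 1 \;\Rightarrow\; \text{EP, whose fixed points correspond to FITC}.
    \]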


International Conference on Machine Learning | 2016

Deep Gaussian processes for regression using approximate expectation propagation

Thang D. Bui; José Miguel Hernández-Lobato; Daniel Hernández-Lobato; Yingzhen Li; Richard E. Turner


Neural Information Processing Systems | 2014

Tree-structured Gaussian Process Approximations

Thang D. Bui; Richard E. Turner

Collaboration


Dive into Thang D. Bui's collaborations.

Top Co-Authors

Yingzhen Li, University of Cambridge
Daniel Hernández-Lobato, Autonomous University of Madrid
Mark Rowland, University of Cambridge