A Survey of Bayesian Statistical Approaches for Big Data
Farzana Jahan∗
School of Mathematical Sciences, ARC Centre of Mathematical and Statistical Frontiers, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Queensland, Australia
[email protected]
Insha Ullah
School of Mathematical Sciences, ARC Centre of Mathematical and Statistical Frontiers, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Queensland, Australia
[email protected]
Kerrie L. Mengersen
School of Mathematical Sciences, ARC Centre of Mathematical and Statistical Frontiers, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Queensland, Australia
[email protected]
June 9, 2020

ABSTRACT
The modern era is characterised as an era of information or Big Data. This has motivated a huge literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available prior to the advent of Big Data. We present a review of published studies that present Bayesian statistical approaches specifically for Big Data and discuss the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.

Keywords: Bayesian Statistics · Bayesian modelling · Bayesian computation · Scalable algorithms
Although there are many variations on the definition of Big Data [1, 2, 3, 4], it is clear that it encompasses large and often diverse quantitative data obtained from increasingly numerous sources at different individual, spatial and temporal scales, and with different levels of quality. Examples of Big Data include data generated from social media [5]; data collected in biomedical and healthcare informatics research, such as DNA sequences and electronic health records [6]; and geospatial data generated by remote sensing, laser scanning, mobile mapping, geo-located sensors, geo-tagged web contents, volunteered geographic information (VGI), global navigation satellite system (GNSS) tracking and so on [7]. The volume and complexity of Big Data often exceed the capability of the standard analytics tools (software, hardware, methods and algorithms) [8, 9]. The concomitant challenges of managing, modelling, analysing and interpreting these data have motivated a large literature on potential solutions from a range of domains including statistics, machine learning and computer science. This literature can be grouped into four broad categories of articles. The first includes general articles about the concept of Big Data, including its features and challenges, and its application and importance in specific fields. The second includes literature concentrating on infrastructure and management, including parallel computing and specialised software. The third focuses on statistical and machine learning models and algorithms for Big Data. The final category includes articles on the application of these new techniques to complex real-world problems.

∗ Corresponding author

In this chapter, we classify the literature published on Big Data into finer classes than the four broad categories mentioned above and briefly review the contents covered by those different categories. The main focus of the chapter, however, is on the third category, in particular on statistical contributions to Big Data. We examine the nature of these innovations and attempt to catalogue them as modelling, algorithmic or other contributions. We then drill further into this set and examine the more specific literature on Bayesian approaches. Although there is an increasing interest in this paradigm from a wide range of perspectives including statistics, machine learning, information science, computer science and the various application areas, to our knowledge there has not yet been a review of Bayesian statistical approaches for Big Data. This is the primary contribution of this chapter.

This chapter provides a review of the published studies that present Bayesian statistical models specifically for Big Data and discusses the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.

The chapter proceeds as follows. In the next section, the literature search and inclusion criteria for this chapter are outlined. A classification of Big Data literature, along with a brief review of the relevant literature in each class, is presented in Section 3. Section 4 consists of a brief review of articles discussing Big Data problems from statistical perspectives, followed by a review of Bayesian approaches applied to Big Data.
The final section includes a discussion of this review with a view to answering the research question posed above.
The literature search for this review paper was undertaken using different methods. The search methods implemented to find the relevant literature and the criteria for the inclusion of the literature in this chapter are briefly discussed in this section.
Acknowledging the wide range of literature on Big Data, the specific focus in this chapter was on recent developments published in the last five years, 2013-2019. For quality assurance reasons, only peer-reviewed published articles, book chapters and conference proceedings were included in the chapter. Some articles were also included from arXiv and pre-print versions, for work soon to be published and from well known researchers working in the particular areas of interest.
Database search: The database “Scopus" was used to initiate the literature search. To identify the availability of literature and learn about the broad areas of concentration, the following keywords were used: Big Data, Big Data Analysis, Big Data Analytics, Statistics and Big Data.

The huge range of literature obtained by this initial search was complemented by a search of “Google Scholar" using more specific keywords as follows: Features and Challenges of Big Data, Big Data Infrastructure, Big Data and Machine Learning, Big Data and Cloud Computing, Statistical approaches/methods/models in Big Data, Bayesian Approaches/Methods/Models in Big Data, Big Data analysis using Bayesian Statistics, Bayesian Big Data, Bayesian Statistics and Big Data.
Expert Knowledge:
In addition to the literature found by the above database search, we used expert knowledge and opinions in the field and reviewed the works of well known researchers in the field of Bayesian statistics for their research related to Bayesian approaches to Big Data, and included the relevant publications for review in this chapter.
Scanning References of selected literature:
Further studies and literature were found by searching the references of selected literature.
Searching with specific keywords:
Since the focus of this chapter is to review the Bayesian approaches to Big Data, more literature was sourced using specific Bayesian methods or approaches found to be applied to Big Data: Approximate Bayesian Computation and Big Data, Bayesian Networks in Big Data, Classification and regression trees/Bayesian additive regression trees in Big Data, Naive Bayes Classifiers and Big Data, Sequential Monte Carlo and Big Data, Hamiltonian Monte Carlo and Big Data, Variational Bayes and Big Data, Bayesian Empirical Likelihood and Big Data, Bayesian Spatial modelling and Big Data, Non-parametric Bayes and Big Data.

This last step was conducted in order to ensure that this chapter covers the important and emerging areas of Bayesian statistics and their application to Big Data. These searches were conducted in “Google Scholar" and up to 30 pages of results were considered in order to find relevant literature.
The published articles on Big Data can be divided into finer classes than the four main categories described above. Of course, there are many ways to make these delineations. Table 1 shows one such delineation, with representative references from the last five years of published literature. The aim of this table is to indicate the wide ranging literature on Big Data and provide relevant references in different categories for interested readers.

Figure 1: Classification of Big Data literature

Table 1: Classes of Big Data Literature

Features and Challenges: [10, 11, 12, 2, 3, 4, 9, 13, 14, 15].
Infrastructure: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28].
Cloud computing: [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39].
Applications (3 examples): Social science: [40, 41, 5, 42, 43, 44]. Health/medicine/medical science: [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]. Business: [60, 61, 62, 63, 64, 65, 66, 67].
Machine Learning Methods: [68, 69, 70, 71, 72, 73, 74, 75, 76, 77].
Statistical Methods: [78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96].
Bayesian Methods: [97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114].

The links between these classes of literature can be visualised as in Figure 1, and a brief description of each of the classes and the contents covered by the relevant references listed is provided in Table 2. The brief reviews presented in Table 2 can help interested readers develop a broad idea of each of the classes mentioned in Table 1. However, Table 2 does not include brief reviews of the last two classes, namely Statistical Methods and Bayesian Methods, since these classes are discussed in detail in Sections 4 and 5. We acknowledge that Bayesian methods are essentially part of statistical methods, but in this chapter the distinct classes are made intentionally in order to identify and discuss the specific developments in Bayesian approaches.

Table 2: Brief review of relevant literature under identified classes

Features and Challenges
• The general features of Big Data are volume, variety, velocity, veracity and value [15, 3]; some salient features include massive sample sizes and high dimensionality [15].
• Many challenges of Big Data regarding storage, processing, analysis and privacy are identified in the literature [3, 12, 15, 14, 13].

Infrastructure
• To manage and analyse Big Data, infrastructural support is needed, such as sufficient storage technologies and data management systems. These are being continuously developed and improved. MongoDB, Terrastore and RethinkDB are some examples of storage technologies; more on evolving technologies, with their strengths, weaknesses, opportunities and threats, is available in [19].
• To analyse Big Data, parallel processing systems and scalable algorithms are needed. MapReduce is one of the pioneering data processing systems [22]. Some other useful and popular tools to handle Big Data are Apache Hadoop and Spark [21].

Cloud computing
• Cloud computing, the practice of using a network of remote servers hosted on the Internet rather than a local server or a personal computer, plays a key role in Big Data analysis by providing the infrastructure needed to store, analyse, visualise and model Big Data using scalable and adaptive systems [33].
• Opportunities and challenges of cloud computing technologies, future trends and application areas are widely discussed in the literature [32, 35, 39], and new developments in cloud computing are proposed to overcome known challenges, such as collaborative anomaly detection [30] and a hybrid approach for scalable sub-tree anonymisation using MapReduce on cloud [36].

Applications (3 examples)
• Big Data has made it possible to analyse social behaviour and an individual's interactions with social systems based on social media usage [40, 43, 44]. Discussions of the challenges and future of social science research using Big Data are available in the literature [43, 41].
• Research involving Big Data in medicine, public health, biomedical and health informatics has increased exponentially over the last decade [46, 56, 57, 52, 6, 55]. Some examples include infectious disease research [48, 59], developing personalised medicine and health care [54, 50] and improving cardiovascular care [49].
• Analysis of Big Data is used to solve many real world problems in business, in particular using Big Data analytics for innovations in leading organisations [67], predictive analytics in retail [64], analysis of business risks and benefits [60], development of market strategies [61] and so on. The opportunities and challenges of Big Data in e-commerce and Big Data integration in business processes can be found in the review articles by [65] and [4].

Machine Learning Methods
• Machine learning is an interdisciplinary field of research primarily focusing on the theory, performance and properties of learning systems and algorithms [115]. Traditional machine learning is evolving to tackle the additional challenges of Big Data [115, 72].
• Some examples of developments in machine learning theories and algorithms for Big Data include a high performance machine learning toolbox [71] and scalable machine learning online services for real time analysis of Big Data [116].
• There is a large and increasing body of research on specific applications of machine learning tools for Big Data in different disciplines. For example, [70] discussed the future of Big Data and machine learning in clinical medicine; [117] discussed a classifier specifically for medical Big Data; and [69] reviewed the state of the art and future prospects of machine learning and Big Data in radiation oncology.
The importance of modelling and theoretical considerations for analysing Big Data is well stated in the literature [87, 91]. These authors pointed out that blind trust in algorithms without proper theoretical considerations will not result in valid outputs. The emerging challenges of Big Data are beyond the issues of processing, storing and management. The choice of suitable statistical methods is crucial in order to make the most of Big Data [85, 95]. [78] highlighted the role of statistical methods in interpretability, uncertainty quantification and reducing selection bias when analysing Big Data.

In this section we present a brief review of some of the published research on statistical perspectives, methods, models and algorithms that are targeted to Big Data. As above, the review is confined to the last five years, commencing with the most recent contributions. Bayesian approaches are reserved for the next section.
Table 3: Classification of statistical literature on Big Data

Topic: Discussion article
Author: Dunson (2018) [78]
• Discussed the background of big data from the perspectives of the machine learning and statistics communities.
• Listed the differences between the methods and inferences drawn from statistical perspectives and those of machine learning, in terms of replicability, uncertainty quantification, sampling, selection bias and measurement error.
• Identified the statistical challenges for high dimensional complex data (big data) in quantifying uncertainty, scaling up sampling methods and selecting priors in Bayesian methods.

Topic: Review
Author: Nongxa (2017) [79]
• Identified the challenges of big data as: high dimensionality, heterogeneity and incompleteness, scale, timeliness, and security and privacy.
• Pointed out that the mathematical and statistical challenges of big data require updating the core knowledge areas of mathematical and statistical education (i.e., linear algebra, multivariable calculus, elementary probability and statistics, coding or programming) to more advanced topics (i.e., randomised numerical linear algebra, topological data analysis, matrix and tensor decompositions, random graphs, random matrices and complex networks).

Author: Franke et al. (2016) [85]
• Reviewed different strategies of analysis: data wrangling, visualisation, dimension reduction, sparsity regularisation, optimisation, measuring distance, representation learning and sequential learning, and provided detailed examples of applications.

Author: Chen et al. (2015) [92]
• Emphasised the importance of statistical knowledge and skills in Big Data analytics using several examples.
• Discussed some statistical methods that are useful in the context of big data: confirmatory and exploratory data analysis tools, data mining methods including supervised learning (classification, regression/prediction) and unsupervised learning (cluster analysis, anomaly detection, association rule learning), visualisation techniques, etc.
• Elaborated on the computational skills needed by statisticians in data acquisition, data processing, data management and data analysis.

Author: Hoerl et al. (2014) [95]
• Provided a background of big data, reviewing relevant articles.
• Discussed the importance of statistical thinking in big data problems, reviewing some misleading results produced by sophisticated analysis of big data without involving statistical principles.
• Elaborated on the roles of statistical thinking for data quality, domain knowledge and analysis strategies in order to solve complex unstructured problems involving big data.

Topic: Review of methods & extension
Author: Wang et al. (2016) [88]
• Reviewed statistical methods and software packages in R and recently developed tools to handle Big Data, focusing on three groups: sub-sampling, divide and conquer, and online processing.
• Extended the online updating approach by employing variable selection criteria.

Topic: Methods review, new methods
Author: Genuer et al. (2017) [118]
• Reviewed proposals dealing with scaling random forests to big data problems.
• Discussed subsampling, parallel implementations and online processing of random forests in detail.
• Proposed five variants of random forests for big data.

Author: Wang and Xu (2015) [119]
• Reviewed different clustering methods applicable to big data situations.
• Proposed a clustering procedure with adaptive density peak detection applying multivariate kernel estimation, and demonstrated its performance through simulation studies and analysis of a few benchmark gene expression data sets.
• Developed an R package, “ADPclust", to implement the proposed methods.

Author: Wang et al. (2017) [81]
• Proposed a method and algorithm for online updating implementing bias corrections, with extensions for application in a generalised linear model (GLM) setting.
• Evaluated the proposed strategies in comparison with previous algorithms [86].

Topic: New methods and algorithms
Author: Liu et al. (2017) [83]
• Proposed a novel sparse GLM with L0 approximation for feature selection and prediction in big omics data scenarios.
• Provided a novel algorithm and software in MATLAB (L0ADRIDGE) for performing L0 penalised GLM in ultra high dimensional big data.
• Compared performance with other methods (SCAD, MC+) using simulation and real data analysis (mRNA, microRNA and methylation data from TCGA ovarian cancer).

Author: Schifano et al. (2016) [86]
• Developed new statistical methods and iterative algorithms for analysing streaming data.
• Proposed methods to enable updating of the estimates and models with the arrival of new data.

Author: Allen et al. (2014) [120]
• Proposed generalisations of principal components analysis (PCA) to take into account structural relationships in big data settings.
• Developed fast computational algorithms using the proposed methods (GPCA, sparse GPCA and functional GPCA) for massive data sets.

Topic: New algorithms
Author: Wang and Samworth (2017) [82]
• Proposed a new algorithm, "inspect" (informative sparse projection for estimation of change points), to estimate the number and location of change points in high dimensional time series.
• The algorithm, starting from a simple time series model, was extended to detect multiple change points and to allow spatial or temporal dependence, assessed using simulation studies and real data application.

Author: Yu and Lee (2017) [121]
• Extended the alternating direction method of multipliers (ADMM) to solve penalised quantile regression problems involving massive data sets, with faster computation and no loss of estimation accuracy.

Author: Zhang and Yang (2017) [84]
• Proposed new algorithms using ridge regression to make it efficient for handling big data.

Author: Doornik and Hendry (2015) [89]
• Discussed the statistical model selection algorithm "autometrics" for econometric data [122], with its application to fat big data (having a larger number of variables than the number of observations).
• Extended algorithms for tackling the computational issues of fat big data, applying block searches and re-selection by lasso for correlated regressors.

Author: Sysoev et al. (2014) [93]
• Presented efficient algorithms to estimate bootstrap or jackknife type confidence intervals for big data sets fitted by multivariate monotonic regression.
• Evaluated the performance of the proposed algorithms using a case study on death from coronary heart disease in a large population.

Author: Pehlivanlı (2015) [90]
• Proposed a novel approach for feature selection from high dimensional data.
• Tested the efficiency of the proposed method using sensitivity, specificity, accuracy and ROC curves.
• Demonstrated the approach on micro-array data.

Among the brief reviews of the relevant literature in Table 3, we include detailed reviews of three papers which are more generic in explaining the role of statistics and statistical methods in Big Data, along with recent developments in this area.

[88] summarised the published literature on recent methodological developments for Big Data in three broad groups: subsampling, which calculates a statistic in many subsamples taken from the data and then combines the results [123];
divide and conquer, the principle of which is to break a dataset into smaller subsets, analyse these in parallel and combine the results at the end [124]; and online updating of streaming data [86], based on online recursive analytical processing. The authors summarised the following methods in the first two groups: subsampling based methods (bag of little bootstraps, leveraging, mean log likelihood, subsample based MCMC) and divide and conquer methods (aggregated estimating equations, majority voting, screening with ultra high dimension, parallel MCMC). The authors, after reviewing existing online updating methods and algorithms, extended the online updating of stream data method by including criterion based variable selection with online updating. The authors also discussed the available software packages (open source R as well as commercial software) developed to handle the computational complexity involving Big Data. For breaking the memory barrier using R, the authors cited and discussed several data management packages (sqldf, DBI, RSQLite, filehash, bigmemory, ff) and packages for numerical calculation (speedglm, biglm, biganalytics, ffbase, bigtabulate, bigalgebra, bigpca, bigrf, biglars, PopGenome). The R packages for breaking the computing power barrier were cited and discussed in two groups: packages for speeding up (compiler, inline, Rcpp, RcppEigen, RcppArmadillo, RInside, microbenchmark, proftools, aprof, lineprof, GUIprofiler) and packages for scaling up (Rmpi, snow, snowFT, snowfall, multicore, parallel, foreach, Rdsm, bigmemory, pbdMPI, pbdSLAP, pbdBASE, pbdMAT, pbdDEMO, Rhipe, segue, rhbase, rhdfs, rmr, plyrmr, ravro, SparkR, pnmath, pnmath0, rsprng, rlecuyer, doRNG, gputools, bigvis). The authors also discussed the developments in Hadoop, Spark, the OpenMP API, and using FORTRAN and C++ from R in order to create flexible programs for handling Big Data. The article also presented a brief summary of commercial statistical software, e.g., SAS, SPSS, MATLAB.
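The online updating strategy reviewed above can be illustrated in its simplest form. The sketch below is only a toy version for a flat-prior linear model with a fixed number of predictors (the methods of [86, 81] additionally handle bias correction, inference and variable selection): running sufficient statistics X'X and X'y are refreshed as each data block arrives, so the estimate can be updated without storing earlier blocks.

```python
import numpy as np

def update(state, X_block, y_block):
    # Fold one incoming block into the running sufficient statistics
    # X'X and X'y; the raw block can then be discarded.
    XtX, Xty = state
    return XtX + X_block.T @ X_block, Xty + X_block.T @ y_block

def estimate(state):
    # Least squares estimate from the accumulated statistics.
    XtX, Xty = state
    return np.linalg.solve(XtX, Xty)

# Simulated stream of 10 blocks from a linear model.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
state = (np.zeros((3, 3)), np.zeros(3))
X_all, y_all = [], []
for _ in range(10):
    X = rng.normal(size=(100, 3))
    y = X @ beta_true + rng.normal(scale=0.1, size=100)
    state = update(state, X, y)
    X_all.append(X)
    y_all.append(y)
beta_online = estimate(state)
```

In this simple setting the online estimate coincides exactly with the estimate obtained by fitting all blocks at once, which is the property the online updating literature builds on and extends to GLMs and variable selection.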
The study included a case study of fitting a logistic model to a massive data set on airline on-time performance from the 2009 ASA Data Expo, mentioning the use of some of the R packages discussed earlier to handle the problems with memory and computational capacity. Overall, this study provided a comprehensive review and discussion of state-of-the-art statistical methodologies and software developments for handling Big Data.

[92] presented their views on the challenges and importance of Big Data and explained the role of statistics in Big Data analytics based on a review of relevant literature. This study emphasised the importance of statistical knowledge and skills in Big Data analytics using several examples. As detailed in Table 3, the authors broadly discussed a range of statistical methods which can be very helpful in better analysis of Big Data, such as the use of exploratory data analysis principles to investigate correlations among the variables in the data or establish causal relationships between response and explanatory variables in the Big Data. The authors specifically mentioned hypothesis testing, predictive analysis using statistical models and statistical inference using uncertainty estimation as some key tools to use in Big Data analysis. The authors also explained that statistical knowledge can be combined with data mining methods such as unsupervised learning (cluster analysis, association rule learning, anomaly detection) and supervised learning (regression and classification) to the benefit of Big Data analysis.
The challenges for statisticians in coping with Big Data were also described in this article, with particular emphasis on computational skills in data acquisition (knowledge of programming languages, knowledge of web and core communication protocols), data processing (skills to transform voice or image data to numeric data using appropriate software or programming), data management (knowledge of database management tools and technologies, such as NoSQL) and scalable computation (knowledge of parallel computing, which can be implemented using MapReduce, SQL, etc.).

As indicated above, many of the papers provide a summary of the published literature which is not replicated here. Some of these reviews are based on large thematic programs that have been held on this topic. For example, the paper by [85] is based on presentations and discussions held as part of the program on Statistical Inference, Learning and Models for Big Data, which was held in Canada in 2015. The authors discussed the four V's (volume, variety, veracity and velocity) of Big Data and mentioned some further challenges in Big Data analysis beyond the complexities associated with the four V's. The "V" given particular attention in this article is veracity. Veracity refers to biases and noise in the data, which may be the result of the heterogeneous structure of the data sources and may make the sample unrepresentative of the population. Veracity in Big Data is often referred to as the biggest challenge compared with the other V's.
The paper reviewed the common strategies for Big Data analysis, starting from data wrangling, which consists of data manipulation techniques for making the data eligible for analysis; visualisation, which is often an important tool for understanding the underlying patterns in the data and is the first formal step in data analysis; reducing the dimension of the data using algorithms such as principal component analysis (PCA) to make Big Data models tractable and interpretable; making models more robust by enforcing sparsity through regularisation techniques such as variable selection and model fitting criteria; using optimisation methods based on different distance measures proposed for high dimensional data; and using different learning algorithms such as representation learning and sequential learning. Different applications of Big Data were shown in public health, health policy, law and order, education, mobile application security, image recognition and labelling, digital humanities and materials science.

There are a few other research articles focused on statistical methods tailored to specific problems, which are not included in Table 3. For example, [125] proposed a statistics-based algorithm using a stochastic space-time model with more than 1 billion data points to reproduce some features of a climate model. Similarly, [126] used various statistical methods to obtain associations between drug-outcome pairs in a very big longitudinal medical experimental database (with
information on millions of patients), with a detailed discussion of the big results problem, providing a comparison of statistical and machine learning approaches. Finally, [96] proposed stochastic variational inference for Gaussian processes, which makes the application of Gaussian processes to huge data sets (having millions of data points) feasible.

From this review of relevant literature on statistical perspectives for analysing Big Data, it can be seen that along with the scaling up of existing algorithms, new methodological developments are also in progress in order to face the challenges associated with Big Data.
As described in the Introduction, the intention of this review is to commence with a broad scope of the literature on Big Data, then focus on statistical methods for Big Data, and finally to focus in particular on Bayesian approaches to the modelling and analysis of Big Data. This section consists of a review of published literature on the last of these.

There are two defining features of Bayesian analysis: (i) the construction of the model and associated parameters and expectations of interest, and (ii) the development of an algorithm to obtain posterior estimates of these quantities. In the context of Big Data, the resultant models can become complex and suffer from issues such as unavailability of a likelihood, hierarchical instability, parameter explosion and lack of identifiability. Similarly, the algorithms can suffer from too much or too little data given the model structure, as well as problems of scalability and cost. These issues have motivated the development of new model structures, new methods that avoid the need for models, new Markov chain Monte Carlo (MCMC) sampling methods, and alternative algorithms and approximations that avoid these simulation-based approaches. We discuss some of the concomitant literature under two broad headings, namely computation and models, recognising that there is often overlap in the cited papers.
In the Bayesian framework, a mainstream computational tool has been Markov chain Monte Carlo (MCMC). Traditional MCMC methods do not scale well because they need to iterate through the full data set at each iteration to evaluate the likelihood [100]. Recently several attempts have been made to scale MCMC methods up to massive data.

A widely used strategy to overcome the computational cost is to distribute the computational burden across a number of machines. The strategy is generally referred to as divide-and-conquer sampling. This approach breaks a massive data set into a number of easier to handle subsets, obtains posterior samples based on each subset in parallel using multiple machines and finally combines the subset posterior inferences to obtain the full-posterior estimates [124]. The core challenge is the recombination of sub-posterior samples to obtain true posterior samples. A number of attempts have been made to address this challenge.

[127] and [128] approximated the sub-posteriors using kernel density estimation and then aggregated the sub-posteriors by taking their product. Both algorithms provided consistent estimates of the posterior. [127] provided faster MCMC processing since it allowed the machines to process the parallel MCMC chains independently. However, one limitation of the asymptotically embarrassingly parallel MCMC algorithm [127] is that it only works for real and unconstrained posterior values, so there is still scope to make the algorithm work in more general settings.

[129] adopted a similar approach of parallel MCMC but used a Weierstrass transform to approximate the sub-posterior densities instead of a kernel density estimate. This provided better approximation accuracy, a better chain mixing rate and potentially faster speed for large scale Bayesian analysis.

[106] partitioned the data at random and performed MCMC independently on each subset to draw samples from the posterior given the data subset.
To obtain consensus posteriors, they proposed to average samples across subsets and showed the exactness of the algorithm under a Gaussian assumption. This algorithm is scalable to a very large number of machines and works on clusters, on single multi-core or multi-processor computers, or on any arbitrary collection of computers linked by a high-speed network. The key weakness of consensus MCMC is that it does not apply to non-Gaussian posteriors.

[102] proposed dividing a large set of independent data into a number of non-overlapping subsets, making inferences on the subsets in parallel and then combining the inferences using the median of the subset posteriors. The median posterior (M-posterior) is constructed from the subset posteriors using Weiszfeld's algorithm, which provides a scalable algorithm for robust estimation.

[130] extended this notion to spatially dependent data, providing a scalable divide-and-conquer algorithm, named spatial meta-kriging, to analyse big spatial data sets. The multivariate extension of spatial meta-kriging has been addressed by [131]. These meta-kriging approaches are practical developments in Bayesian spatial inference for Big Data, specifically for "big-N" problems [132].
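To make the consensus idea concrete, the following is a minimal sketch (illustrative only, not the algorithm of any single cited paper) that combines sub-posterior draws by precision-weighted averaging in the spirit of consensus Monte Carlo. It uses a Gaussian mean with known variance and a flat prior, so each subset posterior is available in closed form; in practice each worker would run MCMC on its shard with the prior raised to the power 1/(number of shards).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: Gaussian with known sigma; the full-data posterior for the
# mean under a flat prior is N(sample mean, sigma^2 / N).
sigma = 1.0
data = rng.normal(2.0, sigma, size=100_000)
shards = np.array_split(data, 10)

def subposterior_draws(shard, n_draws=5_000):
    # Exact subset posterior here: N(shard mean, sigma^2 / shard size).
    return rng.normal(shard.mean(), sigma / np.sqrt(len(shard)), size=n_draws)

draws = [subposterior_draws(s) for s in shards]

# Consensus step: precision-weighted average of aligned sub-posterior draws.
weights = np.array([1.0 / np.var(d) for d in draws])
combined = sum(w * d for w, d in zip(weights, draws)) / weights.sum()
# "combined" now approximates draws from the full-data posterior.
```

Under the Gaussian assumption the weighted average of independent sub-posterior draws has exactly the full-posterior mean and variance, which is why the method is exact in this setting and only approximate otherwise.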
[100] proposed a new and flexible divide-and-conquer framework that uses re-scaled sub-posteriors to approximate the overall posterior. Unlike other parallel MCMC approaches, this method creates artificial data for each subset and applies the overall prior on the artificial data sets to obtain the subset posteriors. The sub-posteriors are then re-centred to their common mean and averaged to approximate the overall posterior. The authors claimed that this method has statistical justification as well as mathematical validity, while sharing the same computational cost as other classical parallel MCMC approaches such as consensus Monte Carlo and the Weierstrass sampler. [133] proposed a non-reversible rejection-free MCMC method, which reportedly outperforms state-of-the-art methods such as HMC and Firefly Monte Carlo by having a faster mixing rate and lower variances of the estimators for high-dimensional models and large data sets. However, the automation of this method is still a challenge.

Another strategy for scalable Bayesian inference is the sub-sampling based approach, in which a smaller subset of data is queried in the MCMC algorithm to evaluate the likelihood at every iteration. [134] proposed an auxiliary variable MCMC algorithm that evaluates the likelihood based on a small subset of the data at each iteration yet simulates from the exact posterior distribution. To improve the mixing speed, [135] used an approximate Metropolis-Hastings (MH) test based on a subset of data. A similar approach is used in [136], where the accept/reject step of MH evaluates the likelihood of a random subset of the data.
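A minimal sketch of this subsampling accept/reject idea follows (illustrative only: simulated Gaussian data with a flat prior, and a single shared subsample per iteration to estimate the log-likelihood difference; it is not the exact algorithm of any one cited paper).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: Gaussian with known sigma; target is the posterior of the mean.
N, sigma = 10_000, 1.0
data = rng.normal(0.5, sigma, size=N)

def log_alpha_hat(theta, prop, m=1_000):
    # One random subsample estimates the full-data log-likelihood *difference*;
    # the N/m rescaling makes the estimate unbiased, at the price of noise that
    # perturbs the invariant distribution (hence "approximate" MH).
    sub = data[rng.integers(0, N, size=m)]
    diff = -0.5 * ((sub - prop) ** 2 - (sub - theta) ** 2) / sigma**2
    return (N / m) * diff.sum()

def approx_subsampling_mh(n_iter=3_000, step=0.01):
    theta, chain = 0.0, []
    for _ in range(n_iter):
        prop = theta + rng.normal(0.0, step)   # symmetric random-walk proposal
        if np.log(rng.random()) < log_alpha_hat(theta, prop):
            theta = prop
        chain.append(theta)
    return np.array(chain)

chain = approx_subsampling_mh()
# After burn-in the chain hovers around the posterior mode (the sample mean),
# with some extra dispersion induced by the subsampling noise.
```

Each iteration touches only m of the N observations, which is the source of the speed-up; the published algorithms differ mainly in how they control or correct the resulting approximation error.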
[137] extended this approach by replacing a number of likelihood evaluations with a Taylor expansion centred at the maximum of the likelihood, and concluded that their method outperforms the earlier algorithms [135].

The scalable MCMC approach was also improved by [138] using a difference estimator to estimate the log-likelihood accurately using only a small fraction of the data. [98] introduced an unbiased estimator of the log-likelihood based on a weighted sub-sample, which is used in the MH acceptance step to speed up MCMC. Another scalable adaptation of the MH algorithm, named informed subsampling MCMC, was proposed by [139] to speed up Bayesian inference for Big Data; it draws subsets according to a similarity measure (the squared L2 distance between the maximum likelihood estimators of the full data and of the subsample) instead of a uniform distribution. The algorithm showed excellent performance under a limited computational budget in approximating the posterior for a tall dataset.

Another variation of MCMC for Big Data was made by [111]. These authors proposed a novel Bayesian inference framework, suitable for Big Data problems, that approximates the posterior expectation from a different perspective involving paths of partial posteriors. This is a parallelisable method which can easily be implemented using existing MCMC techniques. It does not require simulation from the full posterior, thus bypassing the complex convergence issues of kernel approximation. However, there remains scope for future work on the computation-variance trade-off and the finite-time bias produced by MCMC.

Hamiltonian Monte Carlo (HMC) sampling methods provide powerful and efficient algorithms for MCMC using high acceptance probabilities for distant proposals [94]. A conceptual introduction to HMC is presented by [140]. [94] proposed a stochastic gradient HMC using second-order Langevin dynamics.
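These gradient-based samplers replace the full-data gradient with a minibatch estimate. A minimal sketch of a stochastic-gradient Langevin-type update with a decreasing step size follows (a simplified first-order illustration on simulated data, not the exact second-order SGHMC of [94]).

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: Gaussian with known sigma; target is the posterior of the mean.
N, sigma = 100_000, 1.0
data = rng.normal(1.0, sigma, size=N)

def sgld(n_iter=3_000, batch=100, eps0=1e-5):
    theta, samples = 0.0, []
    for t in range(n_iter):
        eps = eps0 * (1 + t) ** -0.55          # decreasing step-size sequence
        mb = data[rng.integers(0, N, size=batch)]
        # Minibatch gradient of the log-posterior (flat prior), rescaled by
        # N/batch so it is unbiased for the full-data gradient.
        grad = (N / batch) * np.sum(mb - theta) / sigma**2
        # Langevin update: half-step along the gradient plus injected noise.
        theta += 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps))
        samples.append(theta)
    return np.array(samples)

samples = sgld()
# There is no accept/reject step; the decreasing step size controls the
# discretisation error, as in the SGLD scheme discussed below.
```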
Stochastic Gradient Langevin Dynamics (SGLD) has been proposed as a useful method for applying MCMC to Big Data, in which the accept-reject step is skipped and a decreasing step-size sequence is used [141]. For a more detailed and rigorous mathematical framework, algorithms and recommendations, interested readers are referred to [142].

A popular method of scaling Bayesian inference, particularly in the case of analytically intractable distributions, is Sequential Monte Carlo (SMC) or particle filtering [143, 144, 97]. SMC algorithms have recently become popular as a method to approximate integrals. The reasons behind their popularity include their easy implementation and parallelisability, much-needed characteristics in Big Data implementations [104]. SMC can approximate a sequence of probability distributions on a sequence of spaces of increasing dimension by applying resampling, propagation and weighting to a cloud of particles, starting with the prior and eventually reaching the posterior of interest. [97] proposed a sub-sampling SMC which is suitable for parallel computation in Big Data analysis, comprising two steps. First, the speed of the SMC is increased by using an unbiased and efficient estimator of the likelihood, followed by a Metropolis-within-Gibbs kernel. The kernel is updated by an HMC method for the model parameters and a block pseudo-marginal proposal for the auxiliary variables [97]. Some novel approaches to SMC include divide-and-conquer SMC [103], multilevel SMC [144], online SMC [145] and one-pass SMC [146], among others.

Stochastic variational inference (VI, also called Variational Bayes, VB) is a faster alternative to MCMC [147]. It approximates probability densities using a deterministic optimisation method [105] and has seen widespread use to approximate posterior densities for Bayesian models in large-scale problems.
The interested reader is referred to [148] for a detailed introduction to variational inference designed for statisticians, with applications. VI has been implemented in scaling-up algorithms for Big Data. For example, a novel re-parameterisation of VI has been implemented for scaling latent variable models and sparse GP regression to Big Data [149].
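As a toy illustration of the stochastic flavour of VI, the sketch below fits a mean-field Gaussian approximation q(θ) = N(m, s²) to the posterior of a Gaussian mean by minibatch reparameterisation gradients of the ELBO (all settings are illustrative and simulated; this is not the algorithm of any specific cited paper).

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: unit-variance Gaussian likelihood, N(0, 100) prior on the mean.
N = 50_000
data = rng.normal(-0.7, 1.0, size=N)
prior_var = 100.0

m, rho = 0.0, np.log(0.1)          # variational mean and log-standard-deviation
lr, ms = 1e-5, []
for t in range(4_000):
    s = np.exp(rho)
    mb = data[rng.integers(0, N, size=256)]
    eps = rng.normal()
    theta = m + s * eps            # reparameterised draw from q = N(m, s^2)
    # Gradient of the log joint at theta; minibatch term rescaled by N/|mb|
    # so the stochastic gradient of the ELBO is unbiased.
    g = (N / len(mb)) * np.sum(mb - theta) - theta / prior_var
    m += lr * g                            # ELBO gradient wrt m
    rho += lr * (g * s * eps + 1.0)        # chain-rule term plus entropy gradient
    ms.append(m)

# The variational mean drifts to the sample mean, and s shrinks towards the
# posterior standard deviation 1/sqrt(N + 1/prior_var).
```

Because each update touches only a minibatch, the cost per iteration is independent of N, which is what makes the stochastic variant attractive for Big Data.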
There have been studies which combined VI and SMC in order to take advantage of both strategies in finding the true posterior [150, 151, 152]. [151] employed an SMC approach to obtain an improved variational approximation, while [152] split the data into blocks, applied SMC to compute a partial posterior for each block, and used a variational argument to obtain a proxy for the true posterior as the product of the partial posteriors. The combination of these two techniques in a Big Data context was made by [150], who proposed a new sampling scheme called the Shortened Bridge Sampler, which combines the strength of deterministic approximations of the posterior, that is variational Bayes, with those of SMC. This sampler resulted in reduced computational time for Big Data with huge numbers of parameters, such as data from genomics or networks.

[153] proposed a novel algorithm for Bayesian inference in the context of massive online streaming data, extending the Gibbs sampling mechanism by drawing samples from conditional distributions conditioned on sequential point estimates of the other parameters. The authors compared the performance of this conditional density filtering algorithm in approximating the true posterior with SMC and VB, and reported good performance and strong convergence of the proposed algorithm.

Approximate Bayesian computation (ABC) is gaining popularity for statistical inference with high-dimensional data and computationally intensive models where the likelihood is intractable [154]. A detailed overview of ABC can be found in [155] and the asymptotic properties of ABC are explored in [156]. ABC is a likelihood-free method that approximates the posterior distribution utilising imperfect matching of summary statistics [155]. Improvements on existing ABC methods for efficient estimation of the posterior density with Big Data (complex and high-dimensional data with costly simulations) have been proposed by [157].
The choice of summary statistics from high-dimensional data is a topic of active discussion; see, for example, [157, 158]. [159] provided a reliable and robust method of model selection in ABC employing random forests, which was shown to gain computational efficiency.

Another recent direction for ABC is to approximate the likelihood using Bayesian synthetic likelihood or empirical likelihood [160]. Bayesian synthetic likelihood arguably provides computationally efficient approximations of the likelihood with high-dimensional summary statistics [161, 162]. Empirical likelihood, on the other hand, is a non-parametric technique for approximating the likelihood empirically from the data subject to moment constraints; this has been suggested in the context of ABC [163], but has not been widely adopted. For further reading on empirical likelihood, see [164].

Classification and regression trees are also very useful tools in data mining and Big Data analysis [165]. There are Bayesian versions of regression trees such as Bayesian Additive Regression Trees (BART) [166, 167, 113]. The BART algorithm has also been applied to the Big Data context and to sparse variable selection by [168, 169, 170].

Some other recommendations to speed up computations are to use graphics processing units [171, 172] and parallel programming approaches [173, 174, 175, 176].
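The basic ABC rejection scheme underlying the methods above can be sketched as follows (a toy setting on simulated data: the sample mean is a sufficient summary statistic for a Gaussian mean, so plain rejection ABC works well here; the prior, tolerance and sample sizes are all illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)

# "Observed" data and its summary statistic.
observed = rng.normal(3.0, 1.0, size=200)
s_obs = observed.mean()

def abc_rejection(n_prop=20_000, tol=0.05):
    accepted = []
    for _ in range(n_prop):
        theta = rng.uniform(-10, 10)            # draw parameter from the prior
        sim = rng.normal(theta, 1.0, size=200)  # simulate data from the model
        if abs(sim.mean() - s_obs) < tol:       # keep theta if summaries match
            accepted.append(theta)
    return np.array(accepted)

post = abc_rejection()
# "post" holds approximate posterior draws; no likelihood was ever evaluated.
```

The scheme only requires the ability to simulate from the model, which is why ABC suits intractable-likelihood problems; its cost in high dimensions is driven by the tolerance and the choice of summary statistics, the issues the cited papers address.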
The extensive development of Bayesian computational solutions has opened the door to further developments in Bayesian modelling. Many of these new methods are set in the context of application areas. For example, there have been applications of ABC for Big Data in many different fields [177, 101]: [177] developed a high-performance computing ABC approach for estimation of parameters in platelet deposition, while [101] proposed ABC methods for inference in high-dimensional multivariate spatial data from a large number of locations, with a particular focus on model selection for application to spatial extremes analysis. Bayesian mixtures are a popular modelling tool, and VB and ABC techniques have been used for fitting Bayesian mixture models to Big Data [178, 179, 147, 148, 109].

Variable selection in Big Data (in particular for wide data, having a massive number of variables) is a demanding problem. [180] proposed multivariate extensions of the Bayesian group lasso for variable selection in high-dimensional data, using Bayesian hierarchical models with spike-and-slab priors, with application to gene expression data. The variable selection problem can also be solved employing ABC-type algorithms: [181] proposed a sampling technique, ABC Bayesian forests, based on splitting the data, useful for high-dimensional wide data, which turns out to be a robust method for identifying variables with larger marginal inclusion probabilities.

Bayesian non-parametric models [182] have unbounded capacity to adjust to unseen data by activating additional parameters that were inactive before the emergence of the new data. In other words, the new data are allowed to speak for themselves in non-parametric models, rather than being forced into an arguably restrictive model that was learned on previously available data. The inherent flexibility of these models to adapt in complexity to new data makes them more suitable for Big Data than their parametric counterparts.
For a brief introduction to Bayesian non-parametric models and a non-technical overview of some of the main tools in the area, the interested reader is referred to [183].
The popular tools in Bayesian non-parametrics include Gaussian processes (GP) [184], Dirichlet processes (DP) [185], the Indian buffet process (IBP) [186] and infinite hidden Markov models (iHMM) [187]. GP have been used for a variety of applications [188, 189, 190] and attempts have been made to scale them to Big Data [96, 191, 192, 193]. DP have seen successes in clustering, and faster computational algorithms are being adopted to scale them to Big Data [194, 195, 146, 110, 176]. IBP are used for latent feature modelling, where the number of features is determined in a data-driven fashion; they have been scaled to Big Data through variational inference algorithms [112]. As an alternative to the classical HMM, one of the distinctive properties of the iHMM is that it infers the number of hidden states in the system from the available data; it has been scaled to Big Data using particle filtering algorithms [108].

Gaussian processes are also employed in the analysis of high-dimensional spatially dependent data [196]. [196] provided model-based solutions employing low-rank GP and nearest-neighbour GP (NNGP) as scalable priors in a hierarchical framework to render full Bayesian inference for big spatial or spatio-temporal data sets. [197] extended the applicability of NNGP to inference on latent spatially dependent processes by developing a conjugate latent NNGP model as a practical alternative to onerous Bayesian computations. [198] used variational optimisation with a structured Bayesian GP latent variable model to analyse spatially dependent data. For a review of methods of analysis of massive spatially dependent data, including the Bayesian approaches, see [199].

Another Bayesian modelling approach that has been used for big and complex data is Bayesian networks (BN). This methodology has generated a substantial literature examining theoretical, methodological and computational approaches, as well as applications [200].
BN belong to the family of probabilistic graphical models and are based on directed acyclic graphs, which are a very useful representation of causal relationships among variables [201]. BN are used as efficient learning tools in Big Data analysis, integrated with scalable algorithms [202, 203]. For a more detailed understanding of BN learning from Big Data, please see [200].

Classification is also an important tool for extracting information from Big Data, and Bayesian classifiers, including the Naive Bayes classifier (NBC), are used in Big Data classification problems [204, 114]. A parallel implementation of NBC was proposed by [204]. Moreover, [114] evaluated the scalability of NBC in Big Data with application to sentiment classification of millions of movie reviews, and found NBC to have improved accuracy in Big Data. [205] proposed a scalable multi-step clustering and classification algorithm using Bayesian non-parametrics for Big Data with large n and small p, which can also run in parallel.

The past fifteen years have also seen an increase in interest in Empirical Likelihood (EL) for Bayesian modelling. The idea of replacing the likelihood with an empirical analogue in a Bayesian framework was first explored in detail by [206]. The author demonstrated that this Bayesian Empirical Likelihood (BEL) approach increases the flexibility of the EL approach by examining the length and coverage of BEL intervals, testing the methods on simulated data sets. Later, [207] provided probabilistic interpretations of BEL by exploring moment condition models with EL, and provided a non-parametric version of BEL, namely Bayesian Exponentially Tilted Empirical Likelihood (BETEL). The BEL methods have been applied to spatial data analysis in [208] and to small area estimation in [209, 210].

We acknowledge that there are many more studies on the application of Bayesian approaches in different fields of interest which are not included in this review.
There are also other review papers on overlapping and closely related topics. For example, [99] describes Bayesian methods for machine learning and includes some of the Bayesian inference techniques reviewed in the present study. However, the scope and focus of this review differ from those of [99], which concentrated on methods applicable to machine learning.
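As a small illustration of why the Naive Bayes classifier discussed above scales so readily, its training reduces to accumulating per-class counts; these sufficient statistics simply add across data shards, giving a natural map-reduce pattern. The sketch below is illustrative only, with simulated word-count data rather than any benchmark from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated corpus: documents as fixed-length arrays of word ids, two classes.
vocab, n_classes = 50, 2
X = rng.integers(0, vocab, size=(10_000, 20))
y = rng.integers(0, n_classes, size=10_000)

def count_shard(Xs, ys):
    """Map step: per-class word counts and document counts for one shard."""
    word = np.zeros((n_classes, vocab))
    docs = np.zeros(n_classes)
    for doc, c in zip(Xs, ys):
        docs[c] += 1
        np.add.at(word[c], doc, 1)   # accumulate repeated word ids correctly
    return word, docs

# Reduce step: the sufficient statistics simply add across shards.
shards = [(X[i::4], y[i::4]) for i in range(4)]
parts = [count_shard(Xs, ys) for Xs, ys in shards]
word_counts = sum(p[0] for p in parts)
doc_counts = sum(p[1] for p in parts)

# Laplace-smoothed parameters for classification, identical to a single-machine fit.
log_prior = np.log(doc_counts / doc_counts.sum())
log_cond = np.log((word_counts + 1) / (word_counts + 1).sum(axis=1, keepdims=True))
```

Because the reduce step is exact addition, the distributed fit matches the single-machine fit exactly, which is the basis of the parallel NBC implementations cited above.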
We are living in the era of Big Data, and continuous research is in progress to make the most of the available information. The current chapter has attempted to review the recent developments in Bayesian statistical approaches for handling Big Data, along with a general overview and classification of the Big Data literature of the last five years. This review chapter provides relevant references in Big Data categorised into finer classes, a brief description of statistical contributions to the field, and a more detailed discussion of the Bayesian approaches developed and applied in the context of Big Data.

On the basis of the reviews made above, it is clear that there has been a huge amount of work on issues related to cloud computing, analytics infrastructure and so on. However, the amount of research conducted from statistical perspectives is also notable. In the last five years, there has been an exponential increase in published studies focused on developing new statistical methods and algorithms, as well as on scaling existing methods. These have been summarised in Section 4, with particular focus on Bayesian approaches in Section 5. In some instances, citations are made outside of the specific period (see Section 2) to trace the origin of the methods which are currently being applied or extended in Big Data scenarios.
With the advent of computational infrastructure and advances in programming and software, Bayesian approaches are no longer considered too computationally expensive and onerous to execute for large volumes of data, that is, Big Data. Traditional Bayesian methods are now becoming much more scalable due to the parallelisation of MCMC algorithms, divide-and-conquer and/or sub-sampling methods in MCMC, and advances in approaches such as HMC, SMC, ABC, VB and so on. With the increasing volume of data, non-parametric Bayesian methods are also gaining in popularity.

This review chapter aimed to survey a range of methodological and computational advances made in Bayesian statistics for handling the difficulties raised by the advent of Big Data. By not focusing on any particular application, this chapter provides the reader with a general overview of the developments of Bayesian methodologies and computational algorithms for handling these issues. The review has revealed that most of the advancements in Bayesian statistics for Big Data have been around computational time and the scalability of particular algorithms, concentrating on estimating the posterior by adopting different techniques. However, the developments of Bayesian methods and models for Big Data in the recent literature cannot be overlooked. There are still many open problems for further research in the context of Big Data and Bayesian approaches, as highlighted in this chapter.

Based on the above discussion and the accompanying review presented in this chapter, it is apparent that to address the challenges of Big Data with the strengths of Bayesian statistics, research on both algorithms and models is essential.
References

[1] Jifa G, Lingling Z. Data, DIKW, Big data and Data science. Procedia Comput Sci. 2014;31:814–821.
[2] De Mauro A, Greco M, Grimaldi M. What is big data? A consensual definition and a review of key research topics. In: AIP Conference Proceedings. vol. 1644 (1). AIP; 2015. p. 97–104.
[3] De Mauro A, Greco M, Grimaldi M. A formal definition of Big Data based on its essential features. Libr Rev. 2016;65(3):122–135.
[4] Wamba SF, Akter S, Edwards A, Chopin G, Gnanzou D. How 'big data' can make big impact: Findings from a systematic review and a longitudinal case study. Int J Prod Econ. 2015;165:234–246.
[5] Bello-Orgaz G, Jung JJ, Camacho D. Social big data: Recent achievements and new challenges. Inf Fus. 2016;28:45–59.
[6] Luo J, Wu M, Gopukumar D, Zhao Y. Big data application in biomedical research and health care: a literature review. Biomed Inform Insights. 2016;8:BII–S31559.
[7] Li S, Dragicevic S, Castro FA, Sester M, Winter S, Coltekin A, et al. Geospatial big data handling theory and methods: A review and research challenges. ISPRS J Photogramm Remote Sens. 2016;115:119–133.
[8] Kaisler S, Armour F, Espinosa JA, Money W. Big data: Issues and challenges moving forward. In: 2013 46th Hawaii International Conference on System Sciences. IEEE; 2013. p. 995–1004.
[9] Gandomi A, Haider M. Beyond the hype: Big data concepts, methods, and analytics. Int J Inf Manag. 2015;35(2):137–144.
[10] Sivarajah U, Kamal MM, Irani Z, Weerakkody V. Critical analysis of Big Data challenges and analytical methods. J Bus Res. 2017;70:263–286.
[11] Xia XG. Small Data, Mid Data, and Big Data Versus Algebra, Analysis, and Topology. IEEE Signal Process Mag. 2017;34(1):48–51.
[12] Oprea D. Big Questions on Big Data. Revista de Cercetare si Interv Soc. 2016;55:112.
[13] Emani CK, Cullot N, Nicolle C. Understandable big data: A survey. Comput Sci Rev. 2015;17:70–81.
[14] Fan J, Han F, Liu H. Challenges of big data analysis. Natl Sci Rev. 2014;1(2):293–314.
[15] Sagiroglu S, Sinanc D.
Big data: A review. In: Collaboration Technologies and Systems (CTS), 2013 International Conference on. IEEE; 2013. p. 42–47.
[16] Kousar H, Babu BP. Multi-Agent based MapReduce Model for Efficient Utilization of System Resources. Indones J Electr Eng Comput Sci. 2018;11(2):504–514.
[17] Zhang Z, Choo KKR, Gupta BB. The convergence of new computing paradigms and big data analytics methodologies for online social networks. J Comput Sci. 2018;26:453–455.
[18] Magdon-Ismail T, Narasimhadevara C, Jaffe D, Nambiar R. TPCx-HS v2: Transforming with Technology Changes. In: Technology Conference on Performance Evaluation and Benchmarking. Springer; 2017. p. 120–130.
[19] Siddiqa A, Karim A, Gani A. Big data storage technologies: a survey. Front Inf Technol Electron Eng. 2017;18(8):1040–1070.
[20] Vyas A, Ram S. Comparative Study of MapReduce Frameworks in Big Data Analytics. Int J Mod Comput Sci. 2017;5(Special Issue):5–13.
[21] Apiletti D, Baralis E, Cerquitelli T, Garza P, Pulvirenti F, Venturini L. Frequent itemsets mining for Big Data: a comparative analysis. Big Data Res. 2017;9:67–83.
[22] Zhang Y, Cao T, Li S, Tian X, Yuan L, Jia H, et al. Parallel processing systems for big data: a survey. Proceedings of the IEEE. 2016;104(11):2114–2136.
[23] Müller O, Junglas I, Brocke Jv, Debortoli S. Utilizing big data analytics for information systems research: challenges, promises and guidelines. Eur J Inf Syst. 2016;25(4):289–302.
[24] Oancea B, Dragoescu RM, et al. Integrating R and Hadoop for Big Data Analysis. Romanian Stat Rev. 2014;62(2):83–94.
[25] Pandey S, Tokekar V. Prominence of MapReduce in big data processing. In: Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on. IEEE; 2014. p. 555–560.
[26] Watson HJ. Tutorial: Big data analytics: Concepts, technologies, and applications. CAIS. 2014;34:65.
[27] Das T, Kumar PM. Big data analytics: A framework for unstructured data analysis. Int J Eng Sci Technol. 2013;5(1):153.
[28] Liu L. Computing infrastructure for big data processing. Front Comput Sci. 2013;7(2):165–170.
[29] Manibharathi R, Dinesh R. Survey of Challenges in Encrypted Data Storage in Cloud Computing and Big Data. J Netw Commun Emerg Technol. 2018;8(2).
[30] Moustafa N, Creech G, Sitnikova E, Keshk M. Collaborative anomaly detection framework for handling big data of cloud computing. In: Military Communications and Information Systems Conference (MilCIS), 2017. IEEE; 2017. p. 1–6.
[31] Cai H, Xu B, Jiang L, Vasilakos AV. IoT-based big data storage systems in cloud computing: Perspectives and challenges. IEEE Internet Things J.
2017;4(1):75–87.
[32] Yang C, Huang Q, Li Z, Liu K, Hu F. Big Data and cloud computing: innovation opportunities and challenges. Int J Digit Earth. 2017;10(1):13–53.
[33] Assunção MD, Calheiros RN, Bianchi S, Netto MA, Buyya R. Big Data computing and clouds: Trends and future directions. J Parallel Distrib Comput. 2015;79:3–15.
[34] Loebbecke C, Picot A. Reflections on societal and business model transformation arising from digitization and big data analytics: A research agenda. J Strategic Inf Syst. 2015;24(3):149–157.
[35] Branch R, Tjeerdsma H, Wilson C, Hurley R, McConnell S. Cloud computing and big data: a review of current service models and hardware perspectives. J Softw Eng Appl. 2014;7(08):686.
[36] Zhang X, Liu C, Nepal S, Yang C, Dou W, Chen J. A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud. J Comput Syst Sci. 2014;80(5):1008–1020.
[37] Demirkan H, Delen D. Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud. Decis Support Syst. 2013;55(1):412–421.
[38] O'Driscoll A, Daugelaite J, Sleator RD. 'Big data', Hadoop and cloud computing in genomics. J Biomed Inform. 2013;46(5):774–781.
[39] Talia D. Clouds for scalable big data analytics. Computer. 2013;46(5):98–101.
[40] Albury K, Burgess J, Light B, Race K, Wilken R. Data cultures of mobile dating and hook-up apps: Emerging issues for critical social science research. Big Data Soc. 2017;4(2):1–11.
[41] Cappella JN. Vectors into the future of mass and interpersonal communication research: Big data, social media, and computational social science. Hum Commun Res. 2017;43(4):545–558.
[42] Mansour RF. Understanding how big data leads to social networking vulnerability. Comput Hum Behav. 2016;57:348–351.
[43] Shah DV, Cappella JN, Neuman WR. Big data, digital media, and computational social science: Possibilities and perils. Ann Am Acad Pol Soc Sci. 2015;659(1):6–13.
[44] Burrows R, Savage M.
After the crisis? Big Data and the methodological challenges of empirical sociology. Big Data Soc. 2014;1(1):1–6.
[45] Mählmann L, Reumann M, Evangelatos N, Brand A. Big Data for Public Health Policy-Making: Policy Empowerment. Public Health Genom. 2017;20(6):312–320.
[46] Cheung AS. Moving Beyond Consent For Citizen Science in Big Data Health and Medical Research. Northwest J Technol Intellect Prop. 2018;16(1):15.
[47] Alonso SG, de la Torre Díez I, Rodrigues JJ, Hamrioui S, López-Coronado M. A Systematic Review of Techniques and Sources of Big Data in the Healthcare Sector. J Med Syst. 2017;41(11):183.
[48] Bansal S, Chowell G, Simonsen L, Vespignani A, Viboud C. Big data for infectious disease surveillance and modeling. J Infect Dis. 2016;214(suppl_4):S375–S379.
[49] Rumsfeld JS, Joynt KE, Maddox TM. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol. 2016;13(6):350–359.
[50] Alyass A, Turcotte M, Meyre D. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genom. 2015;8(1):33.
[51] Belle A, Thiagarajan R, Soroushmehr S, Navidi F, Beard DA, Najarian K. Big data analytics in healthcare. BioMed Res Int. 2015;2015.
[52] Binder H, Blettner M. Big Data in Medical Science: a Biostatistical View. Part 21 of a Series on Evaluation of Scientific Publications. Dtsch Ärztebl Int. 2015;112(9):137.
[53] Brennan PF, Bakken S. Nursing needs big data and big data needs nursing. J Nurs Scholarsh. 2015;47(5):477–484.
[54] Viceconti M, Hunter P, Hose R. Big data, big knowledge: big data for personalized healthcare. IEEE J Biomed Health Inform. 2015;19(4):1209–1215.
[55] Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff. 2014;33(7):1123–1131.
[56] Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2(1):3.
[57] Roski J, Bo-Linn GW, Andrews TA. Creating value in health care through big data: opportunities and policy implications. Health Aff.
2014;33(7):1115–1122.
[58] Yoo C, Ramirez L, Liuzzi J. Big data analysis using modern statistical and machine learning methods in medicine. Int Neurourol J. 2014;18(2):50.
[59] Hay SI, George DB, Moyes CL, Brownstein JS. Big data opportunities for global infectious disease surveillance. PLoS Med. 2013;10(4):e1001413.
[60] Raguseo E. Big data technologies: An empirical investigation on their adoption, benefits and risks for companies. Int J Inf Manag. 2018;38(1):187–195.
[61] Ducange P, Pecori R, Mezzina P. A glimpse on big data analytics in the framework of marketing strategies. Soft Comput. 2018;22(1):325–342.
[62] Sun Z, Sun L, Strang K. Big data analytics services for enhancing business intelligence. J Comput Inf Syst. 2018;58(2):162–169.
[63] Fosso Wamba S, Mishra D. Big data integration with business processes: a literature review. Bus Process Manag J. 2017;23(3):477–492.
[64] Bradlow ET, Gangwar M, Kopalle P, Voleti S. The role of big data and predictive analytics in retail. J Retailing. 2017;93(1):79–95.
[65] Akter S, Wamba SF. Big data Analytics in E-commerce: A systematic review and agenda for future research. Electronic Markets. 2016;26(2):173–194.
[66] Bughin J. Big data, Big bang? J Big Data. 2016;3(1):2.
[67] Marshall A, Mueck S, Shockley R. How leading organizations use big data and analytics to innovate. Strategy Leadersh. 2015;43(5):32–39.
[68] Divya KS, Bhargavi P, Jyothi S. Machine Learning Algorithms in Big data Analytics. Int J Comput Sci Eng. 2018;6(1):63–70.
[69] Bibault JE, Giraud P, Burgun A. Big data and machine learning in radiation oncology: state of the art and future prospects. Cancer Lett. 2016;382(1):110–117.
[70] Obermeyer Z, Emanuel EJ. Predicting the future: big data, machine learning, and clinical medicine. N Engl J Med. 2016;375(13):1216.
[71] Akusok A, Björk KM, Miche Y, Lendasse A. High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access. 2015;3:1011–1025.
[72] Al-Jarrah OY, Yoo PD, Muhaidat S, Karagiannidis GK, Taha K. Efficient machine learning for big data: A review. Big Data Res. 2015;2(3):87–93.
[73] Landset S, Khoshgoftaar TM, Richter AN, Hasanin T. A survey of open source tools for machine learning with big data in the Hadoop ecosystem. J Big Data. 2015;2(1):24.
[74] Suthaharan S. Big data classification: Problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Perform Eval Rev. 2014;41(4):70–73.
[75] Bifet A, Morales GDF. Big data stream learning with Samoa. In: 2014 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE; 2014. p. 1199–1202.
[76] Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya AY, et al. A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE Trans Emerg Top Comput. 2014;2(3):267–279.
[77] Huang HH, Liu H. Big data machine learning and graph analytics: Current state and future challenges. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE; 2014. p. 16–17.
[78] Dunson DB. Statistics in the big data era: Failures of the machine. Stat Probab Lett. 2018;136:4–9.
[79] Nongxa LG. Mathematical and statistical foundations and challenges of (big) data sciences. S Afr J Sci. 2017;113(3-4):1–4.
[80] Webb-Vargas Y, Chen S, Fisher A, Mejia A, Xu Y, Crainiceanu C, et al. Big Data and Neuroimaging. Stat Biosci. 2017;9(2):543–558.
[81] Wang C, Chen MH, Wu J, Yan J, Zhang Y, Schifano E. Online updating method with new variables for big data streams. Can J Stat. 2017;46(1):123–146.
[82] Wang T, Samworth RJ. High dimensional change point estimation via sparse projection. J R Stat Soc Ser B (Stat Methodol). 2017;80(1):57–83.
[83] Liu Z, Sun F, McGovern DP.
Sparse generalized linear model with L0 approximation for feature selection andprediction with big omics data. BioData Min. 2017;10(1).[84] Zhang T, Yang B. An exact approach to ridge regression for big data. ComputStat. 2017;p. 1–20.[85] Franke B, Plante JF, Roscher R, Lee EsA, Smyth C, Hatefi A, et al. Statistical inference, learning and models inbig data. Int Stat Rev. 2016;84(3):371–389.[86] Schifano ED, Wu J, Wang C, Yan J, Chen MH. Online updating of statistical inference in the big data setting.Technometrics. 2016;58(3):393–403.[87] Hilbert M. Big data for development: A review of promises and challenges. Dev Policy Rev. 2016;34(1):135–174.[88] Wang, Chen MH, Schifano E, Wu J, Yan J. Statistical methods and computing for big data. Stat Interface.2016;9(4):399–414.[89] Doornik JA, Hendry DF. Statistical model selection with “Big Data”. Cogent Econ Fin. 2015;3(1):1045216.[90] Pehlivanlı AÇ. A novel feature selection scheme for high-dimensional data sets: four-Staged Feature Selection.J Appl Stat. 2015;43(6):1140–1154.[91] Wise AF, Shaffer DW. Why theory matters more than ever in the age of big data. J Learn Anal. 2015;2(2):5–13.[92] Chen, Chen EE, Zhao W, Zou W. Statistics in Big Data. J Chin Stat Assoc. 2015;53:186–202.[93] Sysoev O, Grimvall A, Burdakov O. Bootstrap Confidence Intervals for Large-scale Multivariate MonotonicRegression Problems. Commun Stat - Simul Comput. 2014;45(3):1025–1040.[94] Chen T, Fox E, Guestrin C. Stochastic gradient Hamiltonian Monte Carlo. In: Int. Conference on MachineLearning; 2014. p. 1683–1691.[95] Hoerl RW, Snee RD, De Veaux RD. Applying statistical thinking to ‘Big Data’ problems. Wiley Interdisci Rev:Comput Stat. 2014;6(4):222–232.[96] Hensman J, Fusi N, Lawrence ND. Gaussian processes for big data. arXiv preprint arXiv:13096835. 2013;.[97] Gunawan D, Kohn R, Quiroz M, Dang KD, Tran MN. Subsampling Sequential Monte Carlo for Static BayesianModels. arXiv Preprint arXiv:180503317. 
2018;.[98] Quiroz M, Kohn R, Villani M, Tran MN. Speeding up MCMC by efficient data subsampling. J Am Stat Assoc.2018 mar;p. 1–13. 17 P
[99] Zhu J, Chen J, Hu W, Zhang B. Big learning with Bayesian methods. Natl Sci Rev. 2017;4(4):627–651.
[100] Wu C, Robert CP. Average of recentered parallel MCMC for big data. arXiv preprint arXiv:1706.04780. 2017.
[101] Lee XJ, Hainy M, McKeone JP, Drovandi CC, Pettitt AN. ABC model selection for spatial extremes models applied to South Australian maximum temperature data. Comput Stat Data Anal. 2018;128:128–144.
[102] Minsker S, Srivastava S, Lin L, Dunson DB. Robust and scalable Bayes via a median of subset posterior measures. J Mach Learn Res. 2017;18(1):4488–4527.
[103] Lindsten F, Johansen AM, Naesseth CA, Kirkpatrick B, Schön TB, Aston J, et al. Divide-and-conquer with sequential Monte Carlo. J Comput Graph Stat. 2017;26(2):445–458.
[104] Lee A, Whiteley N. Forest resampling for distributed sequential Monte Carlo. Stat Anal Data Min. 2016;9(4):230–248.
[105] Liu Q, Wang D. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In: Advances in Neural Information Processing Systems; 2016. p. 2378–2386.
[106] Scott SL, Blocker AW, Bonassi FV, Chipman HA, George EI, McCulloch RE. Bayes and big data: The consensus Monte Carlo algorithm. Int J Manag Sci Eng Manag. 2016;11(2):78–88.
[107] Hassani H, Silva ES. Forecasting with big data: A review. Ann Data Sci. 2015;2(1):5–19.
[108] Tripuraneni N, Gu S, Ge H, Ghahramani Z. Particle Gibbs for infinite hidden Markov models. In: Advances in Neural Information Processing Systems; 2015. p. 2395–2403.
[109] Moores MT, Drovandi CC, Mengersen K, Robert CP. Pre-processing for approximate Bayesian computation in image analysis. Stat Comput. 2015;25(1):23–33.
[110] Ma Z, Rana PK, Taghia J, Flierl M, Leijon A. Bayesian estimation of Dirichlet mixture model with variational inference. Pattern Recognit. 2014;47(9):3143–3157.
[111] Strathmann H, Sejdinovic D, Girolami M. Unbiased Bayes for big data: Paths of partial posteriors. arXiv preprint arXiv:1501.03326. 2015.
[112] Reed C, Ghahramani Z. Scaling the Indian buffet process via submodular maximization. In: International Conference on Machine Learning; 2013. p. 1013–1021.
[113] Allenby GM, Bradlow ET, George EI, Liechty J, McCulloch RE. Perspectives on Bayesian methods and big data. Cust Needs Solut. 2014;1(3):169–175.
[114] Liu B, Blasch E, Chen Y, Shen D, Chen G. Scalable sentiment classification for big data analysis using Naive Bayes classifier. In: 2013 IEEE International Conference on Big Data. IEEE; 2013. p. 99–104.
[115] Qiu J, Wu Q, Ding G, Xu Y, Feng S. A survey of machine learning for big data processing. EURASIP J Adv Signal Process. 2016;2016(1):67.
[116] Baldominos A, Albacete E, Saez Y, Isasi P. A scalable machine learning online service for big data real-time analysis. In: 2014 IEEE Symposium on Computational Intelligence in Big Data (CIBD). IEEE; 2014. p. 1–8.
[117] Azar AT, Hassanien AE. Dimensionality reduction of medical big data using neural-fuzzy classifier. Soft Comput. 2015;19(4):1115–1127.
[118] Genuer R, Poggi JM, Tuleau-Malot C, Villa-Vialaneix N. Random forests for big data. Big Data Res. 2017;9:28–46.
[119] Wang XF, Xu Y. Fast clustering using adaptive density peak detection. Stat Methods Med Res. 2015;26(6):2800–2811.
[120] Allen GI, Grosenick L, Taylor J. A generalized least-square matrix decomposition. J Am Stat Assoc. 2014;109(505):145–159.
[121] Yu L, Lin N. ADMM for penalized quantile regression in big data. Int Stat Rev. 2017;85(3):494–518.
[122] Doornik JA. Autometrics. In: The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. Oxford University Press; 2009. p. 88–121.
[123] Politis DN, Romano JP, Wolf M. Subsampling. Springer Science & Business Media; 1999.
[124] Srivastava S, Li C, Dunson DB. Scalable Bayes via barycenter in Wasserstein space. J Mach Learn Res. 2018;19(1):312–346.
[125] Castruccio S, Genton MG. Compressing an ensemble with statistical models: An algorithm for global 3D spatio-temporal temperature. Technometrics. 2016;58(3):319–328.
[126] McCormick TH, Ferrell R, Karr AF, Ryan PB. Big data, big results: Knowledge discovery in output from large-scale analytics. Stat Anal Data Min. 2014;7(5):404–412.
[127] Neiswanger W, Wang C, Xing E. Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780. 2013.
[128] White S, Kypraios T, Preston SP. Piecewise approximate Bayesian computation: fast inference for discretely observed Markov models using a factorised posterior distribution. Stat Comput. 2015;25(2):289–301.
[129] Wang X, Dunson DB. Parallelizing MCMC via Weierstrass sampler. arXiv preprint arXiv:1312.4605. 2013.
[130] Guhaniyogi R, Banerjee S. Meta-kriging: Scalable Bayesian modeling and inference for massive spatial datasets. Technometrics. 2018;60(4):430–444.
[131] Guhaniyogi R, Banerjee S. Multivariate spatial meta kriging. Stat Probab Lett. 2019;144:3–8.
[132] Lasinio GJ, Mastrantonio G, Pollice A. Discussing the “big n problem”. Stat Methods Appl. 2013;22(1):97–112.
[133] Bouchard-Côté A, Vollmer SJ, Doucet A. The bouncy particle sampler: A nonreversible rejection-free Markov chain Monte Carlo method. J Am Stat Assoc. 2018;p. 1–13.
[134] Maclaurin D, Adams RP. Firefly Monte Carlo: Exact MCMC with subsets of data. In: Twenty-Fourth International Joint Conference on Artificial Intelligence; 2014. p. 543–552.
[135] Korattikara A, Chen Y, Welling M. Austerity in MCMC land: Cutting the Metropolis-Hastings budget. In: International Conference on Machine Learning; 2014. p. 181–189.
[136] Bardenet R, Doucet A, Holmes C. Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: International Conference on Machine Learning (ICML); 2014. p. 405–413.
[137] Bardenet R, Doucet A, Holmes C. On Markov chain Monte Carlo methods for tall data. J Mach Learn Res. 2017;18(1):1515–1557.
[138] Quiroz M, Villani M, Kohn R. Scalable MCMC for large data problems using data subsampling and the difference estimator. SSRN Electronic Journal. 2015.
[139] Maire F, Friel N, Alquier P. Informed sub-sampling MCMC: approximate Bayesian inference for large datasets. Stat Comput. 2017;p. 1–34.
[140] Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434. 2017.
[141] Ahn S, Shahbaba B, Welling M. Distributed stochastic gradient MCMC. In: International Conference on Machine Learning; 2014. p. 1044–1052.
[142] Teh YW, Thiery AH, Vollmer SJ. Consistency and fluctuations for stochastic gradient Langevin dynamics. J Mach Learn Res. 2016;17(1):193–225.
[143] Chopin N, Jacob PE, Papaspiliopoulos O. SMC2: an efficient algorithm for sequential analysis of state space models. J R Stat Soc Ser B (Stat Methodol). 2013;75(3):397–426.
[144] Beskos A, Jasra A, Muzaffer EA, Stuart AM. Sequential Monte Carlo methods for Bayesian elliptic inverse problems. Stat Comput. 2015;25(4):727–737.
[145] Gloaguen P, Etienne MP, Le Corff S. Online sequential Monte Carlo smoother for partially observed diffusion processes. EURASIP J Adv Signal Process. 2018;2018(1):9.
[146] Lin D. Online learning of nonparametric mixture models via sequential variational approximation. In: Advances in Neural Information Processing Systems; 2013. p. 395–403.
[147] Hoffman MD, Blei DM, Wang C, Paisley J. Stochastic variational inference. J Mach Learn Res. 2013;14(1):1303–1347.
[148] Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: A review for statisticians. J Am Stat Assoc. 2017;112(518):859–877.
[149] Gal Y, Van Der Wilk M, Rasmussen CE. Distributed variational inference in sparse Gaussian process regression and latent variable models. In: Advances in Neural Information Processing Systems; 2014. p. 3257–3265.
[150] Donnet S, Robin S. Shortened bridge sampler: Using deterministic approximations to accelerate SMC for posterior sampling. arXiv preprint arXiv:1707.07971. 2017.
[151] Naesseth CA, Linderman SW, Ranganath R, Blei DM. Variational sequential Monte Carlo. arXiv preprint arXiv:1705.11140. 2017.
[152] Rabinovich M, Angelino E, Jordan MI. Variational consensus Monte Carlo. In: Advances in Neural Information Processing Systems; 2015. p. 1207–1215.
[153] Guhaniyogi R, Qamar S, Dunson DB. Bayesian conditional density filtering for big data. Stat. 2014;1050:15.
[154] McKinley TJ, Vernon I, Andrianakis I, McCreesh N, Oakley JE, Nsubuga RN, et al. Approximate Bayesian computation and simulation-based inference for complex stochastic epidemic models. Stat Sci. 2018;33(1):4–18.
[155] Sisson S, Fan Y, Beaumont M. Overview of ABC. In: Handbook of Approximate Bayesian Computation. 2018. p. 3–54.
[156] Frazier DT, Martin GM, Robert CP, Rousseau J. Asymptotic properties of approximate Bayesian computation. Biometrika. 2018;p. 1–15.
[157] Izbicki R, Lee AB, Pospisil T. ABC–CDE: Toward approximate Bayesian computation with complex high-dimensional data and limited simulations. J Comput Graph Stat. 2019;p. 1–20.
[158] Singh P, Hellander A. Multi-statistic approximate Bayesian computation with multi-armed bandits. arXiv preprint arXiv:1805.08647. 2018.
[159] Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. Reliable ABC model choice via random forests. Bioinformatics. 2015;32(6):859–866.
[160] Drovandi CC, Grazian C, Mengersen K, Robert C. Approximating the likelihood in ABC. In: Handbook of Approximate Bayesian Computation. Chapman and Hall/CRC; 2018. p. 321–368.
[161] Meeds E, Welling M. GPS-ABC: Gaussian process surrogate approximate Bayesian computation. arXiv preprint arXiv:1401.2838. 2014.
[162] Wilkinson R. Accelerating ABC methods using Gaussian processes. In: Artificial Intelligence and Statistics; 2014. p. 1015–1023.
[163] Mengersen KL, Pudlo P, Robert CP. Bayesian computation via empirical likelihood. Proc Natl Acad Sci. 2013;110(4):1321–1326.
[164] Owen AB. Empirical Likelihood. Chapman and Hall/CRC; 2001.
[165] Breiman L. Classification and Regression Trees. Routledge; 2017.
[166] Chipman HA, George EI, McCulloch RE, et al. BART: Bayesian additive regression trees. Ann Appl Stat. 2010;4(1):266–298.
[167] Kapelner A, Bleich J. bartMachine: Machine learning with Bayesian additive regression trees. arXiv preprint arXiv:1312.2171. 2013.
[168] Rocková V, van der Pas S. Posterior concentration for Bayesian regression trees and forests. Ann Stat (in revision). 2017;p. 1–40.
[169] van der Pas S, Rockova V. Bayesian dyadic trees and histograms for regression. In: Advances in Neural Information Processing Systems; 2017. p. 2089–2099.
[170] Linero AR. Bayesian regression trees for high-dimensional prediction and variable selection. J Am Stat Assoc. 2018;p. 1–11.
[171] Lee A, Yau C, Giles MB, Doucet A, Holmes CC. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. J Comput Graph Stat. 2010;19(4):769–789.
[172] Suchard MA, Wang Q, Chan C, Frelinger J, Cron A, West M. Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. J Comput Graph Stat. 2010;19(2):419–438.
[173] Guha S, Hafen R, Rounds J, Xia J, Li J, Xi B, et al. Large complex data: divide and recombine (D&R) with RHIPE. Stat. 2012;1(1):53–67.
[174] Chang J, Fisher III JW. Parallel sampling of DP mixture models using sub-cluster splits. In: Advances in Neural Information Processing Systems; 2013. p. 620–628.
[175] Williamson S, Dubey A, Xing EP. Parallel Markov chain Monte Carlo for nonparametric mixture models. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13); 2013. p. 98–106.
[176] Ge H, Chen Y, Wan M, Ghahramani Z. Distributed inference for Dirichlet process mixture models. In: International Conference on Machine Learning; 2015. p. 2276–2284.
[177] Dutta R, Schoengens M, Onnela JP, Mira A. ABCpy: A user-friendly, extensible, and parallel library for approximate Bayesian computation. In: Proceedings of the Platform for Advanced Scientific Computing Conference; 2017. p. 1–9.
[178] McGrory CA, Titterington D. Variational approximations in Bayesian model selection for finite mixture distributions. Comput Stat Data Anal. 2007;51(11):5352–5367.
[179] Tank A, Foti N, Fox E. Streaming variational inference for Bayesian nonparametric mixture models. In: Artificial Intelligence and Statistics; 2015. p. 968–976.
[180] Liquet B, Mengersen K, Pettitt A, Sutton M, et al. Bayesian variable selection regression of multivariate responses for group data. Bayesian Anal. 2017;12(4):1039–1067.
[181] Liu Y, Ročková V, Wang Y. ABC variable selection with Bayesian forests. arXiv preprint arXiv:1806.02304. 2018.
[182] Müller P, Quintana FA, Jara A, Hanson T. Bayesian Nonparametric Data Analysis. Springer; 2015.
[183] Ghahramani Z. Bayesian non-parametrics and the probabilistic approach to modelling. Phil Trans R Soc A. 2013;371(1984):20110553.
[184] Rasmussen CE. Gaussian processes in machine learning. In: Advanced Lectures on Machine Learning. Springer; 2004. p. 63–71.
[185] Rasmussen CE. The infinite Gaussian mixture model. In: Advances in Neural Information Processing Systems; 2000. p. 554–560.
[186] Ghahramani Z, Griffiths TL. Infinite latent feature models and the Indian buffet process. In: Advances in Neural Information Processing Systems; 2006. p. 475–482.
[187] Beal MJ, Ghahramani Z, Rasmussen CE. The infinite hidden Markov model. In: Advances in Neural Information Processing Systems; 2002. p. 577–584.
[188] Chalupka K, Williams CK, Murray I. A framework for evaluating approximation methods for Gaussian process regression. J Mach Learn Res. 2013;14(Feb):333–350.
[189] Damianou A, Lawrence N. Deep Gaussian processes. In: Artificial Intelligence and Statistics; 2013. p. 207–215.
[190] Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol. 2015;33(2):155.
[191] Hensman J, Matthews AGdG, Ghahramani Z. Scalable variational Gaussian process classification. In: 18th International Conference on Artificial Intelligence and Statistics (AISTATS); 2015. p. 351–360.
[192] Tran D, Ranganath R, Blei DM. The variational Gaussian process. arXiv preprint arXiv:1511.06499. 2015.
[193] Deisenroth MP, Ng JW. Distributed Gaussian processes. In: Proceedings of the 32nd International Conference on Machine Learning, Volume 37. JMLR.org; 2015. p. 1481–1490.
[194] Wang C, Paisley J, Blei D. Online variational inference for the hierarchical Dirichlet process. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics; 2011. p. 752–760.
[195] Wang L, Dunson DB. Fast Bayesian inference in Dirichlet process mixture models. J Comput Graph Stat. 2011;20(1):196–216.
[196] Banerjee S. High-dimensional Bayesian geostatistics. Bayesian Anal. 2017;12(2):583.
[197] Zhang L, Datta A, Banerjee S. Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments. Stat Anal Data Min. 2019;12(3):197–209.
[198] Atkinson S, Zabaras N. Structured Bayesian Gaussian process latent variable model: Applications to data-driven dimensionality reduction and high-dimensional inversion. J Comput Phys. 2019;383:166–195.
[199] Heaton MJ, Datta A, Finley A, Furrer R, Guhaniyogi R, Gerber F, et al. Methods for analyzing large spatial data: A review and comparison. arXiv preprint arXiv:1710.05013. 2017.
[200] Tang Y, Xu Z, Zhuang Y. Bayesian network structure learning from big data: A reservoir sampling based ensemble method. In: International Conference on Database Systems for Advanced Applications. Springer; 2016. p. 209–222.
[201] Ben-Gal I. Bayesian networks. Encycl Stat Qual Reliab. 2008;1:1–6.
[202] Wang J, Tang Y, Nguyen M, Altintas I. A scalable data science workflow approach for big data Bayesian network learning. In: 2014 IEEE/ACM International Symposium on Big Data Computing. IEEE; 2014. p. 16–25.
[203] Zhou L, Pan S, Wang J, Vasilakos AV. Machine learning on big data: Opportunities and challenges. Neurocomputing. 2017;237:350–361.
[204] Katkar VD, Kulkarni SV. A novel parallel implementation of Naive Bayesian classifier for big data. In: 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE). IEEE; 2013. p. 847–852.
[205] Ni Y, Müller P, Diesendruck M, Williamson S, Zhu Y, Ji Y. Scalable Bayesian nonparametric clustering and classification. J Comput Graph Stat. 2019;p. 1–13.
[206] Lazar NA. Bayesian empirical likelihood. Biometrika. 2003;90(2):319–326.
[207] Schennach SM. Bayesian exponentially tilted empirical likelihood. Biometrika. 2005;92(1):31–46.
[208] Chaudhuri S, Ghosh M. Empirical likelihood for small area estimation. Biometrika. 2011;p. 473–480.
[209] Porter AT, Holan SH, Wikle CK. Bayesian semiparametric hierarchical empirical likelihood spatial models. J Stat Plan Inference. 2015;165:78–90.
[210] Porter AT, Holan SH, Wikle CK. Multivariate spatial hierarchical Bayesian empirical likelihood methods for small area estimation. Stat. 2015;4(1):108–116.
Acknowledgement
This research was supported by an ARC Australian Laureate Fellowship for the project “Bayesian Learning for Decision Making in the Big Data Era” under Grant no. FL150100150. The authors also acknowledge the support of the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).