Hongxiao Zhu
Virginia Tech
Publication
Featured research published by Hongxiao Zhu.
Journal of the American Statistical Association | 2011
Hongxiao Zhu; Philip J. Brown; Jeffrey S. Morris
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this article, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect, and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. 
It is efficient enough to handle large datasets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g., images), and using other invertible transformations as alternatives to wavelets. Supplementary materials for this article are available online.
Computational Statistics & Data Analysis | 2012
Fengrong Wei; Hongxiao Zhu
We consider the problem of selecting grouped variables in linear regression and generalized linear regression models, based on penalized likelihood. A number of penalty functions have been used for this purpose, including the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP). These penalty functions, in comparison to the popularly used Lasso, have attractive theoretical properties such as unbiasedness and selection consistency. Although the model fitting methods using these penalties are well developed for individual variable selection, the extension to grouped variable selection is not straightforward, and the fitting can be unstable due to the nonconvexity of the penalty functions. To this end, we propose the group coordinate descent (GCD) algorithms, which extend the regular coordinate descent algorithms. These GCD algorithms are efficient, in that the computational burden only increases linearly with the number of the covariate groups. We also show that using the GCD algorithm, the estimated parameters converge to a global minimum when the sample size is larger than the dimension of the covariates, and converge to a local minimum otherwise. In addition, we demonstrate the regions of the parameter space in which the objective function is locally convex, even though the penalty is nonconvex. In addition to group selection in the linear model, the GCD algorithms can also be extended to generalized linear regression. We present details of the extension using an example of logistic regression. The efficiency of the proposed algorithms is demonstrated through simulation studies and a real data example, in which the MCP based and SCAD based GCD algorithms provide improved group selection results as compared to the group Lasso.
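As a rough illustration of the kind of update a group coordinate descent step can use, the sketch below implements the standard group-wise firm-thresholding operator for the MCP. It assumes each covariate group has been orthonormalized, so that the per-group subproblem has a closed-form solution. The function names, the vector `z` (the group's least-squares solution on the partial residuals), and the parameters `lam` and `gamma` are illustrative assumptions, not notation or code taken from the paper.

```python
import math


def group_soft_threshold(z, lam):
    """Shrink the whole vector z toward zero by lam in Euclidean norm (group Lasso step)."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]


def group_mcp_threshold(z, lam, gamma):
    """Firm-thresholding update for one group under the MCP.

    Assumes an orthonormalized group and gamma > 1 (hypothetical setup).
    Small groups are set to zero, moderate groups are shrunk less than
    under the group Lasso, and large groups are left unpenalized.
    """
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a unique minimizer")
    norm = math.sqrt(sum(v * v for v in z))
    if norm > gamma * lam:
        return list(z)  # beyond gamma * lam the MCP is flat, so no shrinkage
    rescale = gamma / (gamma - 1.0)
    return [rescale * v for v in group_soft_threshold(z, lam)]


zeroed = group_mcp_threshold([0.3, 0.4], lam=1.0, gamma=3.0)
shrunk = group_mcp_threshold([1.2, 1.6], lam=1.0, gamma=3.0)
untouched = group_mcp_threshold([3.0, 4.0], lam=1.0, gamma=3.0)
```

In a full algorithm this operator would be applied to one group at a time while cycling over the groups until the coefficients stabilize. Leaving large groups unshrunk is what gives the MCP its near-unbiasedness relative to the group Lasso.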
Biometrics | 2012
Hongxiao Zhu; Philip J. Brown; Jeffrey S. Morris
This article introduces new methods for performing classification of complex, high-dimensional functional data using the functional mixed model (FMM) framework. The FMM relates a functional response to a set of predictors through functional fixed and random effects, which allows it to account for various factors and between-function correlations. The methods include training and prediction steps. In the training steps we train the FMM model by treating class designation as one of the fixed effects, and in the prediction steps we classify the new objects using posterior predictive probabilities of class. Through a Bayesian scheme, we are able to adjust for factors affecting both the functions and the class designations. While the methods can be used in any FMM framework, we provide details for two specific Bayesian approaches: the Gaussian, wavelet-based FMM (G-WFMM) and the robust, wavelet-based FMM (R-WFMM). Both methods perform modeling in the wavelet space, which yields parsimonious representations for the functions, and can naturally adapt to local features and complex nonstationarities in the functions. The R-WFMM allows potentially heavier tails for features of the functions indexed by particular wavelet coefficients, leading to a down-weighting of outliers that makes the method robust to outlying functions or regions of functions. The models are applied to a pancreatic cancer mass spectroscopy data set and compared with other recently developed functional classification methods.
Biometrics | 2010
Hongxiao Zhu; Marina Vannucci; Dennis D. Cox
In functional data classification, functional observations are often contaminated by various systematic effects, such as random batch effects caused by device artifacts, or fixed effects caused by sample-related factors. These effects may lead to classification bias and thus should not be neglected. Another issue of concern is the selection of functions when predictors consist of multiple functions, some of which may be redundant. The above issues arise in a real data application where we use fluorescence spectroscopy to detect cervical precancer. In this article, we propose a Bayesian hierarchical model that takes into account random batch effects and selects effective functions among multiple functional predictors. Fixed effects or predictors in nonfunctional form are also included in the model. The dimension of the functional data is reduced through orthonormal basis expansion or functional principal components. For posterior sampling, we use a hybrid Metropolis-Hastings/Gibbs sampler, which suffers slow mixing. An evolutionary Monte Carlo algorithm is applied to improve the mixing. Simulation and real data application show that the proposed model provides accurate selection of functional predictors as well as good classification.
Journal of the American Statistical Association | 2016
Lin Zhang; Veerabhadran Baladandayuthapani; Hongxiao Zhu; Keith A. Baggerly; Tadeusz Majewski; Bogdan Czerniak; Jeffrey S. Morris
We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations. Supplementary materials for this article are available online.
Bayesian Analysis | 2016
Jingjing Yang; Hongxiao Zhu; Taeryon Choi; Dennis D. Cox
Functional data, with basic observational units being functions (e.g., curves, surfaces) varying over a continuum, are frequently encountered in various applications. While many statistical tools have been developed for functional data analysis, the issue of smoothing all functional observations simultaneously is less studied. Existing methods often focus on smoothing each individual function separately, at the risk of removing important systematic patterns common across functions. We propose a nonparametric Bayesian approach to smooth all functional observations simultaneously and nonparametrically. In the proposed approach, we assume that the functional observations are independent Gaussian processes subject to a common level of measurement errors, enabling the borrowing of strength across all observations. Unlike most Gaussian process regression models that rely on pre-specified structures for the covariance kernel, we adopt a hierarchical framework by assuming a Gaussian process prior for the mean function and an Inverse-Wishart process prior for the covariance function. These prior assumptions induce an automatic mean-covariance estimation in the posterior inference in addition to the simultaneous smoothing of all observations. Such a hierarchical framework is flexible enough to incorporate functional data with different characteristics, including data measured on either common or uncommon grids, and data with either stationary or nonstationary covariance structures. Simulations and real data analysis demonstrate that, in comparison with alternative methods, the proposed Bayesian approach achieves better smoothing accuracy and comparable mean-covariance estimation results. Furthermore, it can successfully retain the systematic patterns in the functional observations that are usually neglected by the existing functional data analyses based on individual-curve smoothing.
Physical Review Letters | 2017
Rolf Müller; Anupam Gupta; Hongxiao Zhu; Mittu Pannala; Uzair S. Gillani; Yanqing Fu; Philip Caspers; John R. Buck
Horseshoe bats have dynamic biosonar systems with interfaces for ultrasonic emission (reception) that change shape while diffracting the outgoing (incoming) sound waves. An information-theoretic analysis based on numerical and physical prototypes shows that these shape changes add sensory information (mutual information between distant shape conformations <20%), increase the number of resolvable directions of sound incidence, and improve the accuracy of direction finding. These results demonstrate that horseshoe bats have a highly effective substrate for dynamic encoding of sensory information.
PLOS ONE | 2017
Chen Ming; Anupam Gupta; Ruijin Lu; Hongxiao Zhu; Rolf Müller
Since many bat species thrive in densely vegetated habitats, echoes from foliage are likely to be of prime importance to the animals’ sensory ecology, be it as clutter that masks prey echoes or as sources of information about the environment. To better understand the characteristics of foliage echoes, a new model for the process that generates these signals has been developed. This model takes leaf size and orientation into account by representing the leaves as circular disks of varying diameter. The two added leaf parameters are of potential importance to the sensory ecology of bats, e.g., with respect to landmark recognition and flight guidance along vegetation contours. The full model is specified by a total of three parameters: leaf density, average leaf size, and average leaf orientation. It assumes that all leaf parameters are independently and identically distributed. Leaf positions were drawn from a uniform probability density function, sizes and orientations each from a Gaussian probability function. The model was found to reproduce the first-order amplitude statistics of measured example echoes and showed time-variant echo properties that depended on foliage parameters. Parameter estimation experiments using lasso regression have demonstrated that a single foliage parameter can be estimated with high accuracy if the other two parameters are known a priori. If only one parameter is known a priori, the other two can still be estimated, but with a reduced accuracy. Lasso regression did not support simultaneous estimation of all three parameters. Nevertheless, these results demonstrate that foliage echoes contain accessible information on foliage type and orientation that could play a role in supporting sensory tasks such as landmark identification and contour following in echolocating bats.
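The sampling scheme described in this abstract can be sketched as follows. This is a minimal illustration only: the cubic volume, the standard deviations, the truncation of negative diameters at zero, and all function and parameter names are assumptions made for the example, not details taken from the paper.

```python
import random


def sample_foliage(density, box_size, mean_diameter, mean_angle_deg,
                   sd_diameter=0.01, sd_angle_deg=10.0, seed=0):
    """Draw one random foliage realization as a list of leaf records."""
    rng = random.Random(seed)
    # Leaf count implied by the density in an assumed cubic volume.
    n_leaves = round(density * box_size ** 3)
    leaves = []
    for _ in range(n_leaves):
        # Positions: independent uniform draws inside the cube.
        position = tuple(rng.uniform(0.0, box_size) for _ in range(3))
        # Sizes and orientations: independent Gaussian draws.
        # Clipping negative diameters to zero is an assumption of this sketch.
        diameter = max(rng.gauss(mean_diameter, sd_diameter), 0.0)
        angle_deg = rng.gauss(mean_angle_deg, sd_angle_deg)
        leaves.append({"position": position,
                       "diameter": diameter,
                       "angle_deg": angle_deg})
    return leaves


leaves = sample_foliage(density=2.0, box_size=2.0,
                        mean_diameter=0.05, mean_angle_deg=45.0)
```

Each record stands for one circular disk. An echo simulation would then sum the contributions of all disks, which is outside the scope of this sketch.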
Technometrics | 2018
Hongxiao Zhu; Philip Caspers; Jeffrey S. Morris; Xiaowei Wu; Rolf Müller
Sonar emits pulses of sound and uses the reflected echoes to gain information about target objects. It offers a low cost, complementary sensing modality for small robotic platforms. Although existing analytical approaches often assume independence across echoes, real sonar data can have more complicated structures due to device setup or experimental design. In this article, we consider sonar echo data collected from multiple terrain substrates with a dual-channel sonar head. Our goals are to identify the differential sonar responses to terrains and study the effectiveness of this dual-channel design in discriminating targets. We describe a unified analytical framework that achieves these goals rigorously, simultaneously, and automatically. The analysis was done by treating the echo envelope signals as functional responses and the terrain/channel information as covariates in a functional regression setting. We adopt functional mixed models that facilitate the estimation of terrain and channel effects while capturing the complex hierarchical structure in data. This unified analytical framework incorporates both Gaussian models and robust models. We fit the models using a full Bayesian approach, which enables us to perform multiple inferential tasks under the same modeling framework, including selecting models, estimating the effects of interest, identifying significant local regions, discriminating terrain types, and describing the discriminatory power of local regions. Our analysis of the sonar-terrain data identifies time regions that reflect differential sonar responses to terrains. The discriminant analysis suggests that a multi- or dual-channel design achieves target identification performance comparable with or better than a single-channel design. Supplementary materials for this article are available online.
NeuroImage | 2018
Hongxiao Zhu; Francesco Versace; Paul M. Cinciripini; Philip Rausch; Jeffrey S. Morris
Event‐related potentials (ERPs) summarize electrophysiological brain response to specific stimuli. They can be considered as correlated functions of time with both spatial correlation across electrodes and nested correlations within subjects. Commonly used analytical methods for ERPs often focus on pre‐determined extracted components and/or ignore the correlation among electrodes or subjects, which can miss important insights, and tend to be sensitive to outlying subjects, time points or electrodes. Motivated by ERP data in a smoking cessation study, we introduce a Bayesian spatial functional regression framework that models the entire ERPs as spatially correlated functional responses and the stimulus types as covariates. This novel framework relies on mixed models to characterize the effects of stimuli while simultaneously accounting for the multilevel correlation structure. The spatial correlation among the ERP profiles is captured through basis‐space Matérn assumptions that allow either separable or nonseparable spatial correlations over time. We induce both adaptive regularization over time and spatial smoothness across electrodes via a correlated normal‐exponential‐gamma (CNEG) prior on the fixed effect coefficient functions. Our proposed framework includes both Gaussian models as well as robust models using heavier‐tailed distributions to make the regression automatically robust to outliers. We introduce predictive methods to select among Gaussian vs. robust models and models with separable vs. non‐separable spatiotemporal correlation structures. Our proposed analysis produces global tests for stimuli effects across entire time (or time‐frequency) and electrode domains, plus multiplicity‐adjusted pointwise inference based on experiment‐wise error rate or false discovery rate to flag spatiotemporal (or spatio‐temporal‐frequency) regions that characterize stimuli differences, and can also produce inference for any prespecified waveform components. 
Our analysis of the smoking cessation ERP data set reveals numerous effects across different types of visual stimuli.
Highlights:
Estimates spatiotemporal effects of various stimuli on Event‐related potentials.
Models separable or nonseparable inter‐electrode spatial correlation over time.
Accounts for multilevel data structures through Bayesian functional mixed models.
Achieves adaptive regularization over time and spatial smoothness over electrodes.
Enables robust modeling, model selection, global test and pointwise inference.