Publication


Featured research published by Ben Sherwood.


Statistics in Medicine | 2013

Weighted quantile regression for analyzing health care cost data with missing covariates

Ben Sherwood; Lan Wang; Xiao Hua Zhou

Analysis of health care cost data is often complicated by a high level of skewness, heteroscedastic variances and the presence of missing data. Most of the existing literature on cost data analysis has focused on modeling the conditional mean. In this paper, we study a weighted quantile regression approach for estimating the conditional quantiles of health care cost data with missing covariates. The weighted quantile regression estimator is consistent, unlike the naive estimator, and asymptotically normal. Furthermore, we propose a modified BIC for variable selection in quantile regression when the covariates are missing at random. The quantile regression framework allows us to obtain a more complete picture of the effects of the covariates on the health care cost and is naturally adapted to the skewness and heterogeneity of the cost data. The method is semiparametric in the sense that it does not require specifying the likelihood function for the random error or the covariates. We investigate the weighted quantile regression procedure and the modified BIC via extensive simulations. We illustrate the application by analyzing a real data set from a health care cost study.
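As a rough illustration of the weighting idea in this abstract, the sketch below minimizes a weighted check loss for an intercept-only model. It assumes the weights are inverse probabilities of an observation being fully observed, which this abstract does not state explicitly. The function names and toy data are hypothetical and are not taken from the paper.

```python
def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_objective(q, y, w, tau):
    """Weighted check loss of a constant fit q (intercept-only model)."""
    return sum(wi * check_loss(yi - q, tau) for yi, wi in zip(y, w))


def weighted_quantile(y, w, tau):
    """Smallest y whose cumulative weight reaches tau * total weight.

    For positive weights this value minimizes weighted_objective over q.
    """
    target = tau * sum(w)
    cumulative = 0.0
    for yi, wi in sorted(zip(y, w)):
        cumulative += wi
        if cumulative >= target:
            return yi
    return max(y)


# Toy complete cases (hypothetical). The last one is assumed to have had a
# 1/3 chance of being fully observed, so it receives weight 3.
costs = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.0, 1.0, 1.0, 1.0, 3.0]
median_unweighted = weighted_quantile(costs, [1.0] * 5, 0.5)
median_weighted = weighted_quantile(costs, weights, 0.5)
```

Upweighting the under-represented high-cost case pulls the estimated median upward, which is the kind of correction a naive complete-case fit would miss.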


Annals of Statistics | 2016

Partially linear additive quantile regression in ultra-high dimension

Ben Sherwood; Lan Wang

We consider a flexible semiparametric quantile regression model for analyzing high dimensional heterogeneous data. This model has several appealing features: (1) By considering different conditional quantiles, we may obtain a more complete picture of the conditional distribution of a response variable given high dimensional covariates. (2) The sparsity level is allowed to be different at different quantile levels. (3) The partially linear additive structure accommodates nonlinearity and circumvents the curse of dimensionality. (4) It is naturally robust to heavy-tailed distributions. In this paper, we approximate the nonlinear components using B-spline basis functions. We first study estimation under this model when the nonzero components are known in advance and the number of covariates in the linear part diverges. We then investigate a nonconvex penalized estimator for simultaneous variable selection and estimation. We derive its oracle property for a general class of nonconvex penalty functions in the presence of ultra-high dimensional covariates under relaxed conditions. To tackle the challenges of nonsmooth loss function, nonconvex penalty function and the presence of nonlinear components, we combine a recently developed convex-differencing method with modern empirical process techniques. Monte Carlo simulations and an application to a microarray study demonstrate the effectiveness of the proposed method. We also discuss how the method for a single quantile of interest can be extended to simultaneous variable selection and estimation at multiple quantiles.
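The B-spline approximation mentioned in this abstract can be illustrated with the standard Cox-de Boor recursion. This is a minimal sketch of how one nonlinear covariate could be expanded into basis columns. The knot vector, degree, and function names are assumptions chosen for illustration, not the settings used in the paper.

```python
def bspline_basis(i, k, knots, x):
    """Value at x of the i-th B-spline of degree k (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_span = knots[i + k] - knots[i]
    right_span = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    right = 0.0
    # Terms with a zero-width span are defined as zero (repeated knots).
    if left_span > 0:
        left = (x - knots[i]) / left_span * bspline_basis(i, k - 1, knots, x)
    if right_span > 0:
        right = ((knots[i + k + 1] - x) / right_span
                 * bspline_basis(i + 1, k - 1, knots, x))
    return left + right


def design_row(x, knots, degree=3):
    """All basis function values at x, i.e. one row of a spline design matrix."""
    n_basis = len(knots) - degree - 1
    return [bspline_basis(i, degree, knots, x) for i in range(n_basis)]


# Hypothetical clamped cubic knot vector on [0, 1) with one interior knot.
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
row = design_row(0.25, knots)
```

Each nonlinear covariate contributes a handful of such columns, so the additive part of the model stays linear in the spline coefficients.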


Epigenetics | 2018

Genome-wide DNA methylation associations with spontaneous preterm birth in US blacks: findings in maternal and cord blood samples

Xiumei Hong; Ben Sherwood; Christine Ladd-Acosta; Shouneng Peng; Hongkai Ji; Ke Hao; Irina Burd; Tami R. Bartell; Guoying Wang; Hui Ju Tsai; Xin Liu; Yuelong Ji; Anastacia Wahl; Deanna Caruso; Aviva Lee-Parritz; Barry Zuckerman; Xiaobin Wang

Preterm birth (PTB) affects one in six Black babies in the United States. Epigenetics is believed to play a role in PTB; however, only a limited number of epigenetic studies of PTB have been reported, most of which have focused on cord blood DNA methylation (DNAm) and/or were conducted in white populations. Here we conducted, by far, the largest epigenome-wide DNAm analysis in 300 Black women who delivered early spontaneous preterm (sPTB, n = 150) or full-term babies (n = 150) and replicated the findings in an independent set of Black mother-newborn pairs from the Boston Birth Cohort. DNAm in maternal blood and/or cord blood was measured using the Illumina HumanMethylation450 BeadChip. We identified 45 DNAm loci in maternal blood associated with early sPTB, with a false discovery rate (FDR) <5%. Replication analyses confirmed sPTB associations for cg03915055 and cg06804705, located in the promoter regions of the CYTIP and LINC00114 genes, respectively. Both loci had comparable associations with early sPTB and early medically-indicated PTB, but attenuated associations with late sPTB. These associations could not be explained by cell composition, gestational complications, and/or nearby maternal genetic variants. Analyses in the newborns of the 110 Black women showed that cord blood methylation levels at both loci had no associations with PTB. The findings from this study underscore the role of maternal DNAm in PTB risk, and provide a set of maternal loci that may serve as biomarkers for PTB. Longitudinal studies are needed to clarify temporal relationships between maternal DNAm and PTB risk.
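The abstract reports loci at a false discovery rate below 5% but does not name the procedure used. As one common way such a threshold is applied, the sketch below implements the Benjamini-Hochberg step-up rule. The choice of this procedure, the function name, and the p-values are all assumptions for illustration only.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * fdr / m and
    reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-locus p-values, not values from the study.
locus_p_values = [0.8, 0.001, 0.2, 0.025, 0.012]
significant = benjamini_hochberg(locus_p_values, fdr=0.05)
```

Because the rule steps up from the largest qualifying rank, a p-value can be rejected even when smaller p-values miss their own thresholds.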


Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 2016

Using quantile regression to create baseline norms for neuropsychological tests

Ben Sherwood; Andrew Zhou; Sandra Weintraub; Lan Wang

The Uniform Data Set (UDS) contains neuropsychological test scores and demographic information for participants at Alzheimer's disease centers across the United States funded by the National Institute on Aging. Mean regression analysis of neuropsychological tests has been proposed to detect cognitive decline, but the approach requires stringent assumptions.


Nature Communications | 2017

Genome-wide prediction of DNase I hypersensitivity using gene expression

Weiqiang Zhou; Ben Sherwood; Zhicheng Ji; Yingchao Xue; Fang Du; Jiawei Bai; Mingyao Ying; Hongkai Ji

We evaluate the feasibility of using a biological sample’s transcriptome to predict its genome-wide regulatory element activities measured by DNase I hypersensitivity (DH). We develop BIRD, Big Data Regression for predicting DH, to handle this high-dimensional problem. Applying BIRD to the Encyclopedia of DNA Elements (ENCODE) data, we found that to a large extent gene expression predicts DH, and information useful for prediction is contained in the whole transcriptome rather than limited to a regulatory element’s neighboring genes. We show applications of BIRD-predicted DH in predicting transcription factor-binding sites (TFBSs), turning publicly available gene expression samples in Gene Expression Omnibus (GEO) into a regulome database, predicting differential regulatory element activities, and facilitating regulome data analyses by serving as pseudo-replicates. Besides improving our understanding of the regulome–transcriptome relationship, this study suggests that transcriptome-based prediction can provide a useful new approach for regulome mapping.

A map of the activities of all genomic regulatory elements across cell types and conditions would be a tremendous resource. The computational method introduced here predicts genome-wide accessible sites from gene expression data and allows the authors to build a database of regulatory element activities using publicly available transcriptome data.


Journal of the American Statistical Association | 2018

Quantile-Optimal Treatment Regimes

Lan Wang; Yu Zhou; Rui Song; Ben Sherwood

Finding the optimal treatment regime (or a series of sequential treatment regimes) based on individual characteristics has important applications in areas such as precision medicine, government policies, and active labor market interventions. In the current literature, the optimal treatment regime is usually defined as the one that maximizes the average benefit in the potential population. This article studies a general framework for estimating the quantile-optimal treatment regime, which is of importance in many real-world applications. Given a collection of treatment regimes, we consider robust estimation of the quantile-optimal treatment regime, which does not require the analyst to specify an outcome regression model. We propose an alternative formulation of the estimator as a solution of an optimization problem with an estimated nuisance parameter. This novel representation allows us to investigate the asymptotic theory of the estimated optimal treatment regime using empirical process techniques. We derive theory involving a nonstandard convergence rate and a nonnormal limiting distribution. The same nonstandard convergence rate would also occur if the mean optimality criterion is applied, but this has not been studied. Thus, our results fill an important theoretical gap for a general class of policy search methods in the literature. The article investigates both static and dynamic treatment regimes. In addition, doubly robust estimation and alternative optimality criteria, such as those based on Gini’s mean difference or weighted quantiles, are investigated. Numerical simulations demonstrate the performance of the proposed estimator. A data example from a trial in HIV+ patients is used to illustrate the application. Supplementary materials for this article are available online.
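To make the idea of choosing a regime by a quantile rather than a mean concrete, here is a minimal sketch under strong simplifying assumptions: a single-stage randomized trial with a known treatment probability of 0.5, a small fixed collection of candidate regimes, and an inverse-probability-weighted sample quantile as the value estimate. The data, regime names, and helper functions are hypothetical and this is not the estimator analyzed in the article.

```python
def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight reaches tau * total weight."""
    target = tau * sum(weights)
    cumulative = 0.0
    for v, w in sorted(zip(values, weights)):
        cumulative += w
        if cumulative >= target:
            return v
    return max(values)


def regime_quantile(data, regime, tau, propensity=0.5):
    """IPW estimate of the tau-quantile of the outcome under `regime`.

    Only subjects whose observed treatment matches the regime contribute,
    each weighted by 1 / P(observed treatment), assumed known and constant.
    """
    kept = [(y, 1.0 / propensity) for x, a, y in data if a == regime(x)]
    if not kept:
        raise ValueError("no subject followed this regime")
    values, weights = zip(*kept)
    return weighted_quantile(values, weights, tau)


# Hypothetical trial records: (covariate x, treatment a, outcome y).
trial = [(-1, 0, 5), (-1, 1, 1), (-1, 0, 6), (-1, 1, 2),
         (1, 0, 1), (1, 1, 7), (1, 0, 2), (1, 1, 8)]
regimes = {
    "treat_all": lambda x: 1,
    "treat_none": lambda x: 0,
    "treat_if_positive": lambda x: 1 if x > 0 else 0,
}
best = max(regimes, key=lambda name: regime_quantile(trial, regimes[name], 0.5))
```

No outcome regression model is fitted here. The comparison relies only on the observed outcomes and the assumed randomization probability.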


Journal of Multivariate Analysis | 2016

Variable selection for additive partial linear quantile regression with missing covariates

Ben Sherwood

The standard quantile regression model assumes a linear relationship at the quantile of interest and that all variables are observed. These assumptions are relaxed by considering a partial linear model with missing covariates. A weighted objective function using inverse probability weighting is proposed to remove the potential bias caused by missing data. Estimators using parametric and nonparametric estimates of the probability that an observation has fully observed covariates are examined. A penalized and weighted objective function using the nonconvex penalties MCP or SCAD is used for variable selection of the linear terms in the presence of missing data. Assuming the missing data problem remains low dimensional, the penalized estimator has the oracle property, including cases where p≫n. Theoretical challenges include handling missing data and partial linear models while working with a nonsmooth loss function and a nonconvex penalty function. The performance of the method is evaluated using Monte Carlo simulations, and the methods are applied to model the amount of time sober for patients leaving a rehabilitation center.
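For readers unfamiliar with the two nonconvex penalties named in this abstract, the sketch below evaluates the standard SCAD and MCP penalty functions for a single coefficient. The default shape parameters (a = 3.7 for SCAD, a = 3 for MCP) are conventional choices assumed here, not values reported in the paper.

```python
def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: lasso-like near zero, then flattening to a constant."""
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, a=3.0):
    """MCP penalty: slope decreases linearly from lam to zero at a * lam."""
    t = abs(beta)
    if t <= a * lam:
        return lam * t - t * t / (2 * a)
    return a * lam * lam / 2


# A large coefficient pays a bounded penalty, unlike the lasso's lam * |beta|.
large_scad = scad_penalty(10.0, lam=1.0)
large_mcp = mcp_penalty(10.0, lam=1.0)
lasso_equivalent = 1.0 * abs(10.0)
```

Both penalties level off for large coefficients, which is what allows strong signals to be estimated with little shrinkage while small ones are set to zero.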


Human Heredity | 2016

Computational Prediction of the Global Functional Genomic Landscape: Applications, Methods, and Challenges

Weiqiang Zhou; Ben Sherwood; Hongkai Ji

Technological advances have led to an explosive growth of high-throughput functional genomic data. Exploiting the correlation among different data types, it is possible to predict one functional genomic data type from other data types. Prediction tools are valuable in understanding the relationship among different functional genomic signals. They also provide a cost-efficient solution to inferring the unknown functional genomic profiles when experimental data are unavailable due to resource or technological constraints. The predicted data may be used for generating hypotheses, prioritizing targets, interpreting disease variants, facilitating data integration, quality control, and many other purposes. This article reviews various applications of prediction methods in functional genomics, discusses analytical challenges, and highlights some common and effective strategies used to develop prediction methods for functional genomic data.


Journal of the American Statistical Association | 2014

Discussion of "Estimation and Accuracy after Model Selection" by Brad Efron.

Lan Wang; Ben Sherwood; Runze Li

We congratulate Efron for his stimulating and timely work, which addresses an important issue on estimation after model selection.
In practice, it is typical to ignore the variability of the variable selection step, which could result in inaccurate post-selection inference. Although the flaw of such practice is widely recognized, finding a general solution is extremely challenging. The model selection step is often a complex decision process and can involve collecting expert opinions, preprocessing, applying a variable selection rule, data-driven choice of one or more tuning parameters, among others. Except in simple cases, explicitly characterizing the form of the post-selection estimator is itself difficult. The key result of this paper is a closed-form formula for obtaining the standard deviation of a “bootstrap smoothed” (or “bagged”) estimator. This elegant formula is not only simple to implement but also versatile. It indeed provides a general approach for obtaining a confidence interval for a class of parameters of interest while incorporating the variability of variable selection. Our discussions will focus on two aspects: (1) the generality of the method, and (2) further insight into the performance of the proposed method in a simple but hopefully informative example.
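The closed-form standard deviation for a bootstrap smoothed estimator can be sketched as follows. The formula used, the square root of the summed squared covariances between each observation's resampling count and the bootstrap estimates, is recalled from Efron's paper rather than quoted in this discussion, so its exact form should be treated as an assumption. The data, the choice of the median as the estimator, and the function name are hypothetical.

```python
import random
import statistics


def smoothed_bootstrap(data, estimator, n_boot=500, seed=0):
    """Return (bagged estimate, covariance-based standard deviation)."""
    rng = random.Random(seed)
    n = len(data)
    counts = []     # counts[i][j]: times observation j appears in resample i
    estimates = []  # estimates[i]: estimator applied to resample i
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        row = [0] * n
        for j in idx:
            row[j] += 1
        counts.append(row)
        estimates.append(estimator([data[j] for j in idx]))
    smoothed = sum(estimates) / n_boot
    sum_sq_cov = 0.0
    for j in range(n):
        mean_count = sum(row[j] for row in counts) / n_boot
        # Covariance between how often observation j was drawn and the estimate.
        cov_j = sum((row[j] - mean_count) * (t - smoothed)
                    for row, t in zip(counts, estimates)) / n_boot
        sum_sq_cov += cov_j ** 2
    return smoothed, sum_sq_cov ** 0.5


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
smoothed_median, sd = smoothed_bootstrap(sample, statistics.median)
```

The same resamples are reused for both the smoothed estimate and its standard deviation, so no second level of bootstrapping is needed.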


The Journal of Allergy and Clinical Immunology | 2016

Epigenome-wide association study links site-specific DNA methylation changes with cow's milk allergy

Xiumei Hong; Christine Ladd-Acosta; Ke Hao; Ben Sherwood; Hongkai Ji; Corinne A. Keet; Rajesh Kumar; Deanna Caruso; Xin Liu; Guoying Wang; Zhu Chen; Yuelong Ji; Guanyun Mao; Sheila O. Walker; Tami R. Bartell; Zhicheng Ji; Yifei Sun; Hui Ju Tsai; Jacqueline A. Pongracic; Daniel E. Weeks; Xiaobin Wang

Collaboration


Top Co-Authors

Hongkai Ji | Johns Hopkins University

Lan Wang | University of Minnesota

Ke Hao | Icahn School of Medicine at Mount Sinai

Tami R. Bartell | Children's Memorial Hospital

Deanna Caruso | Johns Hopkins University

Guoying Wang | Johns Hopkins University

Weiqiang Zhou | Johns Hopkins University

Xiaobin Wang | Johns Hopkins University

Xiumei Hong | Johns Hopkins University

Yuelong Ji | Johns Hopkins University