Publication


Featured research published by Richard J. Samworth.


Journal of the Royal Statistical Society: Series B (Statistical Methodology) | 2013

Variable selection with error control: another look at stability selection

Rajen Dinesh Shah; Richard J. Samworth

Summary. Stability selection was recently introduced by Meinshausen and Bühlmann as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called complementary pairs stability selection, and derive bounds both on the expected number of variables included by complementary pairs stability selection that have low selection probability under the original procedure, and on the expected number of high selection probability variables that are excluded. These results require no (e.g. exchangeability) assumptions on the underlying model or on the quality of the original selection procedure. Under reasonable shape restrictions, the bounds can be further tightened, yielding improved error control, and therefore increasing the applicability of the methodology.
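The aggregation step described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the base selector `top_q_by_correlation`, the function name `cpss`, and the defaults `n_pairs=50` and `threshold=0.9` are all assumptions made for illustration. The sketch repeatedly splits the data into two disjoint halves (a complementary pair), applies the base selector to each half, and keeps the variables whose selection frequency over all 2B subsamples reaches the threshold.

```python
import numpy as np


def top_q_by_correlation(X, y, q):
    """Hypothetical base selector: the q columns of X most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant columns have zero norm; give them correlation 0 instead of dividing by 0.
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(corr)[-q:]


def cpss(X, y, select, n_pairs=50, threshold=0.9, seed=0):
    """Sketch of complementary pairs stability selection.

    `select(X_sub, y_sub)` must return an array of selected column indices.
    Returns the stable variables and the selection frequency of every variable.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    half = n // 2
    counts = np.zeros(p)
    for _ in range(n_pairs):
        perm = rng.permutation(n)
        # The two halves are disjoint, so they form a complementary pair.
        for idx in (perm[:half], perm[half:2 * half]):
            counts[select(X[idx], y[idx])] += 1
    freq = counts / (2 * n_pairs)  # proportion of the 2B subsamples selecting each variable
    return np.flatnonzero(freq >= threshold), freq
```

Any selection procedure that returns column indices can be passed as `select`; the error bounds mentioned in the summary concern the choice of threshold and are not reproduced in this sketch.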


Geochemistry Geophysics Geosystems | 2006

Neogene overflow of Northern Component Water at the Greenland‐Scotland Ridge

H. R. Poore; Richard J. Samworth; Nicky White; S. M. Jones; I. N. McCave

In the North Atlantic Ocean, flow of North Atlantic Deep Water (NADW), and of its ancient counterpart Northern Component Water (NCW), across the Greenland-Scotland Ridge (GSR) is thought to have played an important role in ocean circulation. Over the last 60 Ma, the Iceland Plume has dynamically supported an area which encompasses the GSR. Consequently, bathymetry of the GSR has varied with time due to a combination of lithospheric plate cooling and fluctuations in the temperature and buoyancy within the underlying convecting mantle. Here, we reassess the importance of plate cooling and convective control on this northern gateway for NCW flow during the Neogene period, following Wright and Miller (1996). To tackle the problem, benthic foraminiferal isotope data sets have been assembled to examine δ13C gradients between the three major deep water masses (i.e., Northern Component Water, Southern Ocean Water, and Pacific Ocean Water). Composite records are reported on an astronomical timescale, and a nonparametric curve-fitting technique is used to produce regional estimates of δ13C for each water mass. Confidence bands were calculated, and error propagation techniques used to estimate %NCW and its uncertainty. Despite obvious reservations about using long-term variations of δ13C from disparate analyses and settings, and despite considerable uncertainties in our understanding of ancient oceanic transport pathways, the variation of NCW through time is consistent with independent estimates of the temporal variation of dynamical support associated with the Iceland Plume. Prior to 12 Ma, δ13C patterns overlap and %NCW cannot be isolated. Significant long-period variations are evident, which are consistent with previously published work. From 12 Ma, when lithospheric cooling probably caused the GSR to submerge completely, long-period δ13C patterns diverge significantly and allow reasonable %NCW estimates to be made. 
Our most robust result is a dramatic increase in NCW overflow between 6 and 2 Ma when dynamical support generated by the Iceland Plume was weakest. Between 6 and 12 Ma a series of variations in NCW overflow have been resolved.


Annals of Statistics | 2012

Optimal weighted nearest neighbour classifiers

Richard J. Samworth

We derive an asymptotic expansion for the excess risk (regret) of a weighted nearest-neighbour classifier. This allows us to find the asymptotically optimal vector of nonnegative weights, which has a rather simple form. We show that the ratio of the regret of this classifier to that of an unweighted k-nearest neighbour classifier depends asymptotically only on the dimension d of the feature vectors, and not on the underlying populations. The improvement is greatest when d=4, but thereafter decreases as d → ∞. The popular bagged nearest neighbour classifier can also be regarded as a weighted nearest neighbour classifier, and we show that its corresponding weights are somewhat suboptimal when d is small (in particular, worse than those of the unweighted k-nearest neighbour classifier when d=1), but are close to optimal when d is large. Finally, we argue that improvements in the rate of convergence are possible under stronger smoothness assumptions, provided we allow negative weights. Our findings are supported by an empirical performance comparison on both simulated and real data sets.


Electronic Journal of Statistics | 2010

Theoretical properties of the log-concave maximum likelihood estimator of a multidimensional density

Madeleine Cule; Richard J. Samworth

We present theoretical properties of the log-concave maximum likelihood estimator of a density based on an independent and identically distributed sample in R^d. Our study covers both the case where the true underlying density is log-concave, and where this model is misspecified. We begin by showing that for a sequence of log-concave densities, convergence in distribution implies much stronger types of convergence - in particular, it implies convergence in Hellinger distance and even in certain exponentially weighted total variation norms. In our main result, we prove the existence and uniqueness of a log-concave density that minimises the Kullback-Leibler divergence from the true density over the class of all log-concave densities, and also show that the log-concave maximum likelihood estimator converges almost surely in these exponentially weighted total variation norms to this minimiser. In the case of a correctly specified model, this demonstrates a strong type of consistency for the estimator; in a misspecified model, it shows that the estimator converges to the log-concave density that is closest in the Kullback-Leibler sense to the true density.


Annals of Statistics | 2011

Approximation by Log-Concave Distributions, with Applications to Regression

Lutz Dümbgen; Richard J. Samworth; Dominic Schuhmacher

We study the approximation of arbitrary distributions P on d-dimensional space by distributions with log-concave density. Approximation means minimizing a Kullback-Leibler-type functional. We show that such an approximation exists if and only if P has finite first moments and is not supported by some hyperplane. Furthermore we show that this approximation depends continuously on P with respect to Mallows distance D_1(·, ·). This result implies consistency of the maximum likelihood estimator of a log-concave density under fairly general conditions. It also allows us to prove existence and consistency of estimators in regression models with a response Y = μ(X) + e, where X and e are independent, μ(·) belongs to a certain class of regression functions while e is a random error with log-concave density and mean zero.


Annals of Statistics | 2016

Statistical and computational trade-offs in estimation of sparse principal components

Tengyao Wang; Quentin Berthet; Richard J. Samworth

In recent years, sparse principal component analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a restricted covariance concentration condition, we show that there is an effective sample size regime in which no randomised polynomial time algorithm can achieve the minimax optimal rate. We also study the theoretical performance of a (polynomial time) variant of the well-known semidefinite relaxation estimator, revealing a subtle interplay between statistical and computational efficiency.


Statistics and Computing | 2010

Importance tempering

Robert B. Gramacy; Richard J. Samworth; Ruth King

Simulated tempering (ST) is an established Markov chain Monte Carlo (MCMC) method for sampling from a multimodal density π(θ). Typically, ST involves introducing an auxiliary variable k taking values in a finite subset of [0,1] and indexing a set of tempered distributions, say π_k(θ) ∝ π(θ)^k. In this case, small values of k encourage better mixing, but samples from π are only obtained when the joint chain for (θ,k) reaches k=1. However, the entire chain can be used to estimate expectations under π of functions of interest, provided that importance sampling (IS) weights are calculated. Unfortunately this method, which we call importance tempering (IT), can disappoint. This is partly because the most immediately obvious implementation is naïve and can lead to high variance estimators. We derive a new optimal method for combining multiple IS estimators and prove that the resulting estimator has a highly desirable property related to the notion of effective sample size. We briefly report on the success of the optimal combination in two modelling scenarios requiring reversible-jump MCMC, where the naïve approach fails.


Journal of Clinical Oncology | 2015

Diffuse Large B-Cell Lymphoma Classification System That Associates Normal B-Cell Subset Phenotypes With Prognosis

Karen Dybkær; Martin Bøgsted; Steffen Falgreen; Julie Støve Bødker; Malene Krag Kjeldsen; Alexander Schmitz; Anders Ellern Bilgrau; Zijun Y. Xu-Monette; Ling Li; Kim Steve Bergkvist; Maria Bach Laursen; Maria Rodrigo-Domingo; Sara Correia Marques; Sophie B. Rasmussen; Mette Nyegaard; Michael Gaihede; Michael Boe Møller; Richard J. Samworth; Rajen Dinesh Shah; Preben Johansen; Tarec Christoffer El-Galaly; Ken H. Young; Hans Erik Johnsen

PURPOSE Current diagnostic tests for diffuse large B-cell lymphoma use the updated WHO criteria based on biologic, morphologic, and clinical heterogeneity. We propose a refined classification system based on subset-specific B-cell-associated gene signatures (BAGS) in the normal B-cell hierarchy, hypothesizing that it can provide new biologic insight and diagnostic and prognostic value. PATIENTS AND METHODS We combined fluorescence-activated cell sorting, gene expression profiling, and statistical modeling to generate BAGS for naive, centrocyte, centroblast, memory, and plasmablast B cells from normal human tonsils. The impact of BAGS-assigned subtyping was analyzed using five clinical cohorts (treated with cyclophosphamide, doxorubicin, vincristine, and prednisone [CHOP], n = 270; treated with rituximab plus CHOP [R-CHOP], n = 869) gathered across geographic regions, time eras, and sampling methods. The analysis estimated subtype frequencies and drug-specific resistance and included a prognostic meta-analysis of patients treated with first-line R-CHOP therapy. RESULTS Similar BAGS subtype frequencies were assigned across 1,139 samples from five different cohorts. Among R-CHOP-treated patients, BAGS assignment was significantly associated with overall survival and progression-free survival within the germinal center B-cell-like subclass; the centrocyte subtype had a superior prognosis compared with the centroblast subtype. In agreement with the observed therapeutic outcome, centrocyte subtypes were estimated as being less resistant than the centroblast subtype to doxorubicin and vincristine. The centroblast subtype had a complex genotype, whereas the centrocyte subtype had high TP53 mutation and insertion/deletion frequencies and expressed LMO2, CD58, and stromal-1-signature and major histocompatibility complex class II-signature genes, which are known to have a positive impact on prognosis.
CONCLUSION Further development of a diagnostic platform using BAGS-assigned subtypes may allow pathogenetic studies to improve disease management.


Annals of Statistics | 2012

Independent component analysis via nonparametric maximum likelihood estimation

Richard J. Samworth; Ming Yuan


Annals of Statistics | 2016

Global rates of convergence in log-concave density estimation

Arlene K. H. Kim; Richard J. Samworth

Collaboration


Dive into Richard J. Samworth's collaborations.

Top Co-Authors

Tengyao Wang

University of Cambridge

Charlotte Pawlyn

Institute of Cancer Research

Martin Kaiser

Institute of Cancer Research