
Publication


Featured research published by J. S. Marron.


Proceedings of the National Academy of Sciences of the United States of America | 2003

Repeated observation of breast tumor subtypes in independent gene expression data sets

Therese Sørlie; Robert Tibshirani; Joel S. Parker; Trevor Hastie; J. S. Marron; Andrew B. Nobel; Shibing Deng; Hilde Johnsen; Robert Pesich; Stephanie Geisler; Janos Demeter; Charles M. Perou; Per Eystein Lønning; Patrick O. Brown; Anne Lise Børresen-Dale; David Botstein

Characteristic patterns of gene expression measured by DNA microarrays have been used to classify tumors into clinically relevant subgroups. In this study, we have refined the previously defined subtypes of breast tumors that could be distinguished by their distinct patterns of gene expression. A total of 115 malignant breast tumors were analyzed by hierarchical clustering based on patterns of expression of 534 “intrinsic” genes and shown to subdivide into one basal-like, one ERBB2-overexpressing, two luminal-like, and one normal breast tissue-like subgroup. The genes used for classification were selected based on their similar expression levels between pairs of consecutive samples taken from the same tumor separated by 15 weeks of neoadjuvant treatment. Similar cluster analyses of two published, independent data sets representing different patient cohorts from different laboratories, uncovered some of the same breast cancer subtypes. In the one data set that included information on time to development of distant metastasis, subtypes were associated with significant differences in this clinical feature. By including a group of tumors from BRCA1 carriers in the analysis, we found that this genotype predisposes to the basal tumor subtype. Our results strongly support the idea that many of these breast tumor subtypes represent biologically distinct disease entities.
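The core analysis step, hierarchical clustering of tumor samples by the correlation of their expression profiles, can be sketched as follows. The synthetic data, cluster count, and distance/linkage choices here are illustrative placeholders, not the paper's actual 115-tumor, 534-gene analysis:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# 20 synthetic "tumor samples" x 50 "genes": two opposite expression patterns
pattern = np.concatenate([np.ones(25), -np.ones(25)])
expr = np.vstack([rng.normal(0, 0.3, (10, 50)) + pattern,
                  rng.normal(0, 0.3, (10, 50)) - pattern])

# 1 - Pearson correlation is a common distance between expression profiles
dist = pdist(expr, metric="correlation")
tree = linkage(dist, method="average")            # average-linkage dendrogram
labels = fcluster(tree, t=2, criterion="maxclust")  # cut into 2 subgroups
```

Cutting the dendrogram at two clusters recovers the two planted expression subgroups.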


Journal of the American Statistical Association | 1996

A Brief Survey of Bandwidth Selection for Density Estimation

M. C. Jones; J. S. Marron; Simon J. Sheather

Abstract There has been major progress in recent years in data-based bandwidth selection for kernel density estimation. Some “second generation” methods, including plug-in and smoothed bootstrap techniques, have been developed that are far superior to well-known “first generation” methods, such as rules of thumb, least squares cross-validation, and biased cross-validation. We recommend a “solve-the-equation” plug-in bandwidth selector as being most reliable in terms of overall performance. This article is intended to provide easy accessibility to the main ideas for nonexperts.
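As a point of reference, the "rule of thumb" first-generation selector the survey mentions can be sketched in a few lines. The 1.06 constant is the classical normal-reference choice; this is an illustration of the simplest selector, not the recommended solve-the-equation plug-in:

```python
import numpy as np

def kde_rule_of_thumb(data, grid):
    """Gaussian kernel density estimate with the normal-reference
    ("rule of thumb") bandwidth h = 1.06 * sigma * n^(-1/5),
    a "first generation" selector."""
    data = np.asarray(data, dtype=float)
    n = data.size
    h = 1.06 * data.std(ddof=1) * n ** (-0.2)
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 2000)
est_at_zero = kde_rule_of_thumb(x, np.array([0.0]))[0]  # near phi(0) = 0.3989
```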


IEEE Transactions on Software Engineering | 2000

Predicting fault incidence using software change history

Todd L. Graves; Alan F. Karr; J. S. Marron; Harvey P. Siy

This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: For instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on the average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.
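The shape of the most successful model, a sum of contributions over all past changes with large, recent changes weighted most, might be caricatured as below. The exponential-decay weighting and the tau parameter are assumptions made purely for illustration, not the model fitted in the paper:

```python
import math

def fault_potential(changes, tau=365.0):
    """Illustrative weighted-change-history score: each change contributes
    its size, discounted exponentially by its age in days, so large recent
    changes dominate. 'changes' is a list of (size, age_in_days) pairs."""
    return sum(size * math.exp(-age / tau) for size, age in changes)
```

A module with one large change made today scores higher than one whose equally large change happened two years ago, matching the qualitative finding that recency and size drive predicted fault incidence.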


IEEE Transactions on Software Engineering | 2001

Does code decay? Assessing the evidence from change management data

Stephen G. Eick; Todd L. Graves; Alan F. Karr; J. S. Marron; Audris Mockus

A central feature of the evolution of large software systems is that change, which is necessary to add new functionality, accommodate new hardware, and repair faults, becomes increasingly difficult over time. We approach this phenomenon, which we term code decay, scientifically and statistically. We define code decay and propose a number of measurements (code decay indices) on software and on the organizations that produce it that serve as symptoms, risk factors, and predictors of decay. Using an unusually rich data set (the fifteen-plus year change history of the millions of lines of software for a telephone switching system), we find mixed, but on the whole persuasive, statistical evidence of code decay, which is corroborated by developers of the code. Suggestive indications that perfective maintenance can retard code decay are also discussed.


Journal of the American Statistical Association | 1999

SiZer for exploration of structures in curves

Probal Chaudhuri; J. S. Marron

Abstract In the use of smoothing methods in data analysis, an important question is which observed features are “really there,” as opposed to being spurious sampling artifacts. An approach is described based on scale-space ideas originally developed in the computer vision literature. Assessment of Significant ZERo crossings of derivatives results in the SiZer map, a graphical device for display of significance of features with respect to both location and scale. Here “scale” means “level of resolution”; that is, “bandwidth.”
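A drastically simplified single row of a SiZer-style map can be sketched as below: at one bandwidth, flag grid locations where the smoothed density derivative is significantly positive or negative. This sketch uses pointwise Gaussian confidence bounds, whereas the paper develops a more careful simultaneous procedure:

```python
import numpy as np

def sizer_row(data, grid, h, z=1.96):
    """One row of a simplified SiZer map at bandwidth h: +1 where the
    Gaussian-kernel density derivative is significantly positive,
    -1 where significantly negative, 0 otherwise."""
    data = np.asarray(data, dtype=float)
    n = data.size
    u = (grid[:, None] - data[None, :]) / h
    # per-observation contribution to d/dx of the kernel density estimate
    contrib = -u * np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h**2)
    deriv = contrib.mean(axis=1)
    se = contrib.std(axis=1, ddof=1) / np.sqrt(n)
    return np.where(deriv > z * se, 1, np.where(deriv < -z * se, -1, 0))

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 2000)
row = sizer_row(x, np.array([-1.5, 1.5]), h=0.5)
# the N(0,1) density rises at -1.5 and falls at +1.5
```

Stacking such rows over a range of bandwidths gives the location-by-scale significance map described in the abstract.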


Journal of the American Statistical Association | 1990

Comparison of Data-Driven Bandwidth Selectors

Byeong U. Park; J. S. Marron

Abstract This article compares several promising data-driven methods for selecting the bandwidth of a kernel density estimator. The methods compared are least squares cross-validation, biased cross-validation, and a plug-in rule. The comparison is done by asymptotic rate of convergence to the optimum and a simulation study. It is seen that the plug-in bandwidth is usually most efficient when the underlying density is sufficiently smooth, but is less robust when there is not enough smoothness present. We believe the plug-in rule is the best of those currently available, but there is still room for improvement.
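For concreteness, the least squares cross-validation criterion compared in the article can be written down directly for a Gaussian kernel. This sketch scores one candidate bandwidth h; the selected bandwidth is the minimizer over h:

```python
import numpy as np

def lscv(data, h):
    """Least-squares cross-validation criterion for a Gaussian-kernel
    density estimate: integral of fhat^2 minus twice the leave-one-out
    average of fhat at the data points."""
    x = np.asarray(data, dtype=float)
    n = x.size
    d = (x[:, None] - x[None, :]) / h
    # integral of fhat^2: the Gaussian kernel convolved with itself is N(0, 2)
    term1 = np.exp(-0.25 * d**2).sum() / (2 * np.sqrt(np.pi) * n**2 * h)
    # leave-one-out term: drop the diagonal (each k_ii equals 1/sqrt(2*pi))
    k = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)
    loo = (k.sum() - n * k[0, 0]) / ((n - 1) * n * h)
    return term1 - 2 * loo

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 1000)
# a moderate bandwidth scores better (lower) than extreme ones
```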


Journal of the American Statistical Association | 1988

How Far are Automatically Chosen Regression Smoothing Parameters from their Optimum?

Wolfgang Karl Härdle; Peter Hall; J. S. Marron

Abstract We address the problem of smoothing parameter selection for nonparametric curve estimators in the specific context of kernel regression estimation. Call the “optimal bandwidth” the minimizer of the average squared error. We consider several automatically selected bandwidths that approximate the optimum. How far are the automatically selected bandwidths from the optimum? The answer is studied theoretically and through simulations. The theoretical results include a central limit theorem that quantifies the convergence rate and gives the difference's asymptotic distribution. The convergence rate turns out to be excruciatingly slow. This is not too disappointing, because this rate is of the same order as the convergence rate of the difference between the minimizers of the average squared error and the mean average squared error. In some simulations by John Rice, the selectors considered here performed quite differently from each other. We anticipated that these differences would be reflected in differ...


Statistics & Probability Letters | 1987

Estimation of integrated squared density derivatives

Peter Hall; J. S. Marron

Kernel density estimators are used for the estimation of integrals of various squared derivatives of a probability density. Rates of convergence in mean squared error are calculated, which show that appropriate values of the smoothing parameter are much smaller than those for ordinary density estimation. The rate of convergence increases with stronger smoothness assumptions; however, unlike ordinary density estimation, the parametric rate of n^(-1) can be achieved even when only a finite amount of differentiability is assumed. The implications for data-driven bandwidth selection in ordinary density estimation are considered.
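The simplest (zeroth-derivative) case, estimating theta = integral of f(x)^2 dx, can be sketched as the standard leave-out pairwise kernel sum. The Gaussian kernel and the exact form below are a common textbook version, used here for illustration:

```python
import numpy as np

def int_f_squared(data, h):
    """Kernel estimate of theta = integral of f^2:
    theta_hat = (1 / (n (n-1) h)) * sum_{i != j} K((Xi - Xj) / h),
    with a standard Gaussian kernel K (diagonal terms excluded)."""
    x = np.asarray(data, dtype=float)
    n = x.size
    d = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)
    return (k.sum() - n * k[0, 0]) / (n * (n - 1) * h)

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 2000)
est = int_f_squared(x, h=0.25)  # for N(0,1), theta = 1/(2*sqrt(pi)) = 0.2821
```

Such estimates of integrated squared derivatives are exactly the unknown quantities that plug-in bandwidth selectors need.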


Journal of the American Statistical Association | 1990

Kernel quantile estimators

Simon J. Sheather; J. S. Marron

Abstract For an estimator of quantiles, the efficiency of the sample quantile can be improved by considering linear combinations of order statistics, that is, L estimators. A variety of such methods have appeared in the literature; an important aspect of this article is that asymptotically several of these are shown to be kernel estimators with a Gaussian kernel, and the bandwidths are identified. It is seen that some implicit choices of the smoothing parameter are asymptotically suboptimal. In addition, the theory of this article suggests a method for choosing the smoothing parameter. How much reliance should be placed on the theoretical results is investigated through a simulation study. Over a variety of distributions little consistent difference is found between various estimators. An important conclusion, made during the theoretical analysis, is that all of these estimators usually provide only modest improvement over the sample quantile. The results indicate that even if one knew the best estimator ...
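An L-estimator of the type analyzed, a Gaussian-kernel-weighted average of the order statistics, might look like the following sketch; the plotting positions and unnormalized weights are illustrative choices:

```python
import numpy as np

def kernel_quantile(data, p, h):
    """L-estimator of the p-th quantile: a weighted average of the order
    statistics, with Gaussian kernel weights centered at p and bandwidth h,
    smoothing the ordinary sample quantile."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    pos = (np.arange(1, n + 1) - 0.5) / n          # plotting positions i/n
    w = np.exp(-0.5 * ((pos - p) / h) ** 2)        # Gaussian weights around p
    return np.dot(w / w.sum(), x)

rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 4000)
q50 = kernel_quantile(x, 0.5, h=0.05)  # near the true median 0.5
```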


Journal of the American Statistical Association | 1991

Transformations in Density Estimation

M. P. Wand; J. S. Marron; David Ruppert

Abstract For the density estimation problem the global window width kernel density estimator does not perform well when the underlying density has features that require different amounts of smoothing at different locations. In this article we propose to transform the data with the intention that a global window width is more appropriate for the density of the transformed data. The density estimate for the original data is then obtained by back-transforming, via change of variables, the global window width estimate of the transformed data's density. We explore choosing the transformation from suitable parametric families. Data-based selection rules for the choice of transformations and the window width are discussed. Application to real and simulated data demonstrates the usefulness of our proposals.
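A minimal instance of the transformation idea, assuming a fixed log transform for positive right-skewed data rather than the paper's data-driven parametric families, can be sketched as:

```python
import numpy as np

def transformed_kde(data, x):
    """Density estimate for positive, right-skewed data: fit a global-
    bandwidth Gaussian KDE to log(data), then back-transform by change
    of variables, f(x) = g(log x) / x."""
    t = np.log(np.asarray(data, dtype=float))
    n = t.size
    h = 1.06 * t.std(ddof=1) * n ** (-0.2)   # rule-of-thumb bandwidth on the log scale
    u = (np.log(x)[:, None] - t[None, :]) / h
    g = np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return g / x                              # Jacobian of the log transform

rng = np.random.default_rng(9)
data = rng.lognormal(0, 1, 3000)
est = transformed_kde(data, np.array([1.0]))[0]  # near the lognormal(0,1) density at 1
```

A single global bandwidth works well on the log scale here, while a fixed-bandwidth estimate on the original scale would oversmooth the peak and undersmooth the tail.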

Collaboration


J. S. Marron's most frequent co-authors and their affiliations.

Top Co-Authors

Peter Hall
University of Melbourne

Stephen M. Pizer
University of North Carolina at Chapel Hill

Haipeng Shen
University of Hong Kong

Wolfgang Karl Härdle
Humboldt University of Berlin

Félix Hernández-Campos
University of North Carolina at Chapel Hill

Yufeng Liu
University of North Carolina at Chapel Hill

Andrew B. Nobel
University of North Carolina at Chapel Hill

Marc Niethammer
University of North Carolina at Chapel Hill