Publication


Featured research published by Rebecca Nugent.


Journal of Computational and Graphical Statistics | 2010

A Generalized Single Linkage Method for Estimating the Cluster Tree of a Density

Werner Stuetzle; Rebecca Nugent

The goal of clustering is to detect the presence of distinct groups in a dataset and assign group labels to the observations. Nonparametric clustering is based on the premise that the observations may be regarded as a sample from some underlying density in feature space and that groups correspond to modes of this density. The goal then is to find the modes and assign each observation to the domain of attraction of a mode. The modal structure of a density is summarized by its cluster tree; modes of the density correspond to leaves of the cluster tree. Estimating the cluster tree is the primary goal of nonparametric cluster analysis. We adopt a plug-in approach to cluster tree estimation: estimate the cluster tree of the feature density by the cluster tree of a density estimate. For some density estimates the cluster tree can be computed exactly; for others we have to be content with an approximation. We present a graph-based method that can approximate the cluster tree of any density estimate. Density estimates tend to have spurious modes caused by sampling variability, leading to spurious branches in the graph cluster tree. We propose excess mass as a measure for the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent. Excess mass can be used as a guide for pruning the graph cluster tree. We point out mathematical and algorithmic connections to single linkage clustering and illustrate our approach on several examples. Supplemental materials for the article, including an R package implementing generalized single linkage clustering, all datasets used in the examples, and R code producing the figures and numerical results, are available online.
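The abstract points to a connection with single linkage clustering. As a rough illustration only, the sketch below implements ordinary single linkage cut at a distance threshold; it is not the generalized method or the R package described in the article. The function name `single_linkage`, the `threshold` parameter, and the sample points are assumptions made for this example.

```python
from itertools import combinations
import math


def single_linkage(points, threshold):
    """Label points so that two points share a label whenever a chain of
    pairwise distances, each <= threshold, connects them."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    seen = {}
    return [seen.setdefault(find(i), len(seen)) for i in range(len(points))]


# Hypothetical data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
labels = single_linkage(points, threshold=0.6)
```

Raising the threshold merges groups and lowering it splits them, which loosely mirrors how the branches of a cluster tree appear and merge as the density level changes.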


International Journal of Cancer | 2002

Social class and risk of Hodgkin's disease in young‐adult women in 1988–94

Sally L. Glaser; Christina A. Clarke; Rebecca Nugent; Cynthia B. Stearns; Ronald F. Dorfman

Hodgkin's disease (HD) risk in young adults has been associated with higher childhood social class. Although recent decades have witnessed increases in both young‐adult HD incidence rates and the socioeconomic affluence reported to influence risk, social class risk factors have not been reexamined. For 204 cases and 254 controls aged 19–44 years from a population‐based case‐control study of HD diagnosed in 1988–94 in San Francisco area females, we evaluated social class predictors of HD overall and for subgroups defined by age and by ethnicity. HD was associated weakly with a few childhood social class markers but more strongly with combinations of these variables. Risk was higher for women with family‐owned than rented childhood homes; for US‐born women with single vs. shared bedrooms at age 11; and for women with 2+ births who were from smaller than larger childhood households. These patterns differed by age, with risk appearing to increase over the young‐adult years for some factors and to decrease for others. In whites, risk was additionally associated with having a single childhood bedroom in larger households, and with tall adult height in women from smaller childhood households. In nonwhites, risk was higher for single bedrooms at age 11 in smaller childhood households, taller height and higher maternal education. Most study findings support the hypothesis that HD development in young adults follows protection from early exposure to other children. Variation in risk by age suggests differing etiologies across young adulthood, or the importance of birth cohort‐appropriate social‐class measures. Negative findings for previously reported risk factors may reflect their insufficient heterogeneity of exposure or their failure to measure cohort‐relevant exposures in this population.


Methods in Molecular Biology | 2010

An overview of clustering applied to molecular biology.

Rebecca Nugent; Marina Meila

In molecular biology, we are often interested in determining the group structure in, e.g., a population of cells or microarray gene expression data. Clustering methods identify groups of similar observations, but the results can depend on the chosen method's assumptions and starting parameter values. In this chapter, we give a broad overview of both attribute- and similarity-based clustering, describing both the methods and their performance. The parametric and nonparametric approaches presented vary in whether or not they require knowing the number of clusters in advance as well as the shapes of the estimated clusters. Additionally, we include a biclustering algorithm that incorporates variable selection into the clustering procedure. We finish with a discussion of some common methods for comparing two clustering solutions (possibly from different methods). The user is advised to devote time and attention to determining the appropriate clustering approach (and any corresponding parameter values) for the specific application prior to analysis.
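One widely used way to compare two clustering solutions is the Rand index, the fraction of observation pairs on which the two solutions agree. The abstract does not say which comparison measures the chapter covers, so the choice of the Rand index here is an assumption, as are the function name `rand_index` and the toy label vectors.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of pairs that both clusterings treat the same way:
    together in both, or apart in both."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# The index ignores label names, so a pure relabeling scores 1.0.
same_partition = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Two partitions that cut across each other agree on fewer pairs.
crossed_partition = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The raw Rand index is not corrected for chance agreement, so adjusted variants are often preferred when the number of clusters differs between solutions.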


Cancer Causes & Control | 2004

Attenuation of social class and reproductive risk factor associations for Hodgkin lymphoma due to selection bias in controls

Sally L. Glaser; Christina A. Clarke; Theresa H.M. Keegan; Scarlett Lin Gomez; Rebecca Nugent; Barbara Topol; Cynthia B. Stearns; Susan L. Stewart

Objective: Hodgkin lymphoma (HL) risk has been linked with higher social class and lower parity, but our prior population-based case-control study in adult women had unexpected null findings for these variables. Because subject participation was 87% for cases but 65% for random digit-dialing (RDD) controls, we examined representativeness of our controls and the impact of detected bias on prior results. Methods: Using data from RDD enumeration, abbreviated interviews with nonparticipating controls, and the US census, we compared participating and nonparticipating RDD controls across several age groups and then recomputed odds ratios for risk factor associations adjusted for bias. Results: The 325 RDD control participants were younger, more likely to be white, better educated, and of lower birth order and lower parity than the nonparticipants. Adjustment of odds ratios for bias strengthened previously null findings for education and for parity, breast-feeding and miscarriages in young adult women; these latter changes eliminated previously apparent age modification of risks. Conclusions: Selection bias in female RDD controls resulted from differential participation by socioeconomic factors, varied with age, and produced underestimations of several associations in young women, including reproductive factors. Thus, our prior conclusions of etiologic irrelevance for some study variables may have been inaccurate.


Advances in Data Analysis and Classification | 2013

Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas

Nema Dean; Rebecca Nugent

This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo in Comput Stat 28(4):10.1007/s00180-012-367-4, 2013). The EM algorithm used to fit this mixture is discussed in detail, and results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al. in J User Model User Adap Inter 19(3):243–266, 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable.
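To make the model structure concrete, the sketch below computes the E-step responsibilities for a mixture whose components are products of independent univariate beta densities. It uses the standard (a, b) shape parameterization rather than the reparameterized unimodal betas cited in the abstract, and it omits the M-step entirely. The function names, weights, and parameter values are assumptions made for this example.

```python
import math


def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) + log_norm


def responsibilities(x, weights, params):
    """Posterior component probabilities for one observation x.

    x       : sequence of d values in (0, 1)
    weights : mixing proportions, one per component
    params  : params[k] is a list of d (a, b) pairs for component k
    """
    log_terms = [
        math.log(w) + sum(beta_logpdf(xi, a, b) for xi, (a, b) in zip(x, comp))
        for w, comp in zip(weights, params)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_terms)
    unnormalized = [math.exp(t - top) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical two-skill example: component 0 is "low mastery",
# component 1 is "high mastery" on both skills.
weights = [0.5, 0.5]
params = [[(2, 8), (2, 8)], [(8, 2), (8, 2)]]
resp = responsibilities((0.9, 0.9), weights, params)
```

Because each component factorizes across dimensions, the log density is a plain sum over coordinates, which keeps the E-step cheap even when the hypercube has many dimensions.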


Archive | 2010

Clustering with Confidence: A Low-Dimensional Binning Approach

Rebecca Nugent; Werner Stuetzle

We present a plug-in method for estimating the cluster tree of a density. The method takes advantage of the ability to exactly compute the level sets of a piecewise constant density estimate. We then introduce clustering with confidence, an automatic pruning procedure that assesses significance of splits (and so clusters) in the cluster tree; the only user input required is the desired confidence level.
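For a piecewise constant density estimate, level sets can be read off directly from the bins. The one-dimensional sketch below finds the connected components of the set of bins whose height is at least a given level. It illustrates the general idea only and is not the procedure from the paper; the function name, the "at least" convention, and the bin heights are assumptions.

```python
def level_set_components(bin_heights, level):
    """Return runs of adjacent bin indices whose height is >= level."""
    components, current = [], []
    for index, height in enumerate(bin_heights):
        if height >= level:
            current.append(index)
        elif current:
            # A bin below the level closes the current run.
            components.append(current)
            current = []
    if current:
        components.append(current)
    return components


# Hypothetical histogram heights with two peaks separated by a valley.
heights = [0.1, 0.4, 0.3, 0.05, 0.2, 0.5, 0.1]
low = level_set_components(heights, 0.05)
mid = level_set_components(heights, 0.15)
high = level_set_components(heights, 0.45)
```

Sweeping the level upward shows one component splitting into two and then one of them vanishing, which is the kind of split that a cluster tree records.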


The American Journal of Medicine | 2009

A Commentary on the Use of the Internal Medicine In-Training Examination

Helen Wang; Rebecca Nugent; Connie Nugent; Kenneth Nugent; Michael Phy

The Internal Medicine In-Training Examination (IM-ITE) was developed by the American College of Physicians (ACP), the Association of Program Directors in Internal Medicine (APDIM), and the Association of Professors of Medicine (APM). The current test has 340 questions in 11 content areas and a 7-hour time limit, and is administered annually in the fall. The test is designed to evaluate the knowledge base of postgraduate year (PGY)-2 residents; however, most internal medicine residency programs administer the test to residents at all levels. In 2007, more than 18,000 internal medicine residents took the ITE. This test allows residents to compare their medical knowledge with other residents at the same level of training and to enable individual programs to track residents’ progress and evaluate and consider curriculum changes. The IM-ITE is a “low-stakes” test, and information on the ACP website suggests residents should not study for this test. We reviewed the literature available on the IM-ITE to determine which factors (if any) correlate with test scores, its utility in predicting a pass on the American Board of Internal Medicine Certifying Examination (ABIMCE), and its usefulness in program development and change.


Journal of Experimental Child Psychology | 2016

Developmental changes in semantic knowledge organization

Layla Unger; Anna V. Fisher; Rebecca Nugent; Samuel L. Ventura; Christopher J. MacLellan

Semantic knowledge is a crucial aspect of higher cognition. Theoretical accounts of semantic knowledge posit that relations between concepts provide organizational structure that converts information known about individual entities into an interconnected network in which concepts can be linked by many types of relations (e.g., taxonomic, thematic). The goal of the current research was to address several methodological shortcomings of prior studies on the development of semantic organization, by using a variant of the spatial arrangement method (SpAM) to collect graded judgments of relatedness for a set of entities that can be cross-classified into either taxonomic or thematic groups. In Experiment 1, we used the cross-classify SpAM (CC-SpAM) to obtain graded relatedness judgments and derive a representation of developmental changes in the organization of semantic knowledge. In Experiment 2, we validated the findings of Experiment 1 by using a more traditional pairwise similarity judgment paradigm. Across both experiments, we found that an early recognition of links between entities that are both taxonomically and thematically related preceded an increasing recognition of links based on a single type of relation. The utility of CC-SpAM for evaluating theoretical accounts of semantic development is discussed.


Southern Medical Journal | 2015

An Investigation of the Variety and Complexity of Statistical Methods Used in Current Internal Medicine Literature.

Narayanan R; Rebecca Nugent; Kenneth Nugent

Objectives: Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. Methods: We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. Results: A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ² tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations. These papers used 128 statistical terms and context-defined concepts, including some from data analysis (56), epidemiology-biostatistics (31), modeling (24), data collection (12), and meta-analysis (5). Ten different software programs were used in these articles. Based on usual undergraduate and graduate statistics curricula, 64.3% of the concepts and methods used in these papers required at least a master’s degree–level statistics education. Conclusions: The interpretation of the current medical literature can require an extensive background in statistical methods at an education level exceeding the material and resources provided to most medical students and residents. Given the complexity and time pressure of medical education, these deficiencies will be hard to correct, but this project can serve as a basis for developing a curriculum in study design and statistical methods needed by physicians-in-training.


Privacy in Statistical Databases | 2014

Hierarchical Linkage Clustering with Distributions of Distances for Large-Scale Record Linkage

Samuel L. Ventura; Rebecca Nugent; Erica R.H. Fuchs

Distance-based clustering techniques such as hierarchical clustering use a single estimate of distance for each pair of observations; their results then rely on the accuracy of this estimate. However, in many applications, datasets include measurement error or are too large for traditional models, meaning a single estimate of distance between two observations may be subject to error or computationally prohibitive to calculate. For example, in many of today’s large-scale record linkage problems, datasets are prohibitively large, making distance estimates computationally infeasible. By using a distribution of distance estimates instead (e.g. from an ensemble of classifiers trained on subsets of record pairs), these issues may be resolved. We present a large-scale record linkage framework that incorporates classifier ensembles and “distribution linkage” clustering to identify clusters of records corresponding to unique entities. We examine the performance of several different distributional summary measures in hierarchical clustering. We motivate and illustrate this approach with an application of record linkage to the United States Patent and Trademark Office database.
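The idea of clustering on a summary of a distribution of distances can be sketched as follows. Each record pair carries several distance estimates (for instance, one per ensemble member), a summary function collapses them to one number, and pairs whose summary falls below a threshold are linked. The abstract does not specify how "distribution linkage" is defined, so the thresholded single-linkage rule, the function names, and the toy estimates are all assumptions for illustration.

```python
import statistics


def summarize_distances(estimates, summary=statistics.median):
    """Collapse each pair's list of distance estimates into one number."""
    return {pair: summary(values) for pair, values in estimates.items()}


def link_records(n_records, summarized, threshold):
    """Group records connected by pairs whose summary distance <= threshold."""
    parent = list(range(n_records))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), distance in summarized.items():
        if distance <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    seen = {}
    return [seen.setdefault(find(i), len(seen)) for i in range(n_records)]


# Hypothetical estimates for three records, three estimates per pair.
estimates = {
    (0, 1): [0.1, 0.2, 0.9],
    (0, 2): [0.8, 0.9, 0.7],
    (1, 2): [0.6, 0.95, 0.85],
}
by_median = link_records(3, summarize_distances(estimates), threshold=0.5)
by_max = link_records(3, summarize_distances(estimates, max), threshold=0.5)
```

Here the median links records 0 and 1 despite one outlying estimate, while the maximum keeps all three records apart, which shows why the choice of summary measure matters.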

Collaboration


Dive into Rebecca Nugent's collaborations.

Top Co-Authors

Nema Dean, University of Glasgow
Elizabeth Ayers, Carnegie Mellon University
Erica R.H. Fuchs, Carnegie Mellon University
Samuel L. Ventura, Carnegie Mellon University
Kenneth Nugent, Texas Tech University Health Sciences Center
Chia-Hsuan Yang, Carnegie Mellon University
Helen Wang, Texas Tech University Health Sciences Center
Aarti Singh, Carnegie Mellon University