Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Kimberly F. Sellers is active.

Publication


Featured research published by Kimberly F. Sellers.


Quality and Reliability Engineering International | 2012

A generalized statistical control chart for over‐ or under‐dispersed data

Kimberly F. Sellers

The Poisson distribution is a popular distribution for describing count data, and control charts involving count data have been established from it. Several works recognize the need for a generalized control chart to allow for data over-dispersion; however, analogous arguments can also be made to account for potential under-dispersion. The Conway–Maxwell–Poisson (COM-Poisson) distribution is a general count distribution that relaxes the equi-dispersion assumption of the Poisson distribution and, in fact, encompasses the special cases of the Poisson, geometric, and Bernoulli distributions. Accordingly, a flexible control chart is developed that encompasses the classical Shewhart charts based on the Poisson, Bernoulli (or binomial), and geometric (or negative binomial) distributions.
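
A minimal sketch, not taken from the paper, of the COM-Poisson probability mass function under its usual parameterization P(X = x) = λ^x / ((x!)^ν Z(λ, ν)); the parameter values are arbitrary illustration.

```python
import math
from scipy.stats import poisson

def com_poisson_pmf(x: int, lam: float, nu: float, terms: int = 200) -> float:
    """COM-Poisson pmf, computed in log space so large factorials do not overflow."""
    log_terms = [s * math.log(lam) - nu * math.lgamma(s + 1) for s in range(terms)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))  # log Z(lam, nu)
    return math.exp(x * math.log(lam) - nu * math.lgamma(x + 1) - log_z)

# Sanity check: the equi-dispersed case nu = 1 matches the ordinary Poisson pmf.
print(com_poisson_pmf(4, lam=3.0, nu=1.0))  # ~0.168
print(poisson.pmf(4, 3.0))                  # ~0.168
# nu > 1 captures under-dispersion; nu < 1 captures over-dispersion.
```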


Journal of Multivariate Analysis | 2016

Bivariate Conway-Maxwell-Poisson distribution

Kimberly F. Sellers; Darcy Steeg Morris; N. Balakrishnan

The bivariate Poisson distribution is a popular distribution for modeling bivariate count data. Its basic assumptions and marginal equi-dispersion, however, may prove limiting in some contexts. To allow for data dispersion, we develop here a bivariate Conway-Maxwell-Poisson (COM-Poisson) distribution that includes the bivariate Poisson, bivariate Bernoulli, and bivariate geometric distributions all as special cases. As a result, the bivariate COM-Poisson distribution serves as a flexible alternative and unifying framework for modeling bivariate count data, especially in the presence of data dispersion.
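
The limitation motivating the paper, marginal equi-dispersion under the classical bivariate Poisson, can be seen in a quick simulation of the standard trivariate-reduction construction; this is background illustration only, not the paper's bivariate COM-Poisson construction, and the rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Classical bivariate Poisson via trivariate reduction: X1 = Y1 + Y3, X2 = Y2 + Y3.
y1, y2, y3 = (rng.poisson(lam, size=100_000) for lam in (1.5, 2.0, 0.8))
x1, x2 = y1 + y3, y2 + y3

print(x1.mean(), x1.var())           # both ~2.3: each margin is equi-dispersed
print(np.corrcoef(x1, x2)[0, 1])     # positive dependence induced by the shared Y3
```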


Proteome Science | 2010

A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data

Jeffrey C. Miecznikowski; Senthilkumar Damodaran; Kimberly F. Sellers; Richard A. Rabin

Numerous gel-based software packages exist to detect protein changes potentially associated with disease. The data, however, are laden with technical and structural complexities, making statistical analysis a difficult task. A particularly important issue is how the various packages handle missing data. To date, no one has extensively studied the impact that interpolating missing data has on subsequent analysis of protein spots. This work highlights the existing algorithms for handling missing data in two-dimensional gel analysis and performs a thorough comparison of the various algorithms and statistical tests on simulated and real datasets. For imputation methods, the best results in terms of root mean squared error are obtained using the least squares method of imputation along with the expectation-maximization (EM) algorithm approach to estimate missing values with an array covariance structure. The bootstrapped versions of the statistical tests offer the most liberal option for determining protein spot significance, while the generalized family-wise error rate (gFWER) should be considered for controlling the multiple testing error. In summary, we advocate a three-step statistical analysis of two-dimensional gel electrophoresis (2-DE) data: a data imputation step, a choice of statistical test, and lastly an error control method in light of multiple testing. When choosing the statistical test, it is worth considering whether the protein spots will be subjected to mass spectrometry; if so, a more liberal test such as the percentile-based bootstrap t can be employed. For error control in electrophoresis experiments, we advocate that gFWER be controlled for multiple testing rather than the false discovery rate.
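
As an illustration of the percentile-based bootstrap t test recommended above, the sketch below resamples two groups of intensities for a single, invented protein spot; the paper's exact resampling scheme and software may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Invented intensities for one spot under two conditions (hypothetical values).
control = np.array([10.2, 9.8, 10.5, 10.1, 9.9])
disease = np.array([11.0, 11.4, 10.8, 11.2, 11.6])

t_obs = stats.ttest_ind(disease, control).statistic

# Resample each group after centering (imposing the null of no group difference),
# and rebuild the t statistic on every bootstrap replicate.
null_c = control - control.mean()
null_d = disease - disease.mean()
t_boot = np.empty(10_000)
for b in range(t_boot.size):
    c = rng.choice(null_c, size=null_c.size, replace=True)
    d = rng.choice(null_d, size=null_d.size, replace=True)
    t_boot[b] = stats.ttest_ind(d, c).statistic

p_value = np.mean(np.abs(t_boot) >= abs(t_obs))   # two-sided percentile p-value
print(t_obs, p_value)
```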


Computational Statistics & Data Analysis | 2016

A flexible zero-inflated model to address data dispersion

Kimberly F. Sellers; Andrew M. Raim

Excess zeroes are often thought of as a cause of data over-dispersion (i.e. when the variance exceeds the mean); this claim is not entirely accurate. In actuality, excess zeroes reduce the mean of a dataset, thus inflating the dispersion index (i.e. the variance divided by the mean). While this results in an increased chance for data over-dispersion, the implication is not guaranteed. Thus, one should consider a flexible distribution that not only can account for excess zeroes, but can also address potential over- or under-dispersion. A zero-inflated Conway-Maxwell-Poisson (ZICMP) regression allows for modeling the relationship between explanatory and response variables, while capturing the effects due to excess zeroes and dispersion. This work derives the ZICMP model and illustrates its flexibility, extrapolates the corresponding likelihood ratio test for the presence of significant data dispersion, and highlights various statistical properties and model fit through several examples. Highlights: the zero-inflated Conway-Maxwell-Poisson model handles dispersed datasets with excess zeroes; a hypothesis test detects statistically significant dispersion in light of excess zeroes; data simulations and examples illustrate flexibility in model fit.
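
A quick numerical check of the claim above that excess zeroes inflate the dispersion index without guaranteeing over-dispersion; the non-zero counts below are an invented, tightly concentrated sample, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.integers(3, 6, size=200_000)      # counts in {3, 4, 5}: variance well below the mean

for p in (0.0, 0.1, 0.5):
    zeros = rng.random(base.size) < p        # structural zeros with probability p
    y = np.where(zeros, 0, base)
    # The dispersion index rises with p but stays below 1 for small p:
    print(p, round(y.var() / y.mean(), 2))
```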


The American Statistician | 2017

Bridging the Gap: A Generalized Stochastic Process for Count Data

Li Zhu; Kimberly F. Sellers; Darcy Steeg Morris; Galit Shmueli

The Bernoulli and Poisson processes are two popular discrete count processes; however, both rely on strict assumptions. We instead propose a generalized homogeneous count process (which we name the Conway–Maxwell–Poisson or COM-Poisson process) that not only includes the Bernoulli and Poisson processes as special cases, but also serves as a flexible mechanism to describe count processes that approximate data with over- or under-dispersion. We introduce the process and an associated generalized waiting time distribution with several real-data applications to illustrate its flexibility for a variety of data structures. We consider model estimation under different scenarios of data availability, and assess performance through simulated and real datasets. This new generalized process will enable analysts to model count processes with data dispersion in a more accommodating and flexible manner.
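
For context, the two special cases the COM-Poisson process bridges can be simulated directly: a Bernoulli process yields under-dispersed interval counts, while a Poisson process yields equi-dispersed counts. The sketch is illustrative only and does not reproduce the generalized process itself; the rate and horizon are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
t, rate = 50, 0.7   # 50 unit intervals, event rate per interval (arbitrary values)

# Bernoulli process: at most one event per interval -> under-dispersed totals.
bernoulli_counts = rng.binomial(1, rate, size=(10_000, t)).sum(axis=1)
# Poisson process: unbounded events per interval -> equi-dispersed totals.
poisson_counts = rng.poisson(rate, size=(10_000, t)).sum(axis=1)

for name, x in [("Bernoulli", bernoulli_counts), ("Poisson", poisson_counts)]:
    print(name, x.mean(), x.var() / x.mean())   # dispersion index: < 1 versus ~1
```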


Molecular Vision | 2008

Characterization of gene expression profiles of normal canine retina and brain using a retinal cDNA microarray

Gerardo L. Paez; Barbara Zangerl; Kimberly F. Sellers; Gregory M. Acland; Gustavo D. Aguirre

Purpose: To construct a canine retinal custom cDNA microarray for comprehensive retinal gene expression profiling, and to apply it to the identification of genes that are preferentially expressed in the retina and brain lobes using a pooled-brain reference tissue. Methods: A cDNA microarray was constructed utilizing clones obtained from a normalized canine retinal expressed sequence tag library. Gene expression profiles were analyzed for normal retina, as well as the cortex of the frontal, occipital, and temporal brain regions. Each sample was studied against a reference sample of pooled brain RNA. Data from a quantified scanned image were normalized using the loess subgrid procedure. Retina-enriched genes were identified using the Significance Analysis of Microarrays (SAM) algorithm and confirmed by northern blot analyses for selected genes. Differences between biological samples were displayed using principal component analysis (PCA). Results: Expression profiles for each tissue set were analyzed against the common reference of pooled brain. Changes in expression between the sample and the reference were higher in the retina (27.9%) than in the individual brain tissues (2-6.6%). Furthermore, all individual retinal samples were clearly separated from any of the hybridizations using brain tissue in the PCA. The accuracy of observed changes in expression was confirmed by northern blot analysis using five randomly chosen genes that represented a wide range of expression differences between retina and brain. Conclusions: We have established an accurate and robust microarray system suitable for the investigation of expression patterns in the retina and brain. Characterization of the gene expression profiles in normal retina will facilitate the understanding of the processes that underlie differences between normal and diseased retinas.
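
A minimal sketch of the PCA step described above, projecting per-array log ratios against a pooled reference; the expression matrix is simulated stand-in data, not the study's measurements.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_genes = 2000
# Simulated log2(sample / pooled-brain reference) ratios: 9 brain arrays, 4 retina arrays.
brain = rng.normal(0.0, 0.3, size=(9, n_genes))
retina = rng.normal(0.0, 0.3, size=(4, n_genes))
retina[:, :300] += 2.0                       # a block of retina-enriched genes

ratios = np.vstack([brain, retina])          # rows = hybridizations, columns = genes
scores = PCA(n_components=2).fit_transform(ratios)
print(scores[:, 0])                          # retina arrays separate from brain arrays on PC1
```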


International Journal of Biomedical Imaging | 2010

Feature detection techniques for preprocessing proteomic data

Kimberly F. Sellers; Jeffrey C. Miecznikowski

Numerous gel-based and nongel-based technologies are used to detect protein changes potentially associated with disease. The raw data, however, are laden with technical and structural complexities, making statistical analysis a difficult task. Low-level analysis issues (including normalization, background correction, gel and/or spectral alignment, feature detection, and image registration) are substantial problems that need to be addressed, because any higher-level data analyses are contingent on appropriate and statistically sound low-level procedures. Feature detection approaches are particularly interesting due to the increased computational speed associated with subsequent calculations. Such summary data corresponding to image features provide a significant reduction in overall data size and structure while retaining key information. In this paper, we focus on recent advances in feature detection as a tool for preprocessing proteomic data. This work highlights existing and newly developed feature detection algorithms for proteomic datasets, particularly relating to time-of-flight mass spectrometry and two-dimensional gel electrophoresis. Note, however, that the associated data structures (i.e., spectral data and images containing spots) used as input for these methods are obtained via all of the gel-based and nongel-based methods discussed in this manuscript, and thus the discussed methods are likewise applicable.
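
A minimal sketch of one feature-detection step, peak picking on a synthetic time-of-flight mass spectrum with scipy; the thresholds and the toy spectrum are illustrative, and real pipelines would add baseline correction and alignment first.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
mz = np.linspace(1000, 2000, 5000)
spectrum = rng.normal(0, 0.05, mz.size)                         # noise floor
for center, height in [(1200, 3.0), (1450, 1.5), (1800, 2.2)]:  # synthetic peptide peaks
    spectrum += height * np.exp(-0.5 * ((mz - center) / 2.0) ** 2)

# Reduce the full spectrum to a small feature summary: peak locations and heights.
peaks, props = find_peaks(spectrum, height=0.5, prominence=0.5)
print(mz[peaks])                 # detected peak locations (m/z)
print(props["peak_heights"])     # corresponding intensities
```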


Communications in Statistics-theory and Methods | 2017

Underdispersion models: Models that are “under the radar”

Kimberly F. Sellers; Darcy Steeg Morris

The Poisson distribution is a benchmark for modeling count data. Its equidispersion constraint, however, does not accurately represent real data. Most real datasets exhibit overdispersion; hence attention in the statistics community focuses on the associated issues. More examples are surfacing, however, that display underdispersion, warranting the need to highlight this phenomenon and bring more attention to the models that can better describe such data structures. This work addresses various sources of data underdispersion and surveys several distributions that can model underdispersed data, comparing their performance on applied datasets.
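
One simple, standard way to flag under-dispersion before choosing among such models is Fisher's dispersion index test, sketched below on an invented count sample (not one of the paper's datasets).

```python
import numpy as np
from scipy.stats import chi2

# Fisher's dispersion test: (n - 1) * s^2 / xbar ~ chi-square(n - 1)
# under the Poisson (equi-dispersion) null.
counts = np.array([3, 4, 4, 3, 5, 4, 4, 3, 4, 5, 4, 3, 4, 4, 5, 3])
n, xbar, s2 = counts.size, counts.mean(), counts.var(ddof=1)

stat = (n - 1) * s2 / xbar
p_under = chi2.cdf(stat, df=n - 1)     # small value -> evidence of under-dispersion
print(s2 / xbar, p_under)              # dispersion index well below 1 here
```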


Frontiers in Public Health | 2014

Race matters: analyzing the relationship between colorectal cancer mortality rates and various factors within respective racial groups

Emma Veach; Ismael Xique; Jada Johnson; Jessica Lyle; Israel Almodovar; Kimberly F. Sellers; Calandra T. Moore; Monica C. Jackson

Colorectal cancer (CRC) is the third leading cause of mortality due to cancer (with over 50,000 deaths annually), representing 9% of all cancer deaths in the United States (1). In particular, the African-American CRC mortality rate is among the highest reported for any race/ethnic group. Meanwhile, the CRC mortality rate for Hispanics is 15-19% lower than that for non-Hispanic Caucasians (2). While factors such as obesity, age, and socio-economic status are known to be associated with CRC mortality, do these and other potential factors correlate with CRC death in the same way across races? This research linked CRC mortality data obtained from the National Cancer Institute with data from the United States Census Bureau, the Centers for Disease Control and Prevention, and the National Solar Radiation Database to examine geographic and racial/ethnic differences, and develop a spatial regression model that adjusted for several factors that may contribute to health disparities among ethnic/racial groups. This analysis showed that sunlight, obesity, and socio-economic status were significant predictors of CRC mortality. The study is significant because it not only verifies known factors associated with the risk of CRC death but, more importantly, demonstrates how these factors vary within different racial groups. Accordingly, education on reducing risk factors for CRC should be directed at specific racial groups above and beyond creating a generalized education plan.


Archive | 2012

Statistical Analysis of Gel Electrophoresis Data

Kimberly F. Sellers; Jeffrey C. Miecznikowski

Two-dimensional gel electrophoresis (2-DE) methods such as two-dimensional polyacrylamide gel electrophoresis (2D-PAGE; O’Farrell (1975)) and two-dimensional difference gel electrophoresis (2D-DIGE; Unlu et al. (1997)) are popular techniques for protein separation because they allow researchers to characterize quantitative protein changes on a large scale. Thus, 2-DE is frequently used as an initial screening procedure whereby the results generate hypotheses for further study. These technologies revolutionized the field of proteomics and biomarker discovery through their ability to detect protein changes in either differential expression or modification (Huang et al., 2006; Rai & Chan, 2004; Wulfkuhle et al., 2003; Zhou et al., 2002). Further, they are attractive because of their resolving power and sensitivity. 2-DE analyses, however, require personnel with significant wet laboratory expertise and can be time-consuming, thus potentially limiting the sample size for gels.

Collaboration


Dive into Kimberly F. Sellers's collaborations.

Top Co-Authors

Galit Shmueli, National Tsing Hua University

Darcy Steeg Morris, United States Census Bureau

Andrew W. Swift, University of Nebraska Omaha

Jane M. Booker, Los Alamos National Laboratory

Kimberly S. Weems, North Carolina Central University

Li Zhu, Georgetown University