Kofi P. Adragni
University of Maryland, Baltimore County
Publications
Featured research published by Kofi P. Adragni.
Philosophical Transactions of the Royal Society A | 2009
Kofi P. Adragni; R. Dennis Cook
Dimension reduction for regression is a prominent issue today because technological advances now allow scientists to routinely formulate regressions in which the number of predictors is considerably larger than in the past. While several methods have been proposed to deal with such regressions, principal components (PCs) still seem to be the most widely used across the applied sciences. We give a broad overview of ideas underlying a particular class of methods for dimension reduction that includes PCs, along with an introduction to the corresponding methodology. New methods are proposed for prediction in regressions with many predictors.
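As a rough illustration of the principal-components approach surveyed here, the sketch below fits a principal components regression: the predictors are reduced to a few PC scores and the response is regressed on those scores. The simulated data, the number of components k, and the scikit-learn pipeline are illustrative assumptions, not the paper's procedure.

# Minimal sketch of principal components regression (PCR).
# Data, coefficients, and k are illustrative, not from the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p, k = 200, 50, 5                      # many predictors, few components
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# Reduce X to k principal component scores, then regress y on the scores.
pcr = make_pipeline(StandardScaler(), PCA(n_components=k), LinearRegression())
pcr.fit(X, y)
y_hat = pcr.predict(X)                    # predictions from the reduced predictors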
Archive | 2011
Jeremy Bejarano; Koushiki Bose; Tyler Brannan; Anita Thomas; Kofi P. Adragni; Nagaraj K. Neerchal; George Ostrouchov
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy.
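A hedged sketch of the sampling idea described above: fit the centroids on a random subsample, then assign every row of the full data set to the nearest learned centroid. The sample size m, the number of clusters k, and the simulated data are illustrative choices, not the authors' settings.

# Sampling-based k-means sketch: cluster a subsample, then label all rows.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 10))        # stand-in for a "large" data set
k, m = 8, 5_000                           # clusters and subsample size

idx = rng.choice(X.shape[0], size=m, replace=False)
km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X[idx])  # fit on the sample only
labels = km.predict(X)                    # cheap assignment step for the full data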
Journal of Applied Statistics | 2015
Kofi P. Adragni
We present a methodology for screening predictors that, given the response, follow a one-parameter exponential family distribution. Screening predictors can be an important step in regressions when the number of predictors p is excessively large or larger than the number of observations n. We consider instances where a large number of predictors are suspected to be irrelevant because they carry no information about the response. The proposed methodology helps remove these irrelevant predictors while capturing those linearly or nonlinearly related to the response.
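The sketch below illustrates the inverse-regression flavor of this kind of screening: each predictor is modeled given the response, an F-test asks whether basis functions of Y explain that predictor, and only predictors with small p-values are kept. The cubic-polynomial basis and the Bonferroni-style cutoff are illustrative assumptions, not the paper's exact rule.

# Screening sketch: test each predictor's dependence on basis functions of Y.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 300, 1000
y = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, 0] += np.sin(2 * y)                  # one nonlinearly relevant predictor

F = np.column_stack([np.ones(n), y, y**2, y**3])   # basis functions f(y)
q = F.shape[1] - 1
H = F @ np.linalg.solve(F.T @ F, F.T)              # hat matrix of the inverse fit
rss1 = ((X - H @ X) ** 2).sum(axis=0)              # residual SS with f(y)
rss0 = ((X - X.mean(axis=0)) ** 2).sum(axis=0)     # intercept-only residual SS
fstat = ((rss0 - rss1) / q) / (rss1 / (n - q - 1))
pvals = stats.f.sf(fstat, q, n - q - 1)
keep = np.where(pvals < 0.05 / p)[0]               # Bonferroni-style screening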
Journal of Statistical Computation and Simulation | 2018
Kofi P. Adragni
Sufficient dimension reduction methods aim to reduce the dimensionality of predictors while preserving regression information relevant to the response. In this article, we develop Minimum Average Deviance Estimation (MADE) methodology for sufficient dimension reduction. The purpose of MADE is to generalize Minimum Average Variance Estimation (MAVE) beyond its assumption of additive errors to settings where the outcome follows an exponential family distribution. As in MAVE, a local likelihood approach is used to learn the form of the regression function from the data and the main parameter of interest is a dimension reduction subspace. To estimate this parameter within its natural space, we propose an iterative algorithm where one step utilizes optimization on the Stiefel manifold. MAVE is seen to be a special case of MADE in the case of Gaussian outcomes with a common variance. Several procedures are considered to estimate the reduced dimension and to predict the outcome for an arbitrary covariate value. Initial simulations and data analysis examples yield encouraging results and invite further exploration of the methodology.
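As a rough sketch of the objective behind MADE, the code below evaluates a kernel-weighted average deviance for a Bernoulli outcome at one candidate orthonormal basis B of the reduction subspace. Minimizing this over B on the Stiefel manifold (not shown) would estimate the subspace; the local-constant fit, the Gaussian kernel, and the bandwidth are simplifications, not the paper's estimator.

# Evaluate an average-deviance objective at a candidate reduction basis B.
import numpy as np

def average_deviance(B, X, y, h=0.5):
    Z = X @ B                                        # reduced predictors, n x d
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * h**2))                     # Gaussian kernel weights
    mu = np.clip(W @ y / W.sum(axis=1), 1e-6, 1 - 1e-6)      # local mean of y
    dev = -2 * (y * np.log(mu) + (1 - y) * np.log(1 - mu))   # Bernoulli deviance
    return dev.mean()

rng = np.random.default_rng(3)
n, p, d = 200, 6, 1
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
B = np.linalg.qr(rng.normal(size=(p, d)))[0]         # a point on the Stiefel manifold
print(average_deviance(B, X, y))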
Computational Statistics & Data Analysis | 2017
Elias Al-Najjar; Kofi P. Adragni
Most methodologies for sufficient dimension reduction (SDR) in regression are limited to continuous predictors, although many data sets do contain both continuous and categorical variables. Application of these methods to regressions that include qualitative predictors such as gender or species may be inappropriate. Regressions that include a set of qualitative predictors W in addition to a vector X of many-valued predictors and a response Y are considered. Using principal fitted components (PFC) models, a likelihood-based SDR method, a sufficient dimension reduction of X that is constrained through the sub-populations established by W is sought. An estimator of the sufficient reduction subspace is provided and its use is demonstrated through applications.
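The sketch below is a hedged illustration of PFC-style estimation with a qualitative predictor W: center X within each level of W, fit the centered predictors on basis functions f(y), and take the leading eigenvectors of the fitted covariance as the estimated reduction (isotropic-error case). The within-group centering stands in for the sub-population constraint described above and is an illustrative simplification, not the paper's exact estimator.

# PFC-style reduction with within-group centering for a categorical W.
import numpy as np

rng = np.random.default_rng(4)
n, p, d = 300, 10, 2
y = rng.normal(size=n)
w = rng.integers(0, 2, size=n)                      # a binary qualitative predictor
X = rng.normal(size=(n, p))
X[:, 0] += y
X[:, 1] += y**2 + w                                 # structure related to y and w

Xc = X.copy()
for g in np.unique(w):                              # center within W sub-populations
    Xc[w == g] -= Xc[w == g].mean(axis=0)

F = np.column_stack([y, y**2, y**3])
F = F - F.mean(axis=0)
P = F @ np.linalg.solve(F.T @ F, F.T)               # projection onto span of f(y)
Sigma_fit = Xc.T @ P @ Xc / n                       # fitted covariance
vals, vecs = np.linalg.eigh(Sigma_fit)
Gamma_hat = vecs[:, -d:]                            # estimated basis of the reduction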
Statistics | 2015
Kofi P. Adragni; Mingyu Xi
Principal fitted component (PFC) models are a class of likelihood-based inverse regression methods that yield a so-called sufficient reduction of the random p-vector of predictors X given the response Y. Assuming that a large number of the predictors have no information about Y, we aimed to obtain an estimate of the sufficient reduction that ‘purges’ these irrelevant predictors and thus selects the most useful ones. We devised a procedure using observed significance values from the univariate fittings to yield a sparse PFC, a purged estimate of the sufficient reduction. The performance of the method is compared to that of penalized forward linear regression models for variable selection in high-dimensional settings.
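A small sketch of the "purging" step: predictors whose univariate significance values exceed a cutoff are treated as irrelevant, and their rows of the estimated basis are zeroed out, giving a sparse estimate of the reduction. The p-values and basis estimate would come from fits like the ones sketched for the screening and PFC papers above; here they are stubbed, and the Bonferroni-style cutoff is an illustrative choice.

# Purge rows of a PFC basis estimate using univariate p-values (stubbed inputs).
import numpy as np

rng = np.random.default_rng(5)
p, d, alpha = 50, 2, 0.05
pvals = rng.uniform(size=p)
pvals[:5] = 1e-6                                      # pretend the first 5 predictors are relevant
Gamma_hat = np.linalg.qr(rng.normal(size=(p, d)))[0]  # stand-in PFC basis estimate

active = pvals < alpha / p                            # Bonferroni-style selection
Gamma_sparse = np.where(active[:, None], Gamma_hat, 0.0)  # zero out irrelevant rows
print(int(active.sum()), "predictors retained")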
Computational Statistics & Data Analysis | 2015
Kofi P. Adragni; Moumita Karmakar
Given a high-dimensional p-vector of continuous predictors X and a univariate response Y, principal fitted components (PFC) provide a sufficient reduction of X that retains all regression information about Y in X while reducing the dimensionality. The reduction is a set of linear combinations of all p predictors, where, with the use of a flexible set of basis functions, predictors related to Y via complex, nonlinear relationships can be detected. In the presence of a possibly large number of irrelevant predictors, the accuracy of the sufficient reduction is hindered. The proposed method adapts a sequential test to PFC to obtain a “pruned” sufficient reduction that sheds off the irrelevant predictors. The sequential test is based on the likelihood ratio, whose expression is derived under different covariance structures of X|Y. The resulting reduction has improved accuracy and also allows the identification of the relevant variables.
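The following is a simplified illustration of a sequential pruning rule of this flavor: order predictors by how strongly f(y) explains them in the inverse regression, then walk down the list applying a likelihood-ratio test (n * log(RSS0/RSS1), approximately chi-square under the null of an irrelevant predictor with Gaussian errors) and stop at the first non-rejection. It assumes the isotropic covariance case and illustrates only the stopping idea, not the paper's exact statistic.

# Sequential likelihood-ratio pruning sketch (isotropic, Gaussian X_j | Y).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, r = 300, 40, 3
y = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, 0] += y
X[:, 1] += np.cos(y)

F = np.column_stack([np.ones(n), y, y**2, y**3])    # basis functions f(y)
H = F @ np.linalg.solve(F.T @ F, F.T)
rss1 = ((X - H @ X) ** 2).sum(axis=0)
rss0 = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
lrt = n * np.log(rss0 / rss1)                       # per-predictor LRT statistic
order = np.argsort(-lrt)                            # strongest predictors first

keep = []
for j in order:                                     # sequential testing with early stop
    if stats.chi2.sf(lrt[j], df=r) < 0.05:
        keep.append(j)
    else:
        break
print("retained predictors:", keep)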
Archive | 2013
Matthew G. Bachmann; Ashley D. Dyas; Shelby C. Kilmer; Julian Sass; Andrew M. Raim; Nagaraj K. Neerchal; Kofi P. Adragni; George Ostrouchov; Ian F. Thorpe
Programming with big data in R (pbdR), a package used to implement high-performance computing in the statistical software R, uses block cyclic distribution to organize large data across many processes. Because computations performed on large matrices are often not associative, a systematic approach must be used during parallelization to divide the matrix correctly. The block cyclic distribution method stresses a balanced load across processes by allocating sections of data to a corresponding node. This yields a well-divided data layout in which each process computes on its own portion, producing the final result more efficiently. A nontrivial problem occurs when using block cyclic distribution: which combinations of block sizes and grid layouts are most effective? These two factors greatly influence computational efficiency, and therefore it is crucial to study and understand their relationship. To analyze the effects of block size and processor grid layout, we carry out a performance study of the block cyclic process used to compute a principal components analysis (PCA). We apply PCA both to a large simulated data set and to data involving the analysis of single nucleotide polymorphisms (SNPs). We implement analysis of variance (ANOVA) techniques in order to distinguish the variability associated with each grid layout and block distribution. Once the nature of these factors is determined, predictions about the performance for much larger data sets can be made. Our final results demonstrate the relationship between computational efficiency and both block distribution and processor grid layout, and establish a benchmark regarding which combinations of these factors are most effective.
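As a toy illustration of the two tuning factors studied here, the sketch below maps each block of a matrix to a process in a two-dimensional block-cyclic layout of the kind used by pbdR/ScaLAPACK-style distributions: block (i, j) goes to process (i mod Pr, j mod Pc) in a Pr x Pc grid. The block size and grid shape shown are examples, not values from the study.

# Compute the owning process of each matrix block under 2D block-cyclic distribution.
import numpy as np

def block_cyclic_owner(n_rows, n_cols, block, grid):
    Pr, Pc = grid
    nbr = int(np.ceil(n_rows / block))               # number of block rows
    nbc = int(np.ceil(n_cols / block))               # number of block columns
    owners = np.empty((nbr, nbc, 2), dtype=int)
    for bi in range(nbr):
        for bj in range(nbc):
            owners[bi, bj] = (bi % Pr, bj % Pc)      # cyclic assignment of blocks
    return owners

owners = block_cyclic_owner(n_rows=1000, n_cols=1000, block=128, grid=(2, 3))
print(owners[:4, :4, 0])                             # row coordinate of the owning process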
Archive | 2013
William J. Bailey; Claire A. Chambless; Brandynne M. Cho; Jesse D. Smith; Andrew M. Raim; Kofi P. Adragni; Ian F. Thorpe
Complex biomolecules such as proteins can respond to changes in their environment through a process called allostery, which plays an important role in regulating the function of these biomolecules. Allostery occurs when an event at a specific location in a macromolecule produces an effect at a location in the molecule some distance away. An important component of allostery is the coupling of protein sites. Such coupling is one mechanism by which allosteric effects can be transmitted over long distances. To understand this phenomenon, molecular dynamics simulations are carried out with a large number of atoms, and the trajectories of these atoms are recorded over time. Simple correlation methods have been used in the literature to identify coupled motions between protein sites. We implement a recently developed statistical method for dimension reduction called principal fitted components (PFC) in the statistical programming language R to identify both linear and nonlinear correlations between protein sites while dealing efficiently with the high dimensionality of the data. PFC models reduce the dimensionality of data while capturing linear and nonlinear dependencies among predictors (atoms) using a flexible set of basis functions. For faster processing, we implement the PFC algorithm using parallel computing through the Programming with Big Data in R (pbdR) package for R. We demonstrate the method's effectiveness on simulated datasets and apply the routine to time series data from molecular dynamics (MD) simulations to identify coupled motion among the atoms.
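A small illustration of why a basis-function (PFC-style) fit can detect coupling that simple correlation misses: one site moves as a nonlinear function of another, so the Pearson correlation is near zero while the R^2 from regressing on a polynomial basis is large. The simulated "trajectories" below are stand-ins for MD coordinate time series, not actual simulation output.

# Nonlinear coupling detected by a basis fit but missed by Pearson correlation.
import numpy as np

rng = np.random.default_rng(7)
a = rng.normal(size=5000)                            # coordinate of site A over time
b = a**2 + 0.1 * rng.normal(size=a.size)             # nonlinearly coupled site B

r = np.corrcoef(a, b)[0, 1]                          # linear correlation, near zero
F = np.column_stack([np.ones_like(a), a, a**2, a**3])  # flexible basis of site A
beta = np.linalg.lstsq(F, b, rcond=None)[0]
r2 = 1 - ((b - F @ beta) ** 2).sum() / ((b - b.mean()) ** 2).sum()
print(f"Pearson r = {r:.3f}, basis-fit R^2 = {r2:.3f}")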
Journal of Statistical Software | 2012
Kofi P. Adragni; R. Dennis Cook; Seongho Wu