Publication


Featured research published by Kjell Johnson.


Journal of Biopharmaceutical Statistics | 2004

Evaluating methods for classifying expression data.

Michael Z. Man; Greg Dyson; Kjell Johnson; Birong Liao

An attractive application of expression technologies is to predict drug efficacy or safety using expression data of biomarkers. To evaluate the performance of various classification methods for building predictive models, we applied these methods to six expression datasets. These datasets were from studies using microarray technologies and had two or more classes. From each of the original datasets, two subsets were generated to simulate two scenarios in biomarker applications. First, a 50-gene subset was used to simulate a candidate gene approach when it might not be practical to measure a large number of genes/biomarkers. Next, a 2000-gene subset was used to simulate a whole genome approach. We evaluated the relative performance of several classification methods by using leave-one-out cross-validation and bootstrap cross-validation. Although all methods performed well on both subsets of a relatively easy two-class dataset, differences in performance do exist among methods for the other datasets. Overall, partial least squares discriminant analysis (PLS-DA) and support vector machines (SVM) outperform all other methods. We suggest a practical approach to take advantage of multiple methods in biomarker applications.
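
A minimal sketch of the kind of leave-one-out comparison described above, using scikit-learn in Python. The synthetic expression matrix, the 50-gene subset size, and the 0.5 thresholding rule for PLS-DA are illustrative assumptions, not the paper's data or code.

```python
# Compare PLS-DA and SVM with leave-one-out cross-validation on synthetic
# two-class "expression" data (a stand-in for a 50-gene candidate subset).
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, p = 40, 50                                  # 40 samples, 50 genes
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)     # classes driven by 5 genes

correct = {"PLS-DA": 0, "SVM": 0}
for train, test in LeaveOneOut().split(X):
    # PLS-DA: regress the 0/1 label on X, then threshold at 0.5.
    pls = PLSRegression(n_components=2).fit(X[train], y[train])
    pls_pred = np.ravel(pls.predict(X[test]))[0]
    correct["PLS-DA"] += int((pls_pred > 0.5) == y[test][0])
    # Linear-kernel SVM as a second candidate method.
    svm = SVC(kernel="linear").fit(X[train], y[train])
    correct["SVM"] += int(svm.predict(X[test])[0] == y[test][0])

for name, c in correct.items():
    print(f"{name}: LOO accuracy = {c / n:.2f}")
```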


Journal of Chemical Information and Modeling | 2007

Identifying promising compounds in drug discovery: genetic algorithms and some new statistical techniques.

Abhyuday Mandal; Kjell Johnson; C. F. Jeff Wu; Dirk Bornemeier

Throughout the drug discovery process, discovery teams are compelled to use statistics for making decisions using data from a variety of inputs. For instance, teams are asked to prioritize compounds for subsequent stages of the drug discovery process, given results from multiple screens. To assist in the prioritization process, we propose a desirability function to account for a priori scientific knowledge; compounds can then be prioritized based on their desirability scores. In addition to identifying existing desirable compounds, teams often use prior knowledge to suggest new, potentially promising compounds to be created in the laboratory. Because the chemistry space to search can be dauntingly large, we propose the sequential elimination of level combinations (SELC) method for identifying new optimal compounds. We illustrate this method on a combinatorial chemistry example.
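
As a rough illustration of the desirability idea, the sketch below combines two hypothetical screen readouts into a single priority score. The piecewise-linear ramps, screen names, cutoffs, and geometric-mean combination are assumptions for illustration; the paper's actual desirability function encodes team-specific scientific knowledge.

```python
# Prioritize compounds by a combined desirability score over two screens.
import numpy as np

def desirability(x, low, high):
    """Map a readout to [0, 1]: 0 at/below `low`, 1 at/above `high`."""
    return np.clip((x - low) / (high - low), 0.0, 1.0)

# Hypothetical results from two screens for four compounds.
potency = np.array([7.2, 5.1, 8.0, 6.4])      # e.g., pIC50, higher is better
solubility = np.array([40.0, 95.0, 10.0, 70.0])  # e.g., uM, higher is better

d_pot = desirability(potency, low=5.0, high=8.0)
d_sol = desirability(solubility, low=20.0, high=90.0)

# Geometric mean: any single undesirable property (d = 0) zeroes the score.
score = np.sqrt(d_pot * d_sol)
ranking = np.argsort(score)[::-1]
print("priority order (compound indices):", ranking, "scores:", score.round(2))
```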


The Annals of Applied Statistics | 2009

On multi-view learning with additive models

Mark Culp; George Michailidis; Kjell Johnson

In many scientific settings, data can be naturally partitioned into variable groupings called views. Common examples include environmental (first view) and genetic (second view) information in ecological applications, and chemical (first view) and biological (second view) data in drug discovery. Multi-view data also occur in text analysis and proteomics applications, where one view consists of a graph with observations as the vertices and a weighted measure of pairwise similarity between observations as the edges. Further, in several of these applications the observations can be partitioned into two sets: one where the response is observed (labeled) and the other where the response is not (unlabeled). The problem of simultaneously modeling multiple views and incorporating unlabeled observations in training is referred to as multi-view transductive learning. In this work we introduce and study a comprehensive generalized fixed-point additive modeling framework for multi-view transductive learning, where any view is represented by a linear smoother. The problem of view selection is addressed using a generalized Akaike Information Criterion, which provides an approach for testing the contribution of each view. An efficient implementation is provided for fitting these models with both backfitting and local-scoring type algorithms adjusted to semi-supervised graph-based learning. The proposed technique is assessed on both synthetic and real data sets and is shown to be competitive with state-of-the-art co-training and graph-based techniques.
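
To make the backfitting idea concrete, here is a minimal two-view sketch in Python, with ridge regression standing in as the linear smoother for each view. It is a toy under stated assumptions: it omits the graph-based transductive smoother, the unlabeled observations, and the GAIC-based view selection described above.

```python
# Fit an additive model f1(view 1) + f2(view 2) by backfitting,
# with ridge regression as each view's linear smoother.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(size=(n, 5))   # view 1 (e.g., environmental features)
X2 = rng.normal(size=(n, 3))   # view 2 (e.g., genetic features)
y = X1[:, 0] - 2 * X2[:, 1] + rng.normal(scale=0.3, size=n)

f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):            # backfit to (approximate) convergence
    # Fit view 1's smoother to the partial residual y - f2.
    f1 = Ridge(alpha=1.0).fit(X1, y - f2).predict(X1)
    # Fit view 2's smoother to the partial residual y - f1.
    f2 = Ridge(alpha=1.0).fit(X2, y - f1).predict(X2)

print("residual std after backfitting:", np.std(y - f1 - f2).round(3))
```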


Journal of Pharmaceutical Sciences | 2009

The Development and Validation of a Computational Model to Predict Rat Liver Microsomal Clearance

Cheng Chang; David B. Duignan; Kjell Johnson; Pil Lee; George S Cowan; Eric Gifford; Charles Stankovic; Christopher Lepsy; Chad L. Stoner

As the cost of discovering and developing new pharmaceutically relevant compounds continues to rise, it is increasingly important to select the right molecules to prosecute very early in drug discovery. The development of high-throughput in vitro assays of hepatic metabolic clearance has allowed for vast quantities of data generation; however, these large screens are still costly and remain dependent on animal use. To further expand the value of these screens and ultimately aid in reducing animal use, we have developed an in silico model of rat liver microsomal (RLM) clearance. This model combines a large amount of rat clearance data (n = 27,697) generated at multiple Pfizer laboratories to represent the broadest possible chemistry space. The model predicts RLM stability (with 82% accuracy and a kappa value of 0.65 on the test data set) based solely on chemical structural inputs, and provides a clear assessment of confidence in the prediction. The current in silico model should help accelerate the drug discovery process by using confidence-based, stability-driven prioritization, and reduce cost by filtering out the most unstable/undesirable molecules. The model can also increase efficiency in the evaluation of chemical series by optimizing iterative testing and promoting rational drug design.
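
A hedged sketch of the general recipe: structure-derived descriptors in, a stability call plus a confidence score out. The random forest, the synthetic descriptors and labels, and the distance-from-0.5 confidence definition are all placeholders; this is not Pfizer's model or its actual confidence assessment.

```python
# Binary stability classifier over structural descriptors, with a crude
# per-compound confidence score derived from the predicted probability.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 20))                  # 20 structural descriptors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # 1 = "stable", 0 = "unstable"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
pred = (proba > 0.5).astype(int)
# Distance from 0.5 as confidence: calls near 0 or 1 are confident;
# calls near 0.5 should be deferred to the in vitro assay.
confidence = np.abs(proba - 0.5) * 2
print("test accuracy:", (pred == y_te).mean().round(2))
print("fraction of high-confidence calls:", (confidence > 0.6).mean().round(2))
```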


Journal of Chemical Information and Modeling | 2010

The ensemble bridge algorithm: a new modeling tool for drug discovery problems

Mark Culp; Kjell Johnson; George Michailidis

Ensemble algorithms have historically been categorized into two separate paradigms, boosting and random forests, which differ significantly in the way each ensemble is constructed. Boosting algorithms represent one extreme, where an iterative greedy optimization strategy, weak learners (e.g., small classification trees), and stage weights are employed to target difficult-to-classify regions in the training space. On the other extreme, random forests rely on randomly selected features and complex learners (learners that exhibit low bias, e.g., large regression trees) to classify well over the entire training data. Because the approach does not target the next learner for inclusion, it tends to provide a natural robustness to noisy labels. In this work, we introduce the ensemble bridge algorithm, which is capable of transitioning between boosting and random forests using a regularization parameter ν ∈ [0, 1]. Because the ensemble bridge algorithm is a compromise between the greedy nature of boosting and the randomness present in random forests, it yields robust performance in the presence of a noisy response and superior performance in the presence of a clean response. Often, drug discovery data (e.g., computational chemistry data) have varying levels of noise. Hence, this method enables a practitioner to employ a single method to evaluate ensemble performance. The method's robustness is verified across a variety of data sets where the algorithm repeatedly yields better performance than either boosting or random forests alone. Finally, we provide diagnostic tools for the new algorithm, including a measure of variable importance and an observational clustering tool.
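
The sketch below is a toy interpolation between forest-like and boosting-like behavior governed by a single ν, meant only to convey the idea of a bridge between the two paradigms; it is not the published ensemble bridge algorithm, and the weighted-bootstrap-plus-exponential-reweighting scheme is an assumption for illustration.

```python
# Toy bridge between bagging/forest-like (nu = 0) and boosting-like (nu = 1)
# ensembles: nu controls how sharply misclassified points are upweighted.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

def bridge_ensemble(X, y, nu, n_trees=50):
    n = len(y)
    w = np.full(n, 1.0 / n)          # example weights, uniform at the start
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(n, size=n, p=w)        # weighted bootstrap sample
        tree = DecisionTreeClassifier(max_features="sqrt").fit(X[idx], y[idx])
        # nu = 0 leaves weights uniform (pure bagging with random features);
        # nu = 1 concentrates weight on misclassified points (boosting-like).
        miss = (tree.predict(X) != y).astype(float)
        w = w * np.exp(nu * miss)
        w = w / w.sum()
        trees.append(tree)
    return trees

trees = bridge_ensemble(X, y, nu=0.5)
votes = np.mean([t.predict(X) for t in trees], axis=0)
print("training accuracy:", ((votes > 0.5) == y).mean().round(2))
```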


Birth Defects Research Part B-developmental and Reproductive Toxicology | 2011

Not a Walk in the Park: The ECVAM Whole Embryo Culture Model Challenged With Pharmaceuticals and Attempted Improvements With Random Forest Design

Jason J. Thomson; Kjell Johnson; Robert E. Chapin; Donald B. Stedman; Steven W. Kumpf; Terence R.S. Ozolinš

BACKGROUND The European Centre for the Validation of Alternative Methods (ECVAM) supported the development of a linear discriminant embryotoxicity prediction model founded on rat whole embryo culture (Piersma et al. (2004). Altern Lab Anim 32:275–307). Our goals were to (1) assess the accuracy of this model with pharmaceuticals, and (2) use the data to develop a more accurate prediction model. METHODS Sixty-one chemicals of known in vivo activity were tested. They were part of the ECVAM validation set (N = 13), commercially available pharmaceuticals (N = 31), and Pfizer chemicals that did not reach the market but for which developmental toxicity data were available (N = 17). They were tested according to the ECVAM procedures. Fifty-seven of these chemicals were used for Random Forest modeling to develop an alternate model, with the goals of using surrogate endpoints for simplified assessments and improving the predictivity of the model. RESULTS Using part of the ECVAM chemical test set, the ECVAM prediction model was 77% accurate. This approximated what was reported in the validation study (80%; Piersma et al. (2004). Altern Lab Anim 32:275–307). However, when confronted with novel chemicals, the accuracy of the linear discriminant model dropped to 56%. In an attempt to improve this performance, we used a Random Forest model that provided rankings and confidence estimates. Although the model used simpler endpoints, its performance was no better than that of the ECVAM linear discriminant model. CONCLUSIONS This study confirms previous concerns about the applicability of the ECVAM prediction model to a more diverse chemical set, and underscores the challenges associated with developing embryotoxicity prediction models.


Bioinformatics | 2012

Multinomial modeling and an evaluation of common data-mining algorithms for identifying signals of disproportionate reporting in pharmacovigilance databases

Kjell Johnson; Cen Guo; Mark Gosink; Vicky Wang; Manfred Hauben

MOTIVATION A principal objective of pharmacovigilance is to detect adverse drug reactions that are unknown or novel in terms of their clinical severity or frequency. One method is through inspection of spontaneous reporting system databases, which consist of millions of reports of patients experiencing adverse effects while taking one or more drugs. For such large databases, there is an increasing need for quantitative and automated screening tools to assist drug safety professionals in identifying drug-event combinations (DECs) worthy of further investigation. Existing algorithms can effectively identify problematic DECs when the frequencies are high. However, these algorithms perform differently for low-frequency DECs. RESULTS In this work, we provide a method based on the multinomial distribution that identifies signals of disproportionate reporting, especially for low-frequency combinations. In addition, we comprehensively compare the performance of commonly used algorithms with the new approach. Simulation results demonstrate the advantages of the proposed method, and analysis of the Adverse Event Reporting System data shows that the proposed method can help detect interesting signals. Furthermore, we suggest that these methods be used to identify DECs that occur significantly less frequently than expected, thus identifying potential alternative indications for these drugs. We provide an empirical example that demonstrates the importance of exploring underexpected DECs. AVAILABILITY Code to implement the proposed method is available in R on request from the corresponding authors. CONTACT [email protected] or [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
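
For intuition, a minimal Python sketch of disproportionality screening on a drug-by-event count table: expected counts come from an independence model on the multinomial totals, and both over- and under-reported combinations are flagged. The toy counts and the ratio thresholds are placeholders, and this is not the paper's multinomial test statistic.

```python
# Flag drug-event combinations whose observed counts deviate from what an
# independence (multinomial) model would predict from the margins.
import numpy as np

# Toy spontaneous-report counts: rows = drugs, columns = adverse events.
counts = np.array([[120,  5, 30],
                   [ 40, 60, 25],
                   [ 10,  2, 90]], dtype=float)

N = counts.sum()
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / N
ratio = counts / expected                       # observed / expected

over = np.argwhere(ratio > 2.0)    # candidate safety signals
under = np.argwhere(ratio < 0.5)   # underexpected DECs (possible alternative indications)
print("over-reported (drug, event):", over.tolist())
print("under-reported (drug, event):", under.tolist())
```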


Journal of Chemical Information and Modeling | 2011

Statistical Analysis and Compound Selection of Combinatorial Libraries for Soluble Epoxide Hydrolase

Li Xing; Robert Goulet; Kjell Johnson

Inhibitors of soluble epoxide hydrolase (sEH) have been extensively pursued as antihypertensive therapies, as well as potential treatments for other cardiovascular dysfunctions and the prevention of renal damage. In this study we report quantitative structure-activity relationship (QSAR) models for 1223 structurally diverse sEH inhibitors produced by combinatorial library design and synthesis. Daylight fingerprints and MOE 2D and DragonX descriptors were generated for QSAR modeling. Using these descriptors, a number of statistical models were trained and validated. Of these methods, gradient boosting machines (GBM), partial least squares (PLS), and Cubist demonstrated the best performance on training and test set validation in terms of their leave-group-out cross-validated (LGO-CV) Q² and correlation coefficient R² (Q²(GBM-training) = 0.79, R²(GBM-test) = 0.81; Q²(PLS-training) = 0.75, R²(PLS-test) = 0.75; Q²(Cubist-training) = 0.91, R²(Cubist-test) = 0.78). A final model was constructed as a consensus of the three individual models and showed robust statistics and prediction of the external validation set. The Gaussian process modified sequential elimination of level combinations (G-SELC) method was then used to expand the chemical space beyond what had been explored by combinatorial synthesis. This approach identified 50 new compounds that are structurally diverse and, based on prior knowledge, potentially desirable for sEH inhibition. The activities of the suggested compounds were then predicted by the consensus QSAR model, and the results indicated that the compounds were likely to lie in active regions of the chemical space. This study illustrates that the balanced G-SELC approach could provide a general method for combinatorial library design, effectively identifying promising compounds to be created in the laboratory.
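
A minimal consensus-modeling sketch in Python: fit several regressors and average their predictions. GBM and PLS match the methods named above; since scikit-learn has no Cubist implementation, a regression tree stands in for the third model here, and the random descriptor matrix is a placeholder for the real QSAR descriptors.

```python
# Consensus QSAR model: average the predictions of three regressors.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 30))                     # descriptor matrix
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = [GradientBoostingRegressor(random_state=0),
          PLSRegression(n_components=5),
          DecisionTreeRegressor(max_depth=6, random_state=0)]  # Cubist stand-in
preds = []
for m in models:
    m.fit(X_tr, y_tr)
    preds.append(np.asarray(m.predict(X_te)).ravel())  # PLS may return 2-D

consensus = np.mean(preds, axis=0)
r2 = 1 - np.sum((y_te - consensus) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
print("consensus test R^2:", round(r2, 2))
```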


Birth Defects Research Part B-developmental and Reproductive Toxicology | 2013

The inhibin B response in male rats treated with two drug candidates.

Robert E. Chapin; James D. Alvey; Richard Goldstein; Melba G. Dokmanovich; William J. Reagan; Kjell Johnson; Frank J. Geoly

BACKGROUND Serum Inhibin B was measured in two studies of known testis-toxic drug candidates. METHODS AND RESULTS Study 1 involved a hepatitis C drug candidate and used a 10-week dosing period, followed by mating and necropsy of half of each group, and then a 12-week recovery period for the remaining animals. At the postmating necropsy, 6 of 15 high-dose males had testis lesions; Inhibin B was significantly reduced in all animals in that group. The mid-dose group had no lesions but significantly reduced serum Inhibin B. At recovery, 9 of 15 high-dose males showed testis damage; serum Inhibin B levels were not different from controls. Inhibin B appeared to both overreport and underreport testis damage in Study 1. Study 2 was an acute pathogenesis study for an antibacterial compound, using a control and two dose levels and multiple time points (days 5, 8, 15, 22, and then untreated until day 71). At each time point, blood was sampled from all remaining rats and five per group were killed for histologic evaluation. The low-dose group had minimal to moderate lesions, while serum Inhibin B was never changed. The high-dose animals progressed quickly from minimal lesions to being broadly and moderately affected; serum Inhibin B levels were reduced at days 8 and 15 only. In Study 2, Inhibin B appeared less sensitive than histology, except at the extremes of testis damage, when Inhibin B was routinely low. CONCLUSION We conclude that in these two studies there was a poor correlation between changes in serum levels of Inhibin B and testis histopathology.


Journal of Computational and Graphical Statistics | 2011

On Adaptive Regularization Methods in Boosting

Mark Culp; George Michailidis; Kjell Johnson

Boosting algorithms build models on dictionaries of learners constructed from the data, where a coefficient in this model relates to the contribution of a particular learner relative to the other learners in the dictionary. Regularization for these models is currently implemented by iteratively applying a simple local tolerance parameter, which scales each coefficient toward zero. Stochastic enhancements, such as bootstrapping, incorporate a random mechanism in the construction of the ensemble to improve robustness, reduce computation time, and improve accuracy. In this article, we propose a novel local estimation scheme for direct data-driven estimation of regularization parameters in boosting algorithms with stochastic enhancements based on a penalized loss optimization framework. In addition, k-fold cross-validated estimates of this penalty are obtained during its construction. This leads to a computationally fast and effective way of estimating this parameter for boosting algorithms with stochastic enhancements. The procedure is illustrated on both real and synthetic data. The R code used in this manuscript is available as supplemental material.
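
A simplified sketch of the role the regularization parameter plays: below, a boosting shrinkage value is chosen by k-fold cross-validation over a small grid. The paper's contribution is estimating the penalty adaptively during ensemble construction; this grid-plus-CV version, on synthetic data, only illustrates why the parameter matters.

```python
# Choose a boosting shrinkage (learning rate) by 5-fold cross-validation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 8))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=300)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for lr in [0.01, 0.05, 0.1, 0.5]:              # candidate shrinkage values
    errs = []
    for tr, te in kf.split(X):
        m = GradientBoostingRegressor(learning_rate=lr, n_estimators=200,
                                      random_state=0).fit(X[tr], y[tr])
        errs.append(np.mean((m.predict(X[te]) - y[te]) ** 2))
    print(f"shrinkage {lr}: CV MSE = {np.mean(errs):.3f}")
```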

Collaboration


Dive into Kjell Johnson's collaborations.

Top Co-Authors

Mark Culp

West Virginia University

Abhyuday Mandal

Georgia Institute of Technology
