Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where James J. Chen is active.

Publication


Featured research published by James J. Chen.


BMC Bioinformatics | 2005

Cross-platform comparability of microarray technology: Intra-platform consistency and appropriate data analysis procedures are essential

Leming Shi; Weida Tong; Hong Fang; Uwe Scherf; Jing Han; Raj K. Puri; Felix W. Frueh; Federico Goodsaid; Lei Guo; Zhenqiang Su; Tao Han; James C. Fuscoe; Z Alex Xu; Tucker A. Patterson; Huixiao Hong; Qian Xie; Roger Perkins; James J. Chen; Daniel A. Casciano

Background: The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630–631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676–5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology. Results: We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in the experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset, we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, rather than inherent technical differences among platforms, as suggested by Tan et al. and Marshall. Conclusion: Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories, as well as the merits of various data analysis procedures. To this end, we are coordinating the MAQC project, a community-wide effort for microarray quality control.
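As a rough illustration of the ranking comparison described in this abstract (not the MAQC or Tan et al. pipeline itself), the sketch below ranks simulated genes by P-value and by absolute fold-change on two hypothetical platforms and measures the overlap of the top-ranked lists; the data, group sizes, and list size are all assumptions.

# Sketch: cross-platform concordance of top-gene lists under two ranking rules.
# All data are simulated; this is not the MAQC or Tan et al. analysis itself.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_per_group = 2000, 5

def simulate_platform(true_effect, noise_sd):
    """Log2 expression for control and treated groups on one platform."""
    ctrl = rng.normal(0.0, noise_sd, size=(n_genes, n_per_group))
    trt = rng.normal(true_effect[:, None], noise_sd, size=(n_genes, n_per_group))
    return ctrl, trt

def rank_genes(ctrl, trt):
    """Gene orderings by t-test P-value and by absolute log fold-change."""
    _, p = stats.ttest_ind(trt, ctrl, axis=1)
    log_fc = trt.mean(axis=1) - ctrl.mean(axis=1)
    return np.argsort(p), np.argsort(-np.abs(log_fc))

# 10% of genes carry a real effect; both "platforms" measure the same biology.
true_effect = np.where(rng.random(n_genes) < 0.1, rng.normal(1.0, 0.3, n_genes), 0.0)
rank_p1, rank_fc1 = rank_genes(*simulate_platform(true_effect, noise_sd=0.8))
rank_p2, rank_fc2 = rank_genes(*simulate_platform(true_effect, noise_sd=0.8))

top = 100  # size of the "top gene" list compared across platforms
for label, r1, r2 in [("P-value ranking", rank_p1, rank_p2),
                      ("fold-change ranking", rank_fc1, rank_fc2)]:
    overlap = len(set(r1[:top]) & set(r2[:top])) / top
    print(f"{label}: concordance of top {top} genes = {overlap:.2f}")

Under a setup like this, fold-change ranking typically shows higher top-list overlap than P-value ranking when within-platform replication is noisy, which mirrors the argument made above.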


Briefings in Bioinformatics | 2013

Class-imbalanced classifiers for high-dimensional data

Wei-Jiun Lin; James J. Chen

A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class, resulting in poor accuracy in minority class prediction. A class-imbalanced classifier typically modifies a standard classifier by a correction strategy or by incorporating a new strategy in the training phase to account for differential class sizes. This article reviews and evaluates some of the most important methods for class prediction of high-dimensional imbalanced data. The evaluation addresses the fundamental issues of the class-imbalanced classification problem: imbalance ratio, small disjuncts and overlap complexity, lack of data, and feature selection. Four class-imbalanced classifiers are considered: three standard classification algorithms, each coupled with an ensemble correction strategy, and one support vector machine (SVM)-based correction classifier. The three algorithms are (i) diagonal linear discriminant analysis (DLDA), (ii) random forests (RFs) and (iii) SVMs. The SVM-based correction classifier is SVM threshold adjustment (SVM-THR). A Monte Carlo simulation and five genomic data sets were used to illustrate the analysis and address the issues. The SVM-ensemble classifier appears to perform best when the class imbalance is not too severe. The SVM-THR performs well if the imbalance is severe and predictors are highly correlated. The DLDA with feature selection can perform well without using the ensemble correction.
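To make the correction strategies concrete, here is a minimal sketch of two of them: an undersampling-ensemble correction wrapped around a standard classifier, and a simple decision-threshold adjustment in the spirit of SVM-THR. The data, kernel, and ensemble size are illustrative assumptions, not the settings evaluated in the article.

# Sketch of two class-imbalance corrections: an undersampling ensemble around a
# standard classifier, and SVM threshold adjustment (moving the decision cutoff).
# Data and hyperparameters are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=200, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)  # ~10% minority class

def ensemble_undersample_predict(X_train, y_train, X_test, n_members=11):
    """Each member trains on all minority samples plus an equal-sized random
    draw from the majority class; predictions are combined by majority vote."""
    rng = np.random.default_rng(0)
    minority, majority = np.where(y_train == 1)[0], np.where(y_train == 0)[0]
    votes = np.zeros(len(X_test))
    for _ in range(n_members):
        maj_sample = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, maj_sample])
        votes += SVC(kernel="linear").fit(X_train[idx], y_train[idx]).predict(X_test)
    return (votes > n_members / 2).astype(int)

def svm_threshold_predict(X_train, y_train, X_test, quantile=0.9):
    """Threshold-adjustment idea: keep the SVM unchanged but call the top
    (1 - quantile) fraction of decision scores positive."""
    clf = SVC(kernel="linear").fit(X_train, y_train)
    scores = clf.decision_function(X_test)
    return (scores > np.quantile(scores, quantile)).astype(int)

pred_ens = ensemble_undersample_predict(X[:400], y[:400], X[400:])
pred_thr = svm_threshold_predict(X[:400], y[:400], X[400:])
print("minority-class recall, ensemble :", np.mean(pred_ens[y[400:] == 1]))
print("minority-class recall, threshold:", np.mean(pred_thr[y[400:] == 1]))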


BMC Bioinformatics | 2005

Microarray scanner calibration curves: characteristics and implications

Leming Shi; Weida Tong; Zhenqiang Su; Tao Han; Jing Han; Raj K. Puri; Hong Fang; Felix W. Frueh; Federico Goodsaid; Lei Guo; William S. Branham; James J. Chen; Z Alex Xu; Stephen Harris; Huixiao Hong; Qian Xie; Roger Perkins; James C. Fuscoe

Background: Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear. Results: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations for the existence of ratio underestimation, intensity-dependence of ratio bias, and anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remained. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias. Conclusion: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve the reproducibility of microarray measurements, they appear less effective in improving accuracy.
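The ratio-bias mechanism described here can be reproduced with a toy calibration curve: the sketch below passes a true two-fold concentration ratio through an assumed saturating response with additive background and compares the intensity-based ratio with a ratio computed from back-calculated concentrations. The curve shape and parameter values are assumptions, not the scanner's measured response.

# Sketch: how a saturating calibration curve with additive background biases
# intensity ratios, and how back-calculating concentrations removes the bias.
# The hyperbolic response and its parameters are illustrative assumptions.
import numpy as np

I_MAX, K, BG = 60000.0, 500.0, 300.0

def intensity(conc):
    """Toy calibration curve: fluorescence intensity as a function of dye concentration."""
    return I_MAX * conc / (K + conc) + BG

def concentration(inten):
    """Invert the toy curve: estimated concentration from observed intensity."""
    signal = np.clip(inten - BG, 1e-9, None)
    return K * signal / np.clip(I_MAX - signal, 1e-9, None)

true_ratio = 2.0
cy5 = np.logspace(0, 3, 7)          # concentrations from low signal to near saturation
cy3 = cy5 / true_ratio

ratio_intensity = intensity(cy5) / intensity(cy3)
ratio_concentration = concentration(intensity(cy5)) / concentration(intensity(cy3))

for c, ri, rc in zip(cy5, ratio_intensity, ratio_concentration):
    print(f"conc={c:8.1f}  intensity ratio={ri:.2f}  concentration ratio={rc:.2f}  (true {true_ratio})")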


Journal of the American Statistical Association | 1989

Quantitative risk assessment for teratological effects

James J. Chen; Ralph L. Kodell

This article presents a quantitative procedure for using a “benchmark dose” to obtain low-dose risk estimates for reproductive and developmental toxic effects. This procedure combines the best features of the previously proposed methods for handling litter effects for teratology data and the currently used methods for quantitative risk assessment. The beta-binomial distribution is used to account for litter effects, and the Weibull dose-response model is used for modeling teratogenic effects. A benchmark dose, defined to be the lowest dose at which the excess risk does not exceed 1% with 95% confidence, is proposed to replace the no-observed-effect level (NOEL). The NOEL is generally the highest experimental dose that is not statistically different from the control; the NOEL approach does not use experimental data effectively for quantitative risk estimation. In this article, a lower limit on the safe dose is estimated by linearly extrapolating downward from the benchmark dose; this procedure is ...
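For reference, one common way to write the ingredients named in this abstract (the paper's exact parameterization may differ) is, in LaTeX:

% Beta-binomial model for y_i affected fetuses out of n_i in litter i
P(Y_i = y_i \mid n_i) \;=\; \binom{n_i}{y_i}\,
  \frac{B\bigl(y_i + \mu\theta,\; n_i - y_i + (1-\mu)\theta\bigr)}{B\bigl(\mu\theta,\;(1-\mu)\theta\bigr)}

% Weibull dose-response model for the probability of a teratogenic effect at dose d
P(d) \;=\; 1 - \exp\!\bigl[-(\alpha + \beta d^{\gamma})\bigr], \qquad \alpha, \beta \ge 0,\; \gamma \ge 1

% Excess (extra) risk over background
\pi(d) \;=\; \frac{P(d) - P(0)}{1 - P(0)}

Here \mu is the mean per-fetus response probability, \theta governs the litter-to-litter overdispersion, and the benchmark dose corresponds to the dose at which \pi(d) reaches 0.01, applied through its 95% confidence limit as described in the abstract.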


Pharmacogenomics Journal | 2007

Selection of differentially expressed genes in microarray data analysis

James J. Chen; Wang SJ; Tsai CA; Lin CJ

One common objective in microarray experiments is to identify a subset of genes that are differentially expressed among different experimental conditions, for example, between drug treatment and no drug treatment. Often, the goal is to determine the underlying relationship between poor versus good gene signatures for identifying biological functions or predicting specific therapeutic outcomes. Because of the complexity in studying hundreds or thousands of genes in an experiment, selection of a subset of genes to enhance relationships among the underlying biological structures or to improve prediction accuracy of clinical outcomes has been an important issue in microarray data analysis. Selection of differentially expressed genes is a two-step process. The first step is to select an appropriate test statistic and compute the P-value. The genes are ranked according to their P-values as evidence of differential expression. The second step is to assign a significance level, that is, to determine a cutoff threshold from the P-values in accordance with the study objective. In this paper, we consider four commonly used statistics, the t-, S- (SAM), U- (Mann–Whitney) and M-statistics, to compute the P-values for gene ranking. We consider the family-wise error and false discovery rate false-positive error-controlled procedures to select a limited number of genes, and a receiver-operating characteristic (ROC) approach to select a larger number of genes for assigning the significance level. The ROC approach is particularly useful in genomic/genetic profiling studies. The well-known colon cancer data containing 22 normal and 40 tumor tissues are used to illustrate different gene ranking and significance level assignment methods for applications to genomic/genetic profiling studies. The P-values computed from the t-, U- and M-statistics are very similar. We discuss the common practice that uses the P-value (the false-positive error probability) as the primary criterion, and then uses the fold-change as a surrogate measure of biological significance for gene selection. The P-value and the fold-change can be pictorially shown simultaneously in a volcano plot. We also address several issues in gene selection.
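A small worked example of the two-step process (rank by a test statistic, then choose a significance cutoff) is sketched below: per-gene t-test P-values and fold-changes on simulated data, a Benjamini-Hochberg FDR cutoff, and a volcano plot. The data and thresholds are assumptions; this is not the colon cancer analysis itself.

# Sketch of the two-step gene selection described above: rank genes by a test
# statistic, then choose a cutoff (here Benjamini-Hochberg FDR) and view the
# result as a volcano plot. Simulated data; not the colon cancer data set.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
n_genes, n_normal, n_tumor = 2000, 22, 40   # group sizes chosen to mirror the colon data

effect = np.where(rng.random(n_genes) < 0.05, rng.normal(1.5, 0.5, n_genes), 0.0)
normal = rng.normal(0.0, 1.0, size=(n_genes, n_normal))
tumor = rng.normal(effect[:, None], 1.0, size=(n_genes, n_tumor))

_, p = stats.ttest_ind(tumor, normal, axis=1)        # step 1: rank genes by P-value
log2_fc = tumor.mean(axis=1) - normal.mean(axis=1)   # fold-change on the log2 scale

def bh_cutoff(pvals, fdr=0.05):
    """Largest P-value satisfying the Benjamini-Hochberg step-up criterion."""
    order, m = np.sort(pvals), len(pvals)
    passed = order <= fdr * np.arange(1, m + 1) / m
    return order[passed].max() if passed.any() else 0.0

cutoff = bh_cutoff(p)                                 # step 2: assign a significance level
selected = (p <= cutoff) & (np.abs(log2_fc) >= 1.0)   # plus a fold-change filter
print(f"P-value cutoff = {cutoff:.2e}, genes selected = {selected.sum()}")

plt.scatter(log2_fc, -np.log10(p), s=5, c=np.where(selected, "red", "grey"))
plt.xlabel("log2 fold-change"); plt.ylabel("-log10 P-value"); plt.title("Volcano plot")
plt.savefig("volcano.png")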


Biometrics | 1991

Analysis of trinomial responses from reproductive and developmental toxicity experiments

James J. Chen; Ralph L. Kodell; Richard B. Howe; David W. Gaylor

This paper presents a Dirichlet-trinomial distribution for modelling data obtained from reproductive and developmental studies. The common endpoints for the evaluation of reproductive and developmental toxic effects are the number of dead fetuses, the number of malformed fetuses, and the number of normal fetuses for each litter. With current statistical methods for the evaluation of reproductive and developmental effects, the effect on the number of deaths and the effect on the number of malformations are analyzed separately. The Dirichlet-trinomial model provides a procedure for the analysis of multiple endpoints simultaneously. This proposed Dirichlet-trinomial model is a generalization of the beta-binomial model that has been used for handling the litter effect in reproductive and developmental experiments. Likelihood ratio tests for differences in the number of deaths, the number of malformations, and the number of normals among dosed and control groups are derived. The proposed test procedure based on the Dirichlet-trinomial model is compared with that based on the beta-binomial model with an application to a real data set.
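For concreteness, the Dirichlet-trinomial distribution (the three-category Dirichlet-multinomial) referred to above can be written, in one common parameterization, as:

P\bigl(Y_{i1}=y_{i1},\,Y_{i2}=y_{i2},\,Y_{i3}=y_{i3} \mid n_i\bigr)
  \;=\; \frac{n_i!}{y_{i1}!\,y_{i2}!\,y_{i3}!}\;
        \frac{\Gamma(\alpha_1+\alpha_2+\alpha_3)}{\Gamma(n_i+\alpha_1+\alpha_2+\alpha_3)}
        \prod_{j=1}^{3}\frac{\Gamma(y_{ij}+\alpha_j)}{\Gamma(\alpha_j)},
  \qquad y_{i1}+y_{i2}+y_{i3}=n_i

where the three categories are the dead, malformed, and normal fetuses in litter i. The category probabilities are \mu_j = \alpha_j/(\alpha_1+\alpha_2+\alpha_3) and the total \alpha_1+\alpha_2+\alpha_3 controls the within-litter overdispersion; collapsing any two categories recovers the beta-binomial model, and dose-group differences can be tested with the likelihood ratio statistic -2[\ell(\hat\theta_0)-\ell(\hat\theta_1)] referred to a chi-squared distribution.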


Toxicologic Pathology | 2009

The Liver Toxicity Biomarker Study: Phase I Design and Preliminary Results

Robert N. McBurney; Wade M. Hines; Linda S. Von Tungeln; Laura K. Schnackenberg; Richard D. Beger; Carrie L. Moland; Tao Han; James C. Fuscoe; Ching-Wei Chang; James J. Chen; Zhenqiang Su; Xiaohui Fan; Weida Tong; Shelagh A. Booth; Raji Balasubramanian; Paul Courchesne; Jennifer M. Campbell; Armin Graber; Yu Guo; Peter Juhasz; Tricin Y. Li; Moira Lynch; Nicole Morel; Thomas N. Plasterer; Edward J. Takach; Chenhui Zeng; Frederick A. Beland

Drug-induced liver injury (DILI) is the primary adverse event that results in withdrawal of drugs from the market and a frequent reason for the failure of drug candidates in development. The Liver Toxicity Biomarker Study (LTBS) is an innovative approach to investigate DILI because it compares molecular events produced in vivo by compound pairs that (a) are similar in structure and mechanism of action, (b) are associated with few or no signs of liver toxicity in preclinical studies, and (c) show marked differences in hepatotoxic potential. The LTBS is a collaborative preclinical research effort in molecular systems toxicology between the National Center for Toxicological Research and BG Medicine, Inc., and is supported by seven pharmaceutical companies and three technology providers. In phase I of the LTBS, entacapone and tolcapone were studied in rats to provide results and information that will form the foundation for the design and implementation of phase II. Molecular analysis of the rat liver and plasma samples combined with statistical analyses of the resulting datasets yielded marker analytes, illustrating the value of the broad-spectrum, molecular systems analysis approach to studying pharmacological or toxicological effects.


Journal of Clinical Microbiology | 2010

Evaluation of Pulsed-Field Gel Electrophoresis Profiles for Identification of Salmonella Serotypes

Wen Zou; Wei-Jiun Lin; Steven L. Foley; Chun-Houh Chen; James J. Chen

Pulsed-field gel electrophoresis (PFGE) is a standard typing method for isolates from Salmonella outbreaks and epidemiological investigations. Eight hundred sixty-six Salmonella enterica isolates from eight serotypes, including Heidelberg (n = 323), Javiana (n = 200), Typhimurium (n = 163), Newport (n = 93), Enteritidis (n = 45), Dublin (n = 25), Pullorum (n = 9), and Choleraesuis (n = 8), were subjected to PFGE, and their profiles were analyzed by random forest classification and compared to conventional hierarchical cluster analysis to determine potential predictive relationships between PFGE banding patterns and particular serotypes. Cluster analysis displayed only the underlying similarities and relationships of the isolates from the eight serotypes. However, for serotype prediction of a nonserotyped Salmonella isolate from its PFGE pattern, random forest classification provided better accuracy than conventional cluster analysis. Discriminatory DNA band class markers were identified for distinguishing Salmonella serotypes Heidelberg, Javiana, Typhimurium, and Newport isolates.
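A minimal sketch of the prediction step described above (not the authors' code): each isolate's PFGE profile is encoded as a binary presence/absence vector over band classes, a random forest predicts the serotype, and candidate discriminatory band classes are read off the feature importances. The simulated data and settings are assumptions.

# Sketch: predict Salmonella serotype from a PFGE profile encoded as a binary
# presence/absence vector over band classes, using a random forest.
# The data below are simulated stand-ins, not the 866-isolate data set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_isolates, n_band_classes, n_serotypes = 400, 60, 4

# Each serotype gets its own band-presence probabilities; profiles are drawn from them.
serotype = rng.integers(0, n_serotypes, size=n_isolates)
band_prob = rng.uniform(0.1, 0.9, size=(n_serotypes, n_band_classes))
profiles = (rng.random((n_isolates, n_band_classes)) < band_prob[serotype]).astype(int)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
accuracy = cross_val_score(rf, profiles, serotype, cv=5).mean()
print(f"cross-validated serotype prediction accuracy: {accuracy:.2f}")

# Band classes that best separate the serotypes, i.e. candidate discriminatory markers.
rf.fit(profiles, serotype)
top_bands = np.argsort(rf.feature_importances_)[::-1][:5]
print("most discriminatory band classes (indices):", top_bands)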


Journal of Agricultural and Food Chemistry | 2008

Using Dietary Exposure and Physiologically Based Pharmacokinetic/Pharmacodynamic Modeling in Human Risk Extrapolations for Acrylamide Toxicity

Daniel R. Doerge; John F. Young; James J. Chen; Michael Dinovi; Sara H. Henry

The discovery of acrylamide (AA) in many common cooked starchy foods has presented significant challenges to toxicologists, food scientists, and national regulatory and public health organizations because of the potential for producing neurotoxicity and cancer. This paper reviews some of the underlying experimental bases for AA toxicity and earlier risk assessments. Then, dietary exposure modeling is used to estimate probable AA intake in the U.S. population, and physiologically based pharmacokinetic/pharmacodynamic (PBPK/PD) modeling is used to integrate the findings of rodent neurotoxicity and cancer into estimates of risks from human AA exposure through the diet. The goal of these modeling techniques is to reduce the uncertainty inherent in extrapolating toxicological findings across species and dose by comparing common exposure biomarkers. PBPK/PD modeling estimated population-based lifetime excess cancer risks from average AA consumption in the diet in the range of 1-4 × 10^-4; however, modeling did not support a link between dietary AA exposure and human neurotoxicity because marginal exposure ratios were 50- to 300-fold lower than in rodents. In addition, dietary exposure modeling suggests that because AA is found in so many common foods, even large changes in concentration for single foods or groups of foods would probably have a small impact on overall population-based intake and risk. These results suggest that a more holistic analysis of dietary cancer risks may be appropriate, by which potential risks from AA should be considered in conjunction with other risks and benefits from foods.
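The simplest linear low-dose extrapolation behind estimates of this kind (which the PBPK/PD approach refines by comparing internal-dose biomarkers across species) multiplies a chronic daily intake by a cancer slope factor; both numbers below are placeholders, not values from the paper.

# Sketch of the simplest linear low-dose extrapolation: excess lifetime cancer
# risk ~ chronic daily intake x cancer slope factor. Both values are placeholders,
# not the exposure or potency estimates from the paper.
intake_mg_per_kg_day = 4.0e-4       # hypothetical average dietary AA intake
slope_factor_per_mg_kg_day = 0.5    # hypothetical cancer potency (risk per mg/kg/day)

excess_lifetime_risk = intake_mg_per_kg_day * slope_factor_per_mg_kg_day
print(f"excess lifetime cancer risk ~ {excess_lifetime_risk:.1e}")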


Pharmacogenomics | 2007

Key aspects of analyzing microarray gene-expression data

James J. Chen

One major challenge with the use of microarray technology is the analysis of massive amounts of gene-expression data for various applications. This review addresses the key aspects of microarray gene-expression data analysis for the two most common objectives: class comparison and class prediction. Class comparison aims to identify which genes are differentially expressed across experimental conditions. Gene selection is separated into two steps: gene ranking and assigning a significance level. Class prediction uses expression profiling analysis to develop a prediction model for patient selection, diagnostic prediction or prognostic classification. Development of a prediction model involves two components: model building and performance assessment. The review also describes two additional data analysis methods: gene-class testing and multiple ordering criteria.
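To make the class-prediction component concrete, here is a small illustrative sketch (not the review's own procedure) in which gene selection and classifier fitting are combined in one pipeline so that cross-validation assesses the entire model-building process rather than the classifier alone.

# Sketch of class prediction: gene selection and classifier fitting live in one
# pipeline so that cross-validation assesses the whole model-building process.
# Simulated expression data; the feature count and classifier are illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_samples, n_genes = 80, 5000
labels = rng.integers(0, 2, size=n_samples)
expression = rng.normal(0.0, 1.0, size=(n_samples, n_genes))
expression[:, :50] += labels[:, None] * 1.0   # the first 50 genes carry a class signal

# Model building: keep the top-k genes by F-test, then fit a linear SVM.
model = make_pipeline(SelectKBest(f_classif, k=50), SVC(kernel="linear"))

# Performance assessment: cross-validation over the entire pipeline.
accuracy = cross_val_score(model, expression, labels, cv=5)
print(f"cross-validated accuracy: {accuracy.mean():.2f} +/- {accuracy.std():.2f}")

Keeping the gene-selection step inside the cross-validation loop matters: selecting genes on the full data first and cross-validating only the classifier gives optimistically biased accuracy estimates.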

Collaboration


Dive into James J. Chen's collaborations.

Top Co-Authors

David W. Gaylor (National Center for Toxicological Research)
Ralph L. Kodell (University of Arkansas for Medical Sciences)
Wen Zou (Food and Drug Administration)
Hung-Chia Chen (Food and Drug Administration)
Weizhong Zhao (Food and Drug Administration)
Roger Perkins (Food and Drug Administration)
Suzanne M. Morris (National Center for Toxicological Research)
Wei-Jiun Lin (National Center for Toxicological Research)
Weida Tong (Food and Drug Administration)
Hong Fang (Food and Drug Administration)