
Publications


Featured research published by Andy Liaw.


Ecosystems | 2006

Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction

Anantha M. Prasad; Louis R. Iverson; Andy Liaw

The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models—Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS)—for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. To test, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service’s Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate. We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.
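
As a rough illustration of the bagging-versus-random-forest comparison described above, the sketch below fits both ensembles on synthetic stand-ins for the environmental predictors and importance values and scores them by correlation, one of the criteria reported. This is not the authors' code or data, and scikit-learn here is only a stand-in for whatever software the study used.

```python
# Illustrative sketch only: synthetic data in place of FIA plots and climate
# predictors; feature and sample counts are arbitrary.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic "plots": 38 environmental predictors, one importance value per plot.
X, y = make_regression(n_samples=2000, n_features=38, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Bagged trees (BT)": BaggingRegressor(n_estimators=200, random_state=0),
    "Random forest (RF)": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    r = np.corrcoef(y_test, pred)[0, 1]  # correlation, one criterion the study used
    print(f"{name}: correlation = {r:.3f}")
```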


Journal of Chemical Information and Computer Sciences | 2003

Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling

Vladimir Svetnik; Andy Liaw; Christopher Tong; J. Christopher Culberson; Robert P. Sheridan; Bradley P. Feuston

A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the training data and random feature selection in tree induction. Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. We built predictive models for six cheminformatics data sets. Our analysis demonstrates that Random Forest is a powerful tool capable of delivering prediction accuracy that ranks among the best of the methods to date. We also present three additional features of Random Forest: built-in performance assessment, a measure of relative importance of descriptors, and a measure of compound similarity that is weighted by the relative importance of descriptors. It is the combination of relatively high prediction accuracy and its collection of desired features that makes Random Forest uniquely suited for modeling in cheminformatics.
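
A brief sketch of the three by-products highlighted above (out-of-bag performance assessment, descriptor importance, and a tree-based similarity between compounds), using synthetic descriptors and scikit-learn rather than the authors' implementation. Note the similarity shown is the standard leaf-co-occurrence proximity, not the importance-weighted variant the paper describes.

```python
# Hedged sketch: synthetic stand-in for compound descriptors and activity classes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)

print("OOB accuracy (built-in performance assessment):", round(rf.oob_score_, 3))
print("Top descriptors by importance:",
      np.argsort(rf.feature_importances_)[::-1][:5])

# Proximity of two compounds: fraction of trees placing them in the same leaf
# (plain proximity, not the importance-weighted similarity of the paper).
leaves = rf.apply(X)                        # (n_samples, n_trees) leaf indices
print("Proximity of compounds 0 and 1:", np.mean(leaves[0] == leaves[1]))
```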


Journal of Biomolecular Screening | 2003

Improved Statistical Methods for Hit Selection in High-Throughput Screening

Christine Brideau; Bert Gunter; Bill Pikounis; Andy Liaw

High-throughput screening (HTS) plays a central role in modern drug discovery, allowing the rapid screening of large compound collections against a variety of putative drug targets. HTS is an industrial-scale process, relying on sophisticated automation, control, and state-of-the-art detection technologies to organize, test, and measure hundreds of thousands to millions of compounds in nano- to microliter volumes. Despite this high technology, hit selection for HTS is still typically done using simple data analysis and basic statistical methods. The authors discuss in this article some shortcomings of these methods and present alternatives based on modern methods of statistical data analysis. Most important, they describe and show numerous real examples from the biologist-friendly StatServer® HTS application (SHS), a custom-developed software tool built on the commercially available S-PLUS® and StatServer® statistical analysis and server software. This system remotely processes HTS data using powerful and sophisticated statistical methodology but insulates users from the technical details by outputting results in a variety of readily interpretable graphs and tables.
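
The abstract does not spell out the improved methods, so the sketch below shows only one commonly used robust alternative to mean/SD hit cutoffs, a per-plate median/MAD z-score on simulated readings; it should not be read as the paper's specific procedure or as anything to do with the StatServer application.

```python
# Illustrative, not the paper's method: robust (median/MAD) z-scores on one
# simulated 384-well plate of activity readings.
import numpy as np

rng = np.random.default_rng(0)
plate = rng.normal(100.0, 10.0, size=384)   # simulated activity readings
plate[[5, 42, 101]] -= 60.0                 # spike in a few strong "hits"

center = np.median(plate)
mad = 1.4826 * np.median(np.abs(plate - center))  # scaled MAD, a robust SD
robust_z = (plate - center) / mad

hits = np.where(robust_z < -3)[0]           # wells > 3 robust SDs below center
print("Candidate hit wells:", hits)
```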


Journal of Chemical Information and Modeling | 2015

Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships

Junshui Ma; Robert P. Sheridan; Andy Liaw; George E. Dahl; Vladimir Svetnik

Neural networks were widely used for quantitative structure-activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow on large problems, difficult to train, prone to overfitting), they were superseded by more robust methods such as support vector machines (SVM) and random forest (RF), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advances in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large, diverse QSAR data sets taken from Merck's drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets: a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of the parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphical processing units (GPUs) makes this issue manageable.
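
A minimal sketch of the idea, not the paper's recommended configuration or software: a neural net with more than one hidden layer fit to synthetic descriptor/activity data and compared against a random forest baseline. The layer sizes, sample counts, and the use of scikit-learn are all assumptions made for illustration.

```python
# Hedged sketch: synthetic regression data standing in for QSAR descriptors.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=3000, n_features=200, n_informative=40,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

dnn = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(512, 256),  # >1 hidden layer
                                 max_iter=500, random_state=0))
rf = RandomForestRegressor(n_estimators=300, random_state=0)

for name, model in [("DNN", dnn), ("RF", rf)]:
    model.fit(X_tr, y_tr)
    print(f"{name}: test R^2 = {model.score(X_te, y_te):.3f}")
```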


Multiple Classifier Systems | 2004

Application of Breiman’s Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules

Vladimir Svetnik; Andy Liaw; Christopher Tong; Ting Wang

Leo Breiman's Random Forest ensemble learning procedure is applied to the problem of Quantitative Structure-Activity Relationship (QSAR) modeling for pharmaceutical molecules. This entails using a quantitative description of a compound's molecular structure to predict that compound's biological activity as measured in an in vitro assay. Without any parameter tuning, the performance of Random Forest with default settings on six publicly available data sets is already as good as or better than that of three other prominent QSAR methods: Decision Tree, Partial Least Squares, and Support Vector Machine. In addition to reliable prediction accuracy, Random Forest provides variable importance measures, which can be used in a variable reduction wrapper algorithm. Comparisons of various such wrappers and between Random Forest and Bagging are presented.
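
In the spirit of the importance-driven variable reduction described above (a sketch under assumptions, not the wrappers actually compared in the paper), scikit-learn's recursive feature elimination can use Random Forest importances to prune descriptors a fraction at a time:

```python
# Hedged sketch: RFE driven by Random Forest importances on synthetic descriptors.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=600, n_features=100, n_informative=15,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0)

# Drop 20% of the remaining descriptors per iteration until 15 remain.
selector = RFE(estimator=rf, n_features_to_select=15, step=0.2).fit(X, y)
print("Selected descriptor indices:", selector.get_support(indices=True))
```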


Journal of Chemical Information and Modeling | 2005

Boosting: An Ensemble Learning Tool for Compound Classification and QSAR Modeling

Vladimir Svetnik; Ting Wang; Christopher Tong; Andy Liaw; Robert P. Sheridan; Qinghua Song

A classification and regression tool, J. H. Friedman's Stochastic Gradient Boosting (SGB), is applied to predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Stochastic Gradient Boosting is a procedure for building a sequence of models, for instance regression trees (as in this paper), whose outputs are combined to form a predicted quantity, either an estimate of the biological activity or a class label to which a molecule belongs. In particular, the SGB procedure builds a model in a stage-wise manner by fitting each tree to the gradient of a loss function, e.g., squared error for regression and binomial log-likelihood for classification. The values of the gradient are computed for each sample in the training set, but only a random sample of these gradients is used at each stage. (Friedman showed that the well-known boosting algorithm AdaBoost of Freund and Schapire can be considered a particular case of SGB.) The SGB method is used to analyze 10 cheminformatics data sets, most of which are publicly available. The results show that SGB's performance is comparable to that of Random Forest, another ensemble learning method, and is generally competitive with or superior to that of other QSAR methods. The use of SGB's variable importance with partial dependence plots for model interpretation is also illustrated.
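
A short sketch of stochastic gradient boosting as described above (not the paper's code; scikit-learn's implementation on synthetic data): trees are fit stage-wise to the gradient of a squared-error loss, and only a random subsample of the training set is used at each stage.

```python
# Hedged sketch: SGB for regression on synthetic descriptor data.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=60, noise=8.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

sgb = GradientBoostingRegressor(loss="squared_error",  # squared error for regression
                                n_estimators=500,
                                learning_rate=0.05,
                                subsample=0.5,          # the "stochastic" part
                                random_state=0)
sgb.fit(X_tr, y_tr)
print("Test R^2:", round(sgb.score(X_te, y_te), 3))
print("Top variables by importance:", sgb.feature_importances_.argsort()[::-1][:5])
```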


Proceedings of the National Academy of Sciences of the United States of America | 2010

Quantitative analysis of intact apolipoproteins in human HDL by top-down differential mass spectrometry

Matthew T. Mazur; Daniel S. Spellman; Andy Liaw; Nathan A. Yates; Ronald C. Hendrickson

Top-down mass spectrometry holds tremendous potential for the characterization and quantification of intact proteins, including individual protein isoforms and specific posttranslationally modified forms. This technique does not require antibody reagents and thus offers a rapid path for assay development with increased specificity based on the amino acid sequence. Top-down MS is efficient: intact protein mass measurement, purification by mass separation, dissociation, and measurement of product ions with ppm mass accuracy occur on a seconds-to-minutes time scale. Moreover, as the analysis is based on the accurate measurement of an intact protein, top-down mass spectrometry opens a research paradigm for the quantitative analysis of “unknown” proteins that differ in accurate mass. As a proof of concept, we have applied differential mass spectrometry (dMS) to the top-down analysis of apolipoproteins isolated from human HDL3. The protein species at 9415.45 Da demonstrates an average fold change of 4.7 (p-value 0.017) and was identified as an O-glycosylated form of apolipoprotein C-III [NANA-(2 → 3)-Gal-β(1 → 3)-GalNAc, +656.2037 Da], a protein associated with coronary artery disease. This work demonstrates the utility of top-down dMS for quantitative analysis of intact protein mixtures and holds potential for facilitating a better understanding of HDL biology and complex biological systems at the protein level.


Journal of Biomolecular Screening | 2003

Statistical and Graphical Methods for Quality Control Determination of High-Throughput Screening Data

Bert Gunter; Christine Brideau; Bill Pikounis; Andy Liaw

High-throughput screening (HTS) is used in modern drug discovery to screen hundreds of thousands to millions of compounds on selected protein targets. It is an industrial-scale process relying on sophisticated automation and state-of-the-art detection technologies. Quality control (QC) is an integral part of the process and is used to ensure good quality data and minimize assay variability while maintaining assay sensitivity. The authors describe new QC methods and show numerous real examples from their biologist-friendly StatServer® HTS application, a custom-developed software tool built from the commercially available S-PLUS® and StatServer® statistical analysis and server software. This system remotely processes HTS data using powerful and sophisticated statistical methodology but insulates users from the technical details by outputting results in a variety of readily interpretable graphs and tables. It allows users to visualize HTS data and examine assay performance during the HTS campaign to quickly react to or avoid quality problems.
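
The paper describes its own QC statistics and graphics; as a generic illustration only, the sketch below computes one widely used plate-quality metric, the Z'-factor, from simulated positive- and negative-control wells.

```python
# Illustrative only: Z'-factor from simulated control wells (not the paper's QC).
import numpy as np

rng = np.random.default_rng(1)
pos = rng.normal(20.0, 4.0, size=32)    # positive-control wells (full inhibition)
neg = rng.normal(100.0, 6.0, size=32)   # negative-control wells (no inhibition)

z_prime = 1.0 - 3.0 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())
print(f"Z'-factor = {z_prime:.2f}  (values >= 0.5 are commonly taken as acceptable)")
```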


Journal of Proteome Research | 2010

Application of an End-to-End Biomarker Discovery Platform to Identify Target Engagement Markers in Cerebrospinal Fluid by High Resolution Differential Mass Spectrometry

Cloud P. Paweletz; Matthew C. Wiener; Andrey Bondarenko; Nathan A. Yates; Qinghua Song; Andy Liaw; Anita Y. H. Lee; Brandon Hunt; Ernst S. Henle; Fanyu Meng; Holly Sleph; Marie A. Holahan; Sethu Sankaranarayanan; Adam J. Simon; Robert E. Settlage; Jeffrey R. Sachs; Mark S. Shearman; Alan B. Sachs; Jacquelynn J. Cook; Ronald C. Hendrickson

The rapid identification of protein biomarkers in biofluids is important to drug discovery and development. Here, we describe a general proteomic approach for the discovery and identification of proteins that exhibit a statistically significant difference in abundance in cerebrospinal fluid (CSF) before and after pharmacological intervention. This approach, differential mass spectrometry (dMS), is based on the analysis of full scan mass spectrometry data. The dMS workflow does not require complex mixing and pooling strategies or isotope labeling techniques. Accordingly, clinical samples can be analyzed individually, allowing the use of longitudinal designs and within-subject data analysis in which each subject acts as its own control. As a proof of concept, we performed multifactorial dMS analyses on CSF samples drawn at 6 time points from n = 6 cisterna magna ported (CMP) rhesus monkeys treated with 2 potent gamma-secretase inhibitors (GSIs) or comparable vehicle in a 3-way crossover study that included a total of 108 individual CSF samples. Using analysis of variance and statistical filtering on the aligned and normalized LC-MS data sets, we detected 26 features that were significantly altered in CSF by drug treatment. Of those 26 features, which belong to 10 distinct isotopic distributions, 20 were identified by MS/MS as 7 peptides from CD99, a cell surface protein. Six features from the remaining 3 isotopic distributions were not identified. A subsequent analysis showed that the relative abundance of these 26 features followed the same temporal profile as the ELISA-measured levels of CSF Aβ42 peptide, a known pharmacodynamic marker for gamma-secretase inhibition. These data demonstrate that dMS is a promising approach for the discovery, quantification, and identification of candidate target engagement biomarkers in CSF.
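
A much-simplified sketch of the per-feature statistical filtering step: the study used a multifactorial analysis of variance on a three-way crossover design, whereas this illustration (synthetic intensities, arbitrary cutoff) just runs a one-way ANOVA across treatment groups for each aligned LC-MS feature and keeps those passing a p-value threshold.

```python
# Hedged sketch: one-way ANOVA per synthetic LC-MS feature, then p-value filtering.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n_features, n_per_group = 1000, 12
vehicle = rng.normal(0.0, 1.0, size=(n_features, n_per_group))
gsi_a = rng.normal(0.0, 1.0, size=(n_features, n_per_group))
gsi_b = rng.normal(0.0, 1.0, size=(n_features, n_per_group))
gsi_a[:20] += 2.0                       # plant a few treatment-responsive features
gsi_b[:20] += 2.0

pvals = np.array([f_oneway(vehicle[i], gsi_a[i], gsi_b[i]).pvalue
                  for i in range(n_features)])
flagged = np.where(pvals < 1e-4)[0]     # simple fixed cutoff for illustration
print("Features flagged as drug-responsive:", flagged)
```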


Drug Metabolism and Disposition | 2015

Evaluation of Cynomolgus Monkeys for the Identification of Endogenous Biomarkers for Hepatic Transporter Inhibition and as a Translatable Model to Predict Pharmacokinetic Interactions with Statins in Humans

Xiaoyan Chu; Shian-Jiun Shih; Rachel Shaw; Hannes Hentze; Grace Chan; Karen Owens; Shubing Wang; Xiaoxin Cai; Deborah J. Newton; Jose Castro-Perez; Gino Salituro; Jairam Palamanda; Aaron Fernandis; Choon Keow Ng; Andy Liaw; Mary J. Savage; Raymond Evers

Inhibition of hepatic transporters such as organic anion transporting polypeptides (OATPs) 1B can cause drug-drug interactions (DDIs). Determining the impact of perpetrator drugs on the plasma exposure of endogenous substrates for OATP1B could be valuable to assess the risk for DDIs early in drug development. As OATP1B orthologs are well conserved between human and monkey, we assessed in cynomolgus monkeys the endogenous OATP1B substrates that are potentially suitable to assess DDI risk in humans. The effect of rifampin (RIF), a potent inhibitor for OATP1B, on plasma exposure of endogenous substrates of hepatic transporters was measured. Of the 18 biomarkers tested, RIF (18 mg/kg, oral) caused significant elevation of plasma unconjugated and conjugated bilirubin, which may be attributed to inhibition of cOATP1B1 and cOATP1B3 based on in vitro to in vivo extrapolation analysis. To further evaluate whether cynomolgus monkeys are a suitable translational model to study OATP1B-mediated DDIs, we determined the inhibitory effect of RIF on in vitro transport and pharmacokinetics of rosuvastatin (RSV) and atorvastatin (ATV). RIF strongly inhibited the uptake of RSV and ATV by cOATP1B1 and cOATP1B3 in vitro. In agreement with clinical observations, RIF (18 mg/kg, oral) significantly decreased plasma clearance and increased the area under the plasma concentration curve (AUC) of intravenously administered RSV by 2.8- and 2.7-fold, and increased the AUC and maximum plasma concentration of orally administered RSV by 6- and 10.3-fold, respectively. In contrast to clinical findings, RIF did not significantly increase plasma exposure of either intravenous or orally administered ATV, indicating species differences in the rate-limiting elimination pathways.
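
For readers unfamiliar with the exposure arithmetic, the sketch below computes a plasma AUC by the trapezoidal rule and expresses a treatment effect as an AUC fold change; the concentration-time values are invented for illustration and are not the study's data.

```python
# Arithmetic sketch with invented concentration-time data (not the study's).
import numpy as np

def auc_trapezoid(t, c):
    """Area under the concentration-time curve by the trapezoidal rule."""
    return float(np.sum(np.diff(t) * (c[:-1] + c[1:]) / 2.0))

t = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 24.0])                    # hours
conc_control = np.array([0.0, 40.0, 55.0, 35.0, 18.0, 6.0, 0.5])      # ng/mL, RSV alone
conc_with_rif = np.array([0.0, 90.0, 160.0, 120.0, 70.0, 25.0, 2.0])  # ng/mL, RSV + RIF

ratio = auc_trapezoid(t, conc_with_rif) / auc_trapezoid(t, conc_control)
print(f"AUC fold change (RIF / control) = {ratio:.1f}")
```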

Collaboration


Dive into Andy Liaw's collaborations.

Top Co-Authors

Ronald C. Hendrickson

Memorial Sloan Kettering Cancer Center

Robert E. Settlage

Virginia Bioinformatics Institute