Publication


Featured research published by Jonathan H. Chen.


intelligent systems in molecular biology | 2005

Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity

S. Joshua Swamidass; Jonathan H. Chen; Jocelyne Bruand; Peter Phung; Liva Ralaivola; Pierre Baldi

Motivation: Small molecules play a fundamental role in organic chemistry and biology. They can be used to probe biological systems and to discover new drugs and other useful compounds. As increasing numbers of large datasets of small molecules become available, it is necessary to develop computational methods that can deal with molecules of variable size and structure and predict their physical, chemical and biological properties. Results: Here we develop several new classes of kernels for small molecules using their 1D, 2D and 3D representations. In 1D, we consider string kernels based on SMILES strings. In 2D, we introduce several similarity kernels based on conventional or generalized fingerprints. Generalized fingerprints are derived by counting, in different ways, the subpaths contained in the graph of bonds, using depth-first searches. In 3D, we consider similarity measures between histograms of pairwise distances between atom classes. These kernels can be computed efficiently and are applied to problems of classification and prediction of mutagenicity, toxicity and anti-cancer activity on three publicly available datasets. The results derived using cross-validation methods are state-of-the-art. Tradeoffs between various kernels are briefly discussed. Availability: Datasets are available from http://www.igb.uci.edu/servers/servers.html
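The 2D fingerprint idea in this abstract can be illustrated with a small sketch. It assumes a toy molecular graph given as atom labels and bond pairs, counts labeled simple paths by depth-first search, and compares two fingerprints with Tanimoto and MinMax similarities. The function names, the `max_depth` parameter and the choice of these two similarity measures are assumptions made for illustration; the abstract itself does not specify them.

```python
from collections import Counter

def path_fingerprint(atoms, bonds, max_depth=3):
    """Count labeled simple paths of up to max_depth bonds, found by depth-first search."""
    # Build an adjacency list for the bond graph.
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    counts = Counter()

    def dfs(node, visited, label):
        counts[label] += 1
        # A path with k bonds visits k + 1 atoms; stop extending at max_depth bonds.
        if len(visited) > max_depth:
            return
        for nxt in neighbors[node]:
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, label + "-" + atoms[nxt])

    # Each path is reached from both of its ends, so reversed labels are counted too.
    for start in range(len(atoms)):
        dfs(start, {start}, atoms[start])
    return counts

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity on the sets of paths present (binary fingerprints)."""
    keys_a, keys_b = set(fp_a), set(fp_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0

def minmax(fp_a, fp_b):
    """MinMax similarity on path counts (counting fingerprints)."""
    keys = set(fp_a) | set(fp_b)
    num = sum(min(fp_a[k], fp_b[k]) for k in keys)
    den = sum(max(fp_a[k], fp_b[k]) for k in keys)
    return num / den if den else 1.0

# Heavy-atom graphs only (hydrogens omitted): ethanol C-C-O and dimethyl ether C-O-C.
ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ether = path_fingerprint(["C", "O", "C"], [(0, 1), (1, 2)])
print(tanimoto(ethanol, ether), minmax(ethanol, ether))
```

Both similarities equal 1 for identical fingerprints and fall toward 0 as the path sets diverge, which is what makes them usable as kernel values between molecules of different sizes.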


Bioinformatics | 2005

ChemDB: a public database of small molecules and related chemoinformatics resources

Jonathan H. Chen; S. Joshua Swamidass; Yimeng Dou; Jocelyne Bruand; Pierre Baldi

Motivation: The development of chemoinformatics has been hampered by the lack of large, publicly available, comprehensive repositories of molecules, in particular of small molecules. Small molecules play a fundamental role in organic chemistry and biology. They can be used as combinatorial building blocks for chemical synthesis, as molecular probes in chemical genomics and systems biology, and for the screening and discovery of new drugs and other useful compounds. Results: We describe ChemDB, a public database of small molecules available on the Web. ChemDB is built using the digital catalogs of over a hundred vendors and other public sources and is annotated with information derived from these sources as well as from computational methods, such as predicted solubility and three-dimensional structure. It supports multiple molecular formats and is periodically updated, automatically whenever possible. The current version of the database contains approximately 4.1 million commercially available compounds and 8.2 million counting isomers. The database includes a user-friendly graphical interface, chemical reaction capabilities, as well as unique search capabilities. Availability: Database and datasets are available at http://cdb.ics.uci.edu.


Bioinformatics | 2007

ChemDB update—full-text search and virtual chemical space

Jonathan H. Chen; Erik Linstead; S. Joshua Swamidass; Dennis Ding-Hwa Wang; Pierre Baldi

ChemDB is a chemical database containing nearly 5M commercially available small molecules, important for use as synthetic building blocks, probes in systems biology and as leads for the discovery of drugs and other useful compounds. The data is publicly available over the web for download and for targeted searches using a variety of powerful methods. The chemical data includes predicted or experimentally determined physicochemical properties, such as 3D structure, melting temperature and solubility. Recent developments include optimization of chemical structure (and substructure) retrieval algorithms, enabling full database searches in less than a second. A text-based search engine allows efficient searching of compounds based on over 65M annotations from over 150 vendors. When searching for chemicals by name, fuzzy text matching capabilities yield productive results even when the correct spelling of a chemical name is unknown, taking advantage of both systematic and common names. Finally, built-in reaction models enable searches through virtual chemical space, consisting of hypothetical products readily synthesizable from the building blocks in ChemDB. Availability: ChemDB and Supplementary Materials are available at http://cdb.ics.uci.edu. Supplementary Information: Supplementary data are available at Bioinformatics online.
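To make the fuzzy name matching concrete, here is a minimal sketch using Python's standard `difflib` module. The tiny `NAME_INDEX` and the `fuzzy_lookup` helper are hypothetical; the abstract does not describe ChemDB's actual index or matching algorithm, so this only illustrates the general idea of tolerating misspelled chemical names.

```python
import difflib

# Hypothetical miniature name index mapping common and systematic names to SMILES.
NAME_INDEX = {
    "acetaminophen": "CC(=O)Nc1ccc(O)cc1",
    "paracetamol": "CC(=O)Nc1ccc(O)cc1",
    "acetylsalicylic acid": "CC(=O)Oc1ccccc1C(=O)O",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

def fuzzy_lookup(query, n=3, cutoff=0.6):
    """Return up to n (name, SMILES) pairs whose names are close to the query, best first."""
    matches = difflib.get_close_matches(query.lower(), list(NAME_INDEX), n=n, cutoff=cutoff)
    return [(name, NAME_INDEX[name]) for name in matches]

# A misspelled query still finds the intended compound.
print(fuzzy_lookup("acetominophen"))
```

Indexing both common and systematic names for the same structure, as the toy index does, is what lets a single lookup succeed regardless of which naming style the user tries.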


Journal of Chemical Information and Modeling | 2011

Learning to Predict Chemical Reactions

Matthew A. Kayala; Chloé-Agathe Azencott; Jonathan H. Chen; Pierre Baldi

Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Approaches to the reaction prediction problems can be organized around three poles corresponding to: (1) physical laws; (2) rule-based expert systems; and (3) inductive machine learning. Previous approaches at these poles, respectively, are not high throughput, are not generalizable or scalable, and lack sufficient data and structure to be implemented. We propose a new approach to reaction prediction utilizing elements from each pole. Using a physically inspired conceptualization, we describe single mechanistic reactions as interactions between coarse approximations of molecular orbitals (MOs) and use topological and physicochemical attributes as descriptors. Using an existing rule-based system (Reaction Explorer), we derive a restricted chemistry data set consisting of 1630 full multistep reactions with 2358 distinct starting materials and intermediates, associated with 2989 productive mechanistic steps and 6.14 million unproductive mechanistic steps. And from machine learning, we pose identifying productive mechanistic steps as a statistical ranking, information retrieval problem: given a set of reactants and a description of conditions, learn a ranking model over potential filled-to-unfilled MO interactions such that the top-ranked mechanistic steps yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.00% of nonproductive reactions with a 0.01% error rate. Then, we train an ensemble of ranking models on pairs of interacting MOs to learn a relative productivity function over mechanistic steps in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanism at the top 89.05% of the time, rising to 99.86% of the time when the top four are considered. 
Furthermore, the system is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert does not handle. A web interface to the machine learning based mechanistic reaction predictor is accessible through our chemoinformatics portal (http://cdb.ics.uci.edu) under the Toolkits section.
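The ranking results quoted above (productive mechanism ranked first, or within the top four) are top-k hit rates. A minimal sketch of how such a metric could be computed follows; the function name and the toy rankings are hypothetical and are not taken from the paper's data or code.

```python
def top_k_hit_rate(rankings, productive_steps, k):
    """Fraction of reaction systems whose productive step is among the first k ranked candidates."""
    hits = sum(
        1 for ranking, step in zip(rankings, productive_steps) if step in ranking[:k]
    )
    return hits / len(rankings)

# Toy data: one ranked candidate list per reaction system, best candidate first.
rankings = [
    ["step_a", "step_b", "step_c"],
    ["step_x", "step_y", "step_z"],
    ["step_p", "step_q", "step_r"],
    ["step_m", "step_n", "step_o"],
]
# The known productive mechanistic step for each system.
productive_steps = ["step_a", "step_y", "step_p", "step_o"]

for k in (1, 2, 3):
    print(k, top_k_hit_rate(rankings, productive_steps, k))
```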


Journal of Chemical Information and Modeling | 2007

One- to Four-Dimensional Kernels for Virtual Screening and the Prediction of Physical, Chemical, and Biological Properties

Chloé-Agathe Azencott; Alexandre Ksikes; S. Joshua Swamidass; Jonathan H. Chen; Liva Ralaivola; Pierre Baldi

Many chemoinformatics applications, including high-throughput virtual screening, benefit from being able to rapidly predict the physical, chemical, and biological properties of small molecules to screen large repositories and identify suitable candidates. When training sets are available, machine learning methods provide an effective alternative to ab initio methods for these predictions. Here, we leverage rich molecular representations including 1D SMILES strings, 2D graphs of bonds, and 3D coordinates to derive efficient machine learning kernels to address regression problems. We further expand the library of available spectral kernels for small molecules developed for classification problems to include 2.5D surface and 3D kernels using Delaunay tetrahedrization and other techniques from computational geometry, 3D pharmacophore kernels, and 3.5D or 4D kernels capable of taking into account multiple molecular configurations, such as conformers. The kernels are comprehensively tested using cross-validation and redundancy-reduction methods on regression problems using several available data sets to predict boiling points, melting points, aqueous solubility, octanol/water partition coefficients, and biological activity with state-of-the-art results. When sufficient training data are available, 2D spectral kernels in general tend to yield the best and most robust results, better than state-of-the-art. On data sets containing thousands of molecules, the kernels achieve a squared correlation coefficient of 0.91 for aqueous solubility prediction and 0.94 for octanol/water partition coefficient prediction. Averaging over conformations improves the performance of kernels based on the three-dimensional structure of molecules, especially on challenging data sets. Kernel predictors for aqueous solubility (kSOL), LogP (kLOGP), and melting point (kMELT) are available over the Web through: http://cdb.ics.uci.edu.


Journal of Chemical Information and Modeling | 2009

No Electron Left Behind: A Rule-Based Expert System To Predict Chemical Reactions and Reaction Mechanisms

Jonathan H. Chen; Pierre Baldi

Predicting the course and major products of arbitrary reactions is a fundamental problem in chemistry, one that chemists must address in a variety of tasks ranging from synthesis design to reaction discovery. Described here is an expert system to predict organic chemical reactions based on a knowledge base of over 1500 manually composed reaction transformation rules. Novel rule extensions are introduced to enable robust predictions and describe detailed reaction mechanisms at the level of electron flows in elementary reaction steps, ensuring that all reactions are properly balanced and atom-mapped. The core reaction prediction functionalities of this expert system are illustrated with applications including: (1) prediction of detailed reaction mechanisms; (2) computer-based learning in organic chemistry; (3) retrosynthetic analysis; and (4) combinatorial library design. Select applications are available via http://cdb.ics.uci.edu.


Perfusion | 2007

180 ml and less: cardiopulmonary bypass techniques to minimize hemodilution for neonates and small infants.

Kevin Charette; Yasutaka Hirata; Adam Bograd; Linda Mongero; Jonathan H. Chen; Jan M. Quaegebeur; Ralph S. Mosca

Objective. To determine the efficacy of decreasing cardiopulmonary bypass (CPB) prime volume for neonates and small infants by using low prime oxygenators, small diameter polyvinyl chloride (PVC) tubing and removing the arterial line filter (ALF) in an effort to reduce intraoperative exposure to multiple units of packed red blood cells (PRBC). Methods. Two retrospective database studies comparing neonatal CPB prime volume were undertaken. Study 1: a CPB circuit consisting of a 1/8 inch arterial line, a 3/16 inch venous line and a low prime oxygenator with 172 ml total circuit prime (n = 74) was compared to a circuit with a 3/16 inch arterial line, a 1/4 inch venous line and a higher prime oxygenator with a 350 ml total circuit prime (n = 74). Study 2: the 172 ml circuit (n = 389) was compared to a circuit that included an ALF and had a total circuit prime volume of 218 ml (n = 389). Results. Study 1: of the 74 neonates and small infants whose CPB prime volume was 350 ml, 19 were exposed to two or more intraoperative exogenous PRBC units, while only 3 neonates and small infants in the 172 ml prime group (n = 74) received two or more units (p = 0.0002). Study 2: of the 389 neonates and small infants where an ALF was used (prime volume 218 ml), 54 were exposed to two or more exogenous PRBC units, while only 36 of the 389 patients where an ALF was not used (prime volume 172 ml) received two or more units of intraoperative PRBCs (p = 0.0436). Conclusion. Decreasing the neonatal and small infant extracorporeal circuit prime volume by as little as 46 ml resulted in significantly fewer multiple exposures to exogenous PRBC units. Perfusion (2007) 22, 327-331.
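The counts in this abstract can be checked with a short calculation. The abstract does not name the statistical test used, so the sketch below assumes a Pearson chi-square test on each 2x2 table with one degree of freedom and no continuity correction; that choice, and the helper name `chi_square_p`, are assumptions. Under that assumption the p-values come out close to the reported ones.

```python
from math import erfc, sqrt

def chi_square_p(a, b, c, d):
    """Pearson chi-square p-value (1 degree of freedom, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))

# Rows: circuit type; columns: patients with >= 2 PRBC units vs. fewer than 2.
study1_p = chi_square_p(19, 74 - 19, 3, 74 - 3)        # 350 ml vs. 172 ml prime
study2_p = chi_square_p(54, 389 - 54, 36, 389 - 36)    # 218 ml (ALF) vs. 172 ml (no ALF)

# Exposure rates and the prime-volume saving from removing the arterial line filter.
rates = {
    "350 ml": 19 / 74,
    "172 ml (study 1)": 3 / 74,
    "218 ml": 54 / 389,
    "172 ml (study 2)": 36 / 389,
}
prime_saving_ml = 218 - 172

print(study1_p, study2_p, prime_saving_ml)
for circuit, rate in rates.items():
    print(f"{circuit}: {rate:.1%}")
```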


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2006

Functional Census of Mutation Sequence Spaces: The Example of p53 Cancer Rescue Mutants

Samuel A. Danziger; S. Joshua Swamidass; Jue Zeng; Lawrence R. Dearth; Qiang Lu; Jonathan H. Chen; Jianlin Cheng; Vinh P. Hoang; Hiroto Saigo; Ray Luo; Pierre Baldi; Rainer K. Brachmann; Richard H. Lathrop

Many biomedical problems relate to mutant functional properties across a sequence space of interest, e.g., flu, cancer, and HIV. Detailed knowledge of mutant properties and function improves medical treatment and prevention. A functional census of p53 cancer rescue mutants would aid the search for cancer treatments from p53 mutant rescue. We devised a general methodology for conducting a functional census of a mutation sequence space by choosing informative mutants early. The methodology was tested in a double-blind predictive test on the functional rescue property of 71 novel putative p53 cancer rescue mutants iteratively predicted in sets of three (24 iterations). The first double-blind 15-point moving accuracy was 47 percent and the last was 86 percent; r = 0.01 before an epiphanic 16th iteration and r = 0.92 afterward. Useful mutants were chosen early (overall r = 0.80). Code and data are freely available (http://www.igb.uci.edu/research/research.html, corresponding authors: R.H.L. for computation and R.K.B. for biology).
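The "15-point moving accuracy" reported here is a sliding-window average over a sequence of correct/incorrect predictions. A minimal sketch, assuming predictions are recorded as 1 (correct) or 0 (incorrect) in the order they were made; the function name and the toy sequence are hypothetical.

```python
def moving_accuracy(correct, window=15):
    """Accuracy over each consecutive window of predictions, in prediction order."""
    return [
        sum(correct[i:i + window]) / window
        for i in range(len(correct) - window + 1)
    ]

# Toy outcome sequence with a short window, for illustration only.
outcomes = [0, 1, 1, 0, 1]
print(moving_accuracy(outcomes, window=2))
```

Comparing the first and last values of the returned list shows whether predictions improved as more mutants were tested, which is how the abstract contrasts its first and last windows.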


PLOS ONE | 2013

In Vivo Readout of CFTR Function: Ratiometric Measurement of CFTR-Dependent Secretion by Individual, Identifiable Human Sweat Glands

Jeffrey J. Wine; Jessica E. Char; Jonathan H. Chen; Hyung Ju Cho; Colleen Dunn; Eric Frisbee; Nam Soo Joo; Carlos Milla; Sara E. Modlin; Il Ho Park; Ewart A. C. Thomas; Kim V. Tran; Rohan Verma; Marlene H. Wolfe

To assess CFTR function in vivo, we developed a bioassay that monitors and compares CFTR-dependent and CFTR-independent sweat secretion in parallel for multiple (∼50) individual, identified glands in each subject. Sweating was stimulated by intradermally injected agonists and quantified by optically measuring spherical sweat bubbles in an oil-layer that contained dispersed, water soluble dye particles that partitioned into the sweat bubbles, making them highly visible. CFTR-independent secretion (M-sweat) was stimulated with methacholine, which binds to muscarinic receptors and elevates cytosolic calcium. CFTR-dependent secretion (C-sweat) was stimulated with a β-adrenergic cocktail that elevates cytosolic cAMP while blocking muscarinic receptors. A C-sweat/M-sweat ratio was determined on a gland-by-gland basis to compensate for differences unrelated to CFTR function, such as gland size. The average ratio provides an approximately linear readout of CFTR function: the heterozygote ratio is ∼0.5 the control ratio and for CF subjects the ratio is zero. During assay development, we measured C/M ratios in 6 healthy controls, 4 CF heterozygotes, 18 CF subjects and 4 subjects with ‘CFTR-related’ conditions. The assay discriminated all groups clearly. It also revealed consistent differences in the C/M ratio among subjects within groups. We hypothesize that these differences reflect, at least in part, levels of CFTR expression, which are known to vary widely. When C-sweat rates become very low the C/M ratio also tended to decrease; we hypothesize that this nonlinearity reflects ductal fluid absorption. We also discovered that M-sweating potentiates the subsequent C-sweat response. We then used potentiation as a surrogate for drugs that can increase CFTR-dependent secretion. This bioassay provides an additional method for assessing CFTR function in vivo, and is well suited for within-subject tests of systemic, CFTR-directed therapeutics.
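The gland-by-gland ratio described above can be sketched as follows. The sweat rates are made-up illustrative numbers, and the helper name and the decision to skip glands with no M-sweat response are assumptions rather than details from the paper.

```python
from statistics import mean

def mean_cm_ratio(c_rates, m_rates):
    """Average C-sweat/M-sweat ratio over glands, pairing the two rates gland by gland."""
    # Glands with no measurable M-sweat are skipped to avoid dividing by zero.
    ratios = [c / m for c, m in zip(c_rates, m_rates) if m > 0]
    return mean(ratios)

# Hypothetical secretion rates for two glands of one subject (arbitrary units).
c_sweat = [1.0, 2.0]
m_sweat = [4.0, 4.0]
print(mean_cm_ratio(c_sweat, m_sweat))
```

Taking the ratio per gland before averaging is what compensates for gland-to-gland differences such as size, since each gland serves as its own reference.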


Journal of the American Medical Informatics Association | 2016

Predicting inpatient clinical order patterns with probabilistic topic models vs conventional order sets

Jonathan H. Chen; Mary K. Goldstein; Steven M. Asch; Lester W. Mackey; Russ B. Altman

Objective: Build probabilistic topic model representations of hospital admissions processes and compare the ability of such models to predict clinical order patterns against preconstructed order sets. Materials and Methods: The authors evaluated the first 24 hours of structured electronic health record data for > 10 K inpatients. Drawing an analogy between structured items (e.g., clinical orders) and words in a text document, the authors performed latent Dirichlet allocation probabilistic topic modeling. These topic models use initial clinical information to predict clinical orders for a separate validation set of > 4 K patients. The authors evaluated these topic model-based predictions vs existing human-authored order sets by area under the receiver operating characteristic curve, precision, and recall for subsequent clinical orders. Results: Existing order sets predict clinical orders used within 24 hours with area under the receiver operating characteristic curve 0.81, precision 16%, and recall 35%. This can be improved to 0.90, 24%, and 47% (P < 10^-20) by using probabilistic topic models to summarize clinical data into up to 32 topics. Many of these latent topics yield natural clinical interpretations (e.g., “critical care,” “pneumonia,” “neurologic evaluation”). Discussion: Existing order sets tend to provide nonspecific, process-oriented aid, with usability limitations impairing more precise, patient-focused support. Algorithmic summarization has the potential to breach this usability barrier by automatically inferring patient context, but with potential tradeoffs in interpretability. Conclusion: Probabilistic topic modeling provides an automated approach to detect thematic trends in patient care and generate decision support content. A potential use case finds related clinical orders for decision support.
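Precision and recall, two of the metrics used in this evaluation, can be illustrated for a single patient. The sketch below compares a set of suggested orders with the orders actually placed; the order names and the helper function are hypothetical and do not come from the study's data.

```python
def precision_recall(predicted, actual):
    """Precision and recall of a set of predicted orders against the orders actually placed."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical orders suggested for one admission vs. orders placed within 24 hours.
suggested = ["cbc", "basic metabolic panel", "chest x-ray", "ecg"]
placed = ["cbc", "chest x-ray", "blood culture"]
print(precision_recall(suggested, placed))
```

Precision falls when a suggestion list is padded with orders that are never placed, while recall falls when placed orders are missing from the list, so the two together capture the tradeoff between list length and coverage.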

Collaboration


Dive into Jonathan H. Chen's collaborations.

Top Co-Authors

Pierre Baldi

University of California

S. Joshua Swamidass

Washington University in St. Louis
