Publication


Featured research published by Connor W. Coley.


ACS Central Science | 2017

Prediction of Organic Reaction Outcomes Using Machine Learning

Connor W. Coley; Regina Barzilay; Tommi S. Jaakkola; William H. Green; Klavs F. Jensen

Computer assistance in synthesis design has existed for over 40 years, yet retrosynthesis planning software has struggled to achieve widespread adoption. One critical challenge in developing high-quality pathway suggestions is that proposed reaction steps often fail when attempted in the laboratory, despite initially seeming viable. The true measure of success for any synthesis program is whether the predicted outcome matches what is observed experimentally. We report a model framework for anticipating reaction outcomes that combines the traditional use of reaction templates with the flexibility in pattern recognition afforded by neural networks. Using 15 000 experimental reaction records from granted United States patents, a model is trained to select the major (recorded) product by ranking a self-generated list of candidates where one candidate is known to be the major product. Candidate reactions are represented using a unique edit-based representation that emphasizes the fundamental transformation from reactants to products, rather than the constituent molecules’ overall structures. In a 5-fold cross-validation, the trained model assigns the major product rank 1 in 71.8% of cases, rank ≤3 in 86.7% of cases, and rank ≤5 in 90.8% of cases.
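
The rank-based figures quoted in this abstract (rank 1, rank ≤3, rank ≤5) are top-k accuracies. The following is a minimal sketch of how such a metric could be computed from model scores. The function names and the toy ranks are hypothetical illustrations, not the authors' code or data.

```python
def rank_of_true(scores, true_index):
    """Return the 1-indexed rank of the candidate at true_index,
    where higher scores rank first (ties favour the true candidate)."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def top_k_accuracy(true_ranks, k):
    """Fraction of test cases whose recorded product is ranked k or better."""
    if not true_ranks:
        raise ValueError("true_ranks must be non-empty")
    return sum(1 for r in true_ranks if r <= k) / len(true_ranks)


# Hypothetical ranks of the recorded product for ten toy reactions.
toy_ranks = [1, 1, 2, 4, 1, 7, 3, 1, 1, 5]

for k in (1, 3, 5):
    print(f"rank <= {k}: {top_k_accuracy(toy_ranks, k):.1%}")
```

In a cross-validation setting, the same calculation would be applied to the held-out fold of each split and the results pooled or averaged.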


Journal of Chemical Information and Modeling | 2017

Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction

Connor W. Coley; Regina Barzilay; William H. Green; Tommi S. Jaakkola; Klavs F. Jensen

The task of learning an expressive molecular representation is central to developing quantitative structure-activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
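
To illustrate the idea of atom feature vectors that absorb the local chemical environment at increasing neighborhood radii, here is a simplified sketch. It assumes unweighted sums over neighbors, ignores bond attributes, and has no learned parameters, so it shows only the neighborhood-aggregation pattern and not the published model. The graph encoding and function names are hypothetical.

```python
def aggregate_once(features, adjacency):
    """One aggregation step: each atom adds its neighbours' feature vectors
    to its own (an unweighted, parameter-free stand-in for a learned layer)."""
    updated = {}
    for atom, vec in features.items():
        new_vec = list(vec)
        for nbr in adjacency[atom]:
            for i, value in enumerate(features[nbr]):
                new_vec[i] += value
        updated[atom] = new_vec
    return updated


def embed_molecule(features, adjacency, radius):
    """Apply `radius` aggregation steps, then sum atom vectors into one
    molecule-level vector."""
    current = {atom: list(vec) for atom, vec in features.items()}
    for _ in range(radius):
        current = aggregate_once(current, adjacency)
    length = len(next(iter(current.values())))
    return [sum(vec[i] for vec in current.values()) for i in range(length)]


# Heavy-atom graph of ethanol (C-C-O); features are [is_carbon, is_oxygen].
ethanol_features = {0: [1, 0], 1: [1, 0], 2: [0, 1]}
ethanol_adjacency = {0: [1], 1: [0, 2], 2: [1]}

print(embed_molecule(ethanol_features, ethanol_adjacency, radius=1))
```

A trained model would replace the plain sums with learned transformations and nonlinearities, and would feed the pooled vector into a regression layer for the target property.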


ACS Central Science | 2017

Computer-Assisted Retrosynthesis Based on Molecular Similarity

Connor W. Coley; Luke Rogers; William H. Green; Klavs F. Jensen

We demonstrate molecular similarity to be a surprisingly effective metric for proposing and ranking one-step retrosynthetic disconnections based on analogy to precedent reactions. The developed approach mimics the retrosynthetic strategy defined implicitly by a corpus of known reactions without the need to encode any chemical knowledge. Using 40 000 reactions from the patent literature as a knowledge base, the recorded reactants are among the top 10 proposed precursors in 74.1% of 5000 test reactions, providing strong quantitative support for our methodology. Extension of the one-step strategy to multistep pathway planning is demonstrated and discussed for two exemplary drug products.
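
As a rough illustration of ranking precedent reactions by similarity to a target, the sketch below uses the Tanimoto coefficient on fingerprint feature sets. The abstract does not name the similarity measure, so Tanimoto is an assumption here. The integer feature IDs and reaction names are hypothetical; in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of
    feature IDs: |A and B| / |A or B| (defined as 0.0 for two empty sets)."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_precedents(target_fp, precedent_products, top_n=10):
    """Score each precedent reaction by the similarity of its recorded
    product to the target and return the top_n (score, name) pairs."""
    scored = [(tanimoto(target_fp, fp), name)
              for name, fp in precedent_products.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Hypothetical knowledge base: reaction name -> product fingerprint.
knowledge_base = {
    "rxn_a": {1, 2, 3},
    "rxn_b": {1, 2, 3, 4},
    "rxn_c": {9},
}
target = {1, 2, 3, 4}

print(rank_precedents(target, knowledge_base, top_n=2))
```

The highest-ranked precedents would then supply the transformation applied to the target to propose precursors, a step this sketch does not cover.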


Angewandte Chemie | 2017

Material-Efficient Microfluidic Platform for Exploratory Studies of Visible-Light Photoredox Catalysis

Connor W. Coley; Milad Abolhasani; Hongkun Lin; Klavs F. Jensen

We present an automated microfluidic platform for in-flow studies of visible-light photoredox catalysis in liquid or gas-liquid reactions at the 15 μL scale. An oscillatory flow strategy enables a flexible residence time while preserving the mixing and heat transfer advantages of flow systems. The adjustable photon flux made possible with the platform is characterized using actinometry. Case studies of oxidative hydroxylation of phenylboronic acids and dimerization of thiophenol demonstrate the capabilities and advantages of the system. Reaction conditions identified through droplet screening translate directly to continuous synthesis with minor platform modifications.


Accounts of Chemical Research | 2018

Machine Learning in Computer-Aided Synthesis Planning

Connor W. Coley; William H. Green; Klavs F. Jensen

Computer-aided synthesis planning (CASP) is focused on the goal of accelerating the process by which chemists decide how to synthesize small molecule compounds. The ideal CASP program would take a molecular structure as input and output a sorted list of detailed reaction schemes that each connect that target to purchasable starting materials via a series of chemically feasible reaction steps. Early work in this field relied on expert-crafted reaction rules and heuristics to describe possible retrosynthetic disconnections and selectivity rules but suffered from incompleteness, infeasible suggestions, and human bias. With the relatively recent availability of large reaction corpora (such as the United States Patent and Trademark Office (USPTO), Reaxys, and SciFinder databases), consisting of millions of tabulated reaction examples, it is now possible to construct and validate purely data-driven approaches to synthesis planning. As a result, synthesis planning has been opened to machine learning techniques, and the field is advancing rapidly. In this Account, we focus on two critical aspects of CASP and recent machine learning approaches to both challenges. First, we discuss the problem of retrosynthetic planning, which requires a recommender system to propose synthetic disconnections starting from a target molecule. We describe how the search strategy, necessary to overcome the exponential growth of the search space with increasing number of reaction steps, can be assisted through a learned synthetic complexity metric. We also describe how the recursive expansion can be performed by a straightforward nearest neighbor model that makes clever use of reaction data to generate high quality retrosynthetic disconnections. Second, we discuss the problem of anticipating the products of chemical reactions, which can be used to validate proposed reactions in a computer-generated synthesis plan (i.e., reduce false positives) to increase the likelihood of experimental success. 
While we introduce this task in the context of reaction validation, its utility extends to the prediction of side products and impurities, among other applications. We describe neural network-based approaches that we and others have developed for this forward prediction task that can be trained on previously published experimental data. Machine learning and artificial intelligence have revolutionized a number of disciplines, not limited to image recognition, dictation, translation, content recommendation, advertising, and autonomous driving. While there is a rich history of using machine learning for structure-activity models in chemistry, it is only now that it is being successfully applied more broadly to organic synthesis and synthesis design. As reported in this Account, machine learning is rapidly transforming CASP, but there are several remaining challenges and opportunities, many pertaining to the availability and standardization of both data and evaluation metrics, which must be addressed by the community at large.


Reaction Chemistry and Engineering | 2018

Optimum catalyst selection over continuous and discrete process variables with a single droplet microfluidic reaction platform

Lorenz M. Baumgartner; Connor W. Coley; Brandon J. Reizman; Kevin W. Gao; Klavs F. Jensen

A mixed-integer nonlinear program (MINLP) algorithm to optimize catalyst turnover number (TON) and product yield by simultaneously modulating discrete variables (catalyst types) and continuous variables (temperature, residence time, and catalyst loading) was implemented and validated. Several simulated case studies, with and without random measurement error, demonstrate the algorithm's robustness in finding optimal conditions in the presence of side reactions and other complicating nonlinearities. This algorithm was applied to the real-time optimization of a Suzuki–Miyaura cross-coupling reaction in an automated microfluidic reaction platform comprising a liquid handler, an oscillatory flow reactor, and an online LC/MS. The algorithm, based on a combination of branch and bound and adaptive response surface methods, identified experimental conditions that maximize TON subject to a yield constraint from a pool of eight catalyst candidates in just 60 experiments, considerably fewer than a previous version of the algorithm.


Journal of Chemical Information and Modeling | 2018

SCScore: Synthetic Complexity Learned from a Reaction Corpus

Connor W. Coley; Luke Rogers; William H. Green; Klavs F. Jensen

Several definitions of molecular complexity exist to facilitate prioritization of lead compounds, to identify diversity-inducing and complexifying reactions, and to guide retrosynthetic searches. In this work, we focus on synthetic complexity and reformalize its definition to correlate with the expected number of reaction steps required to produce a target molecule, with implicit knowledge about what compounds are reasonable starting materials. We train a neural network model on 12 million reactions from the Reaxys database to impose a pairwise inequality constraint enforcing the premise of this definition: that on average, the products of published chemical reactions should be more synthetically complex than their corresponding reactants. The learned metric (SCScore) exhibits highly desirable nonlinear behavior, particularly in recognizing increases in synthetic complexity throughout a number of linear synthetic routes.
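
The pairwise inequality constraint described in this abstract (products should score higher than their reactants) can be expressed as a margin-based ranking loss. The sketch below assumes a hinge form and a margin of 0.25; both are illustrative assumptions, and the scores stand in for the output of a model that is not shown.

```python
def pairwise_hinge_loss(reactant_score, product_score, margin=0.25):
    """Zero when the product outscores the reactant by at least `margin`;
    otherwise grows linearly with the size of the violation."""
    return max(0.0, margin + reactant_score - product_score)


def mean_pairwise_loss(score_pairs, margin=0.25):
    """Average hinge loss over (reactant_score, product_score) pairs."""
    if not score_pairs:
        raise ValueError("score_pairs must be non-empty")
    total = sum(pairwise_hinge_loss(r, p, margin) for r, p in score_pairs)
    return total / len(score_pairs)


# Hypothetical (reactant_score, product_score) pairs from some scoring model.
pairs = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
print(mean_pairwise_loss(pairs))
```

Minimizing such a loss over many reactant and product pairs pushes the learned score to increase, on average, along a synthetic route.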


Angewandte Chemie | 2017

In-Situ Microfluidic Study of Biphasic Nanocrystal Ligand-Exchange Reactions Using an Oscillatory Flow Reactor

Yi Shen; Milad Abolhasani; Yue Chen; Lisi Xie; Lu Yang; Connor W. Coley; Moungi G. Bawendi; Klavs F. Jensen

Oscillatory flow reactors provide a surface energy-driven approach for automatically screening reaction conditions and studying reaction mechanisms of bi-phasic nanocrystal ligand exchange reactions. Sulfide and cysteine ligand exchange reactions with as-synthesized CdSe quantum dots (QDs) are chosen as two model reactions. Different reaction variables including the new-ligand-to-QD ratio, the size of the particles, and the original ligand type are examined systematically. Based on the in-situ obtained UV-Vis absorption spectra during the reaction, we propose two different exchange pathways for the sulfide exchange reaction.


Langmuir | 2018

Ligand-Mediated Nanocrystal Growth

Stefano Lazzari; Pius M. Theiler; Yi Shen; Connor W. Coley; Andreas Stemmer; Klavs F. Jensen

A microfluidic platform combined with a deterministic model accounting for surface ligands reveals precious insights into the nanocrystal formation process. The comparison of on-line kinetic information with model predictions enables the derivation of temperature-dependent kinetic parameters for the CdSe model system. This fully generalizable approach represents a step forward toward a quantitative prediction of the nanocrystal size distribution, enabling the control and optimization of process performance and material properties.


Journal of Visualized Experiments | 2018

A Modular Microfluidic Technology for Systematic Studies of Colloidal Semiconductor Nanocrystals

Robert W. Epps; Kobi C. Felton; Connor W. Coley; Milad Abolhasani

Colloidal semiconductor nanocrystals, known as quantum dots (QDs), are a rapidly growing class of materials in commercial electronics, such as light emitting diodes (LEDs) and photovoltaics (PVs). Among this material group, inorganic/organic perovskites have demonstrated significant improvement and potential towards high-efficiency, low-cost PV fabrication due to their high charge carrier mobilities and lifetimes. Despite the opportunities for perovskite QDs in large-scale PV and LED applications, the lack of fundamental and comprehensive understanding of their growth pathways has inhibited their adaptation within continuous nanomanufacturing strategies. Traditional flask-based screening approaches are generally expensive, labor-intensive, and imprecise for effectively characterizing the broad parameter space and synthesis variety relevant to colloidal QD reactions. In this work, a fully autonomous microfluidic platform is developed to systematically study the large parameter space associated with the colloidal synthesis of nanocrystals in a continuous flow format. Through the application of a novel translating three-port flow cell and modular reactor extension units, the system may rapidly collect fluorescence and absorption spectra across reactor lengths ranging 3 - 196 cm. The adjustable reactor length not only decouples the residence time from the velocity-dependent mass transfer, it also substantially improves the sampling rates and chemical consumption due to the characterization of 40 unique spectra within a single equilibrated system. Sample rates may reach up to 30,000 unique spectra per day, and the conditions cover 4 orders of magnitude in residence times ranging 100 ms - 17 min. Further applications of this system would substantially improve the rate and precision of the material discovery and screening in future studies. 
Detailed within this report are the system materials and assembly protocols with a general description of the automated sampling software and offline data processing.

Collaboration


Dive into Connor W. Coley's collaborations.

Top Co-Authors

Klavs F. Jensen, Massachusetts Institute of Technology
Milad Abolhasani, Massachusetts Institute of Technology
William H. Green, Massachusetts Institute of Technology
Regina Barzilay, Massachusetts Institute of Technology
Tommi S. Jaakkola, Massachusetts Institute of Technology
Kobi C. Felton, North Carolina State University
Lisi Xie, Massachusetts Institute of Technology
Lorenz M. Baumgartner, Massachusetts Institute of Technology
Luke Rogers, Massachusetts Institute of Technology
Moungi G. Bawendi, Massachusetts Institute of Technology