Materials Graph Transformer predicts the outcomes of inorganic reactions with reliable uncertainties
Shreshth A. Malik, Rhys E. A. Goodall, and Alpha A. Lee ∗ Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK.
A key bottleneck for materials discovery is synthesis. While significant advances have been made in computational materials design, determining synthesis procedures is still often done with trial-and-error. In this work, we develop a method that predicts the major product of solid-state reactions. The key idea is the construction of fixed-length, learned representations of reactions. Precursors are represented as nodes on a ‘reaction graph’, and message-passing operations between nodes are used to embody the interactions between precursors in the reaction mixture. We show that this framework not only outperforms less physically-motivated baseline methods but also more reliably assesses the uncertainty in its predictions. Moreover, our approach establishes a quantitative metric for inorganic reaction similarity, allowing the user to explain model predictions and retrieve relevant literature sources.
The discovery of new materials and control of their properties is fundamental to the advancement of technology. Adjusting the structure and composition of crystals allows fine-tuning of properties such as conductivity and band gap [1–3]. Advances in first-principles calculations and increased computational power have revolutionised the availability of materials information [4, 5]. Simultaneously, advances in computational materials design [6] have enabled high-throughput prediction and screening in silico [7].

However, due to the challenges associated with material synthesis, only a small proportion of candidates are produced in the laboratory and experimentally validated. The most common route to the formation of polycrystalline materials is solid-state synthesis [8], which typically involves calcining a mixture of solid reactants. Reactions occur through solid-state diffusion of ions, thus reactants are often milled and mixed to improve the reaction kinetics. However, the interplay between thermodynamically driven energy minimisation and kinetic factors results in products which are often difficult to predict; the field lacks the well-understood reaction mechanisms of organic chemistry. Particular procedures or reactants result in the formation of crystals with unique morphologies and compositions as they form through metastable states. Thus researchers frequently rely on a trial-and-error process of synthesis and evaluation. The development of tools to propose and evaluate the most promising synthetic pathways is, therefore, one of the biggest challenges facing materials discovery [9, 10].

A recently released inorganic synthesis dataset has now enabled the possibility of using a data-driven approach to predict the outcomes of inorganic synthesis [11]. To our knowledge, there exists only one study which explores this opportunity [12]; it addresses the inverse problem of retrosynthesis – predicting precursors that can react to yield a target product – with an architecture that cannot be readily extended to solve the forward problem. However, save for large-scale experimental validation, the accuracy of retrosynthesis models cannot be quantitatively benchmarked because there are almost infinitely many ways to synthesise a material, and reactions reported in the literature are not necessarily even the best synthetic route.

Here, we report an accurate forward reaction prediction model with reliable uncertainties. Methodologically, we leverage ideas from graph representation learning to learn the optimal representation of sets of inorganic reactants directly from data. Our work significantly outperforms baselines, and we show that model uncertainty can be robustly estimated using an ensemble. Our work is a building block towards an inorganic retrosynthesis planner that can interface directly with computational tools to close the materials design-make-test cycle.
DATA AND MODEL

Solid-State Syntheses Dataset
We can define a generalised solid-state synthesis procedure for a target material as a sequence of processes (actions) performed on a set of starting materials (precursors). A recent study has extracted detailed information for over 19,000 such synthesis procedures from academic literature [11]. Each synthesis procedure has four relevant fields for this problem:

1. Target: The stoichiometric formula of the target material for the synthesis.

2. Precursors: Defined as starting materials which explicitly share at least one element with the target material, excluding ‘abundant’ materials, i.e. those found in the air.

3. Processing Actions: The sequence of synthesis actions performed on the precursors, including the relevant conditions for each action where available.

4. Balanced Chemical Equation: Balanced chemical equations for the formation of the target compound provide the relevant molar ratios of precursors in the reaction mixture.

This dataset is to our knowledge the most extensive and highest-fidelity available for inorganic syntheses. Chemical information (precursors, targets and balanced equations) has been extracted with high accuracy (93%). However, there are some notable limitations that must be considered when utilising it for modelling. Firstly, some procedures have actions and conditions with missing or incorrect data. Over 10% of reactions have no, or only one, recorded action. This is primarily because standard procedures are often referenced rather than explicitly stated. The accuracy of the complete procedures (where all actions, conditions, and chemical information are fully correct) is 51%. Further, there is no structural information for the products. This limits physical analysis of the materials and hence a purely stoichiometric approach must be used in this work.

Nevertheless, this dataset provides a wealth of structured information on successful solid-state syntheses. Machine learning (ML) frameworks provide a unique way to assimilate empirical patterns from data. One can thus envisage the target material as being an abstract but learnable function of the precursors and processing procedure.
Representation of Inorganic Reactions
ML models generally operate on fixed-length inputs, whilst typically our understanding of inorganic reactions is expressed in terms of variable-sized sets of precursors and sequences of processing actions. Bridging this discrepancy is a key challenge that needs to be addressed in the construction of ML algorithms for inorganic reaction prediction. Here this process is broken down into two stages: 1) obtaining fixed-length representations for the precursors and the processing sequence, and 2) performing set regression on this set of precursors to predict the product given the processing sequence.
Representation of Precursors
A naïve approach would consider representing materials as sparse vectors with components proportional to the relative amounts of their constituent elements (the stoichiometry). However, such a representation fails to capture critical correlations between different elements. In contrast, Magpie embeddings offer powerful general-purpose descriptors for bulk inorganic materials in the absence of structural information [13]. These are highly engineered, 145-dimensional vector representations which include as many scientific priors derived from the chemical formula as possible. These features fall into four groups: stoichiometric properties, elemental properties, electronic structure, and ionic compound features. We utilise these embeddings as initial precursor feature vectors.
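To make this featurisation concrete, the sketch below shows how such descriptors can be generated with the Matminer library used in this work (see Methods). The particular combination of featurizers shown here is our assumption of how the four feature groups map onto Matminer's API, not a reproduction of the original feature-generation code:

```python
from matminer.featurizers.base import MultipleFeaturizer
from matminer.featurizers.composition import (ElementProperty, Stoichiometry,
                                              ValenceOrbital, IonProperty)
from pymatgen.core import Composition

# The Ward et al. Magpie set combines stoichiometric attributes, elemental
# property statistics, valence-orbital occupancy, and ionic-compound features.
magpie = MultipleFeaturizer([
    Stoichiometry(),                        # stoichiometric p-norms
    ElementProperty.from_preset("magpie"),  # elemental property statistics
    ValenceOrbital(props=["frac"]),         # electronic structure
    IonProperty(fast=True),                 # ionic compound features
])

# Featurise a precursor from its stoichiometric formula alone.
vec = magpie.featurize(Composition("BaCO3"))
print(len(vec))  # should recover the ~145-dimensional descriptor
```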
Representation of Processing Actions
Sequence representation is a well studied problem inML research [15–17]. Here we utilise single-layer LongShort-Term Memory Units (LSTMs) [18] in an autoen-coder architecture to obtain fixed-length representationsof the processing sequences.
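As an illustration, here is a minimal PyTorch sketch of such a sequence autoencoder, with the embedding and hidden dimensions taken from the Methods section; the class and variable names are ours, not the original implementation:

```python
import torch
import torch.nn as nn

class ActionAutoencoder(nn.Module):
    """Encode a variable-length action sequence into a fixed-length vector."""

    def __init__(self, n_actions=7, embed_dim=8, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(n_actions + 1, embed_dim)  # +1 for padding
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_actions + 1)

    def forward(self, seq):                    # seq: (batch, seq_len) int64
        x = self.embed(seq)
        _, (h, c) = self.encoder(x)            # h: (1, batch, hidden_dim)
        # The final hidden state is the fixed-length sequence representation.
        dec_out, _ = self.decoder(x, (h, c))   # teacher-forced reconstruction
        return self.out(dec_out), h.squeeze(0)  # logits and embedding u_seq
```

Training minimises the reconstruction loss of the decoded sequence; the encoder's final hidden state is then reused downstream as the action-sequence embedding.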
Reaction Graph Model
In solid-state syntheses, precursors are inherently ‘mixed’ together to form a product. Here we make use of an architecture for set regression based on the Roost model introduced in [19]. The model treats this process through a series of message-passing stages in which precursors appear as nodes on a dense, weighted graph. A fixed-length representation for the reaction is derived from this graph, which allows for generalisation to arbitrary numbers of precursors.

This learned reaction embedding is then used to predict the elements present in the target material. The same embedding is then used to separately predict the product stoichiometry given the elements in the product. If the stoichiometry were directly predicted, an arbitrary threshold stoichiometry would have to be used to select which precursor elements are actually present in the product. The two-stage approach used here is therefore necessary to differentiate between products with and without trace elements.
Reaction Representation Learning
The crucial first step is to produce a fixed-length representation of the reaction mixture. Figure 1a shows how precursors are represented as nodes on a dense, weighted graph and action sequences are transformed into a fixed-length representation using an LSTM encoder. The initial precursor feature vectors are transformed into a trainable embedding through a learnable affine transformation,

$$\vec{v}_i = W \vec{v}_{0,i},\qquad(1)$$

where $W$ is a learnable weight matrix, and $\vec{v}_{0,i}$ is the initial precursor representation.

A series of message-passing operations (Figure 1b) then update the precursor representations,

$$\vec{v}_i^{\,t+1} = U_t\Big(\vec{v}_i^{\,t},\ \sum_{j\neq i}\vec{v}_j^{\,t},\ \vec{u}_{\mathrm{seq}}\Big),\qquad(2)$$

where $\vec{v}_i^{\,t}$ is the $i$th precursor feature vector at message-passing layer $t$, $\vec{v}_j^{\,t}$ are the other precursor features, $\vec{u}_{\mathrm{seq}}$ is the fixed-length representation of the action sequence, and $U_t$ is the update function.

At the heart of the update function is the soft-attention mechanism, which allows the model to capture the importance of the other precursors in the reaction to the precursor in question. Attention is in general a method for assigning the importance of certain features in a machine learning task. A soft-attention mechanism extends this idea by allowing the model to learn the attention coefficients itself [16]. This has been shown to improve graph representation learning [20]. Here, unnormalised attention coefficients are first calculated across pairs of precursors,

$$\epsilon_{ij} = f\big(\vec{v}_i^{\,t} \oplus \vec{v}_j^{\,t} \oplus \vec{u}_{\mathrm{seq}}\big),\qquad(3)$$

where $f$ is a single hidden-layer neural network, $\oplus$ is the concatenation operation, and $\vec{u}_{\mathrm{seq}}$ is the action-sequence embedding, which is used as a ‘global’ state in all update steps. This gives procedural context to enable differentiation between products formed through different synthesis pathways. The $\epsilon_{ij}$ are then normalised by a weighted softmax function,

$$\alpha_{ij} = \frac{w_{ij}\exp(\epsilon_{ij})}{\sum_k w_{ik}\exp(\epsilon_{ik})},\qquad(4)$$

where the weight $w_{ij}$ is the molar amount of precursor $j$ in the balanced chemical equation. This gives the model context of the quantities of each precursor in the reaction mixture. The node feature is then updated with the pairwise interactions between nodes, weighted by their attention coefficients,

$$\vec{v}_i^{\,t+1} = \vec{v}_i^{\,t} + \frac{1}{K}\sum_{k=1}^{K}\sum_{j\neq i}\alpha_{ij}^{k}\, g^{k}\big(\vec{v}_i^{\,t} \oplus \vec{v}_j^{\,t} \oplus \vec{u}_{\mathrm{seq}}\big).\qquad(5)$$

Here $g$ is again a single hidden-layer neural network, and we average over $K$ attention heads, which has been shown to stabilise performance [20].

A similar attention-mediated pooling operation is then used to create a fixed-length, learned representation of the reaction $\vec{r}$ from the final graph. The model learns how much attention to pay to each precursor given its final representation and relative quantity.

Figure 1. a. Representation of precursors and processing actions. Magpie embeddings [13] are used to represent the precursors as fixed-length vectors. These are used as features for nodes on a dense graph. An alchemy mask, used to constrain the predictions of the model, is constructed from the set of elements in the precursors. Processing action sequences are encoded using an LSTM encoder. b. Reaction representation learning. The reaction graph goes through a series of message-passing operations between nodes. The action embedding is used as a global state in all the message-passing steps. The final graph representation is then pooled, weighted by attention coefficients, to obtain a fixed-length representation of the reaction. c. Product prediction from reaction representations. The learned reaction embedding is used to predict the elements present in the major product. The alchemy mask ensures that only elements present in the precursors can be predicted as present in the product. Separately, the amount of each predicted element is regressed with its fractional stoichiometry. Matscholar element embeddings [14] are concatenated with the reaction embedding to query the element in question.
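A minimal PyTorch sketch of a single message-passing step implementing Eqs. (3)–(5) is given below. It is written with explicit loops for readability rather than the vectorised, batched form a real implementation would use, and all names and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class MessageLayer(nn.Module):
    """One weighted soft-attention message-passing step over a reaction graph."""

    def __init__(self, elem_dim=128, seq_dim=32, hidden=256, heads=3):
        super().__init__()
        in_dim = 2 * elem_dim + seq_dim
        # One attention network f and one message network g per head.
        self.f = nn.ModuleList([nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(heads)])
        self.g = nn.ModuleList([nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, elem_dim))
            for _ in range(heads)])

    def forward(self, v, w, u_seq):
        # v: (n, elem_dim) precursor features; w: (n,) molar amounts;
        # u_seq: (seq_dim,) action-sequence embedding used as a global state.
        # Assumes n >= 2 (single-precursor reactions are excluded from the data).
        n = v.shape[0]
        out = torch.zeros_like(v)
        for f, g in zip(self.f, self.g):
            for i in range(n):
                js = [j for j in range(n) if j != i]
                pair = torch.cat([v[i].expand(len(js), -1), v[js],
                                  u_seq.expand(len(js), -1)], dim=1)
                eps = f(pair).squeeze(-1)                # Eq. (3)
                alpha = w[js] * eps.exp()
                alpha = alpha / alpha.sum()              # Eq. (4), molar-weighted
                out[i] += (alpha.unsqueeze(-1) * g(pair)).sum(0)
        return v + out / len(self.f)                     # Eq. (5), head average
```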
Product Prediction

The learned reaction embedding is then passed through a feed-forward neural network trained to perform multi-label element classification for elements in the target product of the reaction (Figure 1c). In materials synthesis, the products cannot contain elements that are not present in the precursors. We add this inductive bias in the form of a binary ‘alchemy mask’ on the output of the network. Oxygen is assumed to be abundant and is additionally included in all masks irrespective of its presence in the precursors.

Once the elements in the product have been predicted, the final task is to predict the relative amounts of each element in the product. Here, we use Matscholar element embeddings, $\vec{e}_i$, to represent and differentiate between constituent elements [14]. These are concatenated with the reaction representation, $\vec{r}$, to provide a contextualised query and fed through another feed-forward neural network, $h$, to predict the relative amount of element $i$,

$$a_i = h(\vec{e}_i \oplus \vec{r}).\qquad(6)$$

These are then normalised using a softmax function to arrive at a fractional stoichiometry,

$$s_i = \frac{1}{N}\sum_{k=1}^{N}\mathrm{softmax}_i\big(a_i^{k}\big),\qquad(7)$$

where $s_i$ is the normalised stoichiometry of element $i$ in the product, and we average over $N$ predictions to increase stability and accuracy. In addition, using an ensemble of $N$ models also allows us to estimate the epistemic uncertainty by taking the variance in predictions across the models in the ensemble. We refer to this ensembled model as the reaction graph model.
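The two-stage output head described by Eqs. (6) and (7) can be sketched as follows; dimensions and names are illustrative, and `elem_embed` stands in for the pretrained Matscholar element embeddings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ELEMS = 81  # element vocabulary used for the target vectors

class ProductHead(nn.Module):
    """Masked element classification followed by stoichiometry regression."""

    def __init__(self, react_dim=128, elem_embed_dim=200):
        super().__init__()
        self.element_net = nn.Sequential(
            nn.Linear(react_dim, 256), nn.ReLU(), nn.Linear(256, N_ELEMS))
        self.stoich_net = nn.Sequential(
            nn.Linear(react_dim + elem_embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, r, mask, elem_embed):
        # r: (react_dim,) reaction embedding; mask: (N_ELEMS,) binary alchemy
        # mask; elem_embed: (N_ELEMS, elem_embed_dim) element embeddings.
        probs = torch.sigmoid(self.element_net(r)) * mask  # alchemy mask
        present = probs > 0.5  # 0.5 is a placeholder; the paper tunes this
        # Eq. (6): query each predicted element against the reaction context.
        query = torch.cat([elem_embed, r.expand(N_ELEMS, -1)], dim=1)
        a = self.stoich_net(query).squeeze(-1)
        a = a.masked_fill(~present, float("-inf"))
        return present, F.softmax(a, dim=0)  # fractional stoichiometry

# Eq. (7): average the softmax outputs of N = 5 independently initialised
# models; the variance across members estimates the epistemic uncertainty.
```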
RESULTS AND DISCUSSION

Product Prediction

The reaction graph model was tested on solid-state reactions with up to ten precursors. Product elemental composition was predicted with a subset accuracy of 0.940. This can be compared to a null baseline accuracy of 0.338, which would be achieved if all elements in the precursors were assumed to be present in the product. There is evidently a bias towards positive element labelling; 83.9% of elements in the precursors are also present in the product. Thus another metric of interest is the F1 score. The average F1 score achieved was 0.9910, where the score for each element is weighted by the number of true occurrences to account for the elemental imbalance.

Results for stoichiometry prediction can be seen in Figure 2. The accuracy of model predictions is well aligned with the epistemic uncertainty – the most certain predictions lie closest to the equivalence line. The L1 and L2 distances between composition vectors can provide metrics for measuring material similarity. Here, the fractional stoichiometry vectors of predicted products were accurate to a mean L1 distance of 0.1259, and a mean L2 distance of 0.0729, to their true stoichiometries.
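For clarity, the metrics quoted above can be computed as in this sketch, assuming binary element-presence arrays and fractional stoichiometry vectors over the 81-element vocabulary (array names are ours):

```python
import numpy as np
from sklearn.metrics import f1_score

def evaluate(y_true, y_pred, s_true, s_pred):
    # y_*: (n_reactions, 81) binary element presence; s_*: fractional stoichiometries.
    subset_acc = np.mean(np.all(y_true == y_pred, axis=1))  # exact-match accuracy
    f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)
    l1 = np.abs(s_true - s_pred).sum(axis=1).mean()              # mean L1 distance
    l2 = np.sqrt(((s_true - s_pred) ** 2).sum(axis=1)).mean()    # mean L2 distance
    return subset_acc, f1, l1, l2
```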
Figure 2. Parity plot for product stoichiometry prediction. The fractional amount of each element in the target product is plotted against its predicted value. The data points are coloured according to their relative confidence, estimated from the variation in predictions from the ensemble. In general, more confident predictions appear closer to the equivalence line.
Explaining Predictions via Reaction Similarity
The predictions of the model can be explained by considering the reaction representation – similar feature vectors will result in similar predictions. We can therefore understand why certain predictions are made by assessing the cosine similarity between the reaction embedding vector and those in the training data. We illustrate this via examples of such reactions that are not in the training set, and show that similar reactions in the training set provide explanations for these predictions.

Firstly, we investigate the synthesis of the sodium-ion battery cathode material NVP [21],

3 NaF + 2 VPO4 → Na3V2(PO4)2F3. (8)

The product composition was predicted accurately within an L2 error of 0.0067. To analyse why the prediction was accurate, we assess the similarity of the learned reaction embedding derived from the precursors and processing action sequence to other reactions in the training set:

Nearest Neighbors to (8)        Similarity
LiF + VPO4 → LiVPO4F            98.5%
LiF + FePO4 → LiFePO4F          97.3%
Here we see that the embedding is very similar to other vanadium fluorophosphate syntheses. Despite differing cations appearing in the precursors, the model has learnt the general form of such reactions and the expected ratios of elements in the product. The model triangulates a prediction from similar observed reactions and the elemental context.

As a second example, we explore a different subspace of materials. Perovskite manganites with the general formula Ln1−xAxMnO3 (where Ln is a trivalent rare earth and A is a divalent alkaline earth) have attracted interest since the discovery of large negative magnetoresistance in such structures [22]. When faced with the novel reaction [23],

0.05 MnO2 + 0.15 CaCO3 + 0.3 La2O3 + 0.01 BaCO3 → La0.6Ca0.3·0.05Ba0.05MnO3 + 0.16 CO2 + 0.505 O2, (9)

the product was predicted within an L2 error of 0.0019. The reaction embedding was observed to be similar to other manganite syntheses in the training set:

Nearest Neighbors to (9)
0.083 Pr6O11 + 0.47 CaCO3 + 0.03 BaCO3 + MnO2 → Pr0.5Ca0.47Ba0.03MnO3 + 0.5 CO2 + 0.208 O2
0.05 SrCO3 + 0.15 CaCO3 + 0.5 Mn2O3 + 0.05 BaCO3 + 0.375 La2O3 + 0.063 O2 → La0.75Ca0.15Sr0.05Ba0.05MnO3 + 0.25 CO2

The model predicted the correct ratios of oxygen and manganese to the other elements in the product, learning from ratios in similar reactions. In the dataset, however, the product composition was incorrectly extracted as a mixture of La0.6Ca0.3 and Ba0.05MnO3 in the ratio 1:0.05, due to the abnormal dot notation in the calcium stoichiometry. By comparison to similar reactions, one can thus also validate reactions extracted from literature.

The reaction embeddings also serve as a means to analyse why certain predictions are incorrect. As many of these failure modes can be identified via estimation of the epistemic uncertainty, we are primarily interested in explaining the worst predictions – those which have large error but low uncertainty. Reaction (10) is an example of one such case,

1.5 Na2CO3 + 0.333 Co3O4 → Na3CoO2 + 1.5 CO2 + 0.417 O2. (10)

Target       Na: 0.5000   Co: 0.1667   O: 0.3333
Prediction   Na: 0.1985   Co: 0.2692   O: 0.5323
In the original paper [24], Na3CoO2 is the nominal product, but the paper actually reports the formation of different phases of NaxCoO2 in their experiment. As such, the model prediction is not unreasonable. The model reached that prediction by inferring from similar reactions observed in the training data (but not in ref [24]):

Nearest Neighbors to (10)
0.375 Na2CO3 + 0.333 Co3O4 + 0.146 O2 → Na0.75CoO2 + 0.375 CO2
0.5 Na2CO3 + 0.333 Co3O4 + 0.083 O2 → NaCoO2 + 0.5 CO2
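The nearest-neighbour retrieval used throughout this section amounts to a cosine-similarity search over the stored training-set embeddings, as in this minimal sketch (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def nearest_reactions(query_emb, train_embs, k=2):
    # query_emb: (d,) embedding of the new reaction;
    # train_embs: (n_train, d) bank of training-set reaction embeddings.
    sims = F.cosine_similarity(query_emb.unsqueeze(0), train_embs, dim=1)
    top = sims.topk(k)
    return top.indices.tolist(), top.values.tolist()  # e.g. [idx, ...], [0.985, ...]
```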
Benchmarking and Ablation Study

To place the accuracy of our model into context, we benchmark our reaction graph model against a baseline approach and conduct an ablation study on its key features.

The baseline model we use takes the Magpie precursor embeddings and concatenates them to be used as the input to a neural network with the same architecture as the output network of the reaction graph model. As with the reaction graph model, we make use of an ensemble of N individual models for this baseline. While the reaction graph model can handle variable numbers of precursors, this Magpie baseline approach is inherently restricted to a certain input size. Thus we limit our model to reactions with up to three precursors only, and zero-pad the input in the Magpie baseline model where there are fewer than three precursors to enable comparison.

The ablation study compares results across reaction graph models with/without the processing actions and the alchemy mask to discern which parts of the model design are significant. The results from the study are shown in Figure 3 and a summary of the key accuracy metrics is shown in Table I. A null baseline result is also given for comparison, where we assume all elements in the precursors are present in the product, and are present in their average relative amounts derived from the training data.
Reaction Representation Learning
For product element prediction, the reaction graph models are seen to perform significantly better than using Magpie features alone. For stoichiometry prediction, Figure 3b shows the proportion of predictions that are within given L1 error tolerances. A clear improvement is seen through incorporating the graph representation learning framework.

Figure 3a shows plots of actual against predicted fractional stoichiometry. This figure shows how predictions are distributed about the equivalence line, and their relative confidence as measured through ensemble variance. Similar to the null baseline, the Magpie baseline model predictions show a lack of parity about the equivalence line. In contrast, the learned reaction graph embeddings show a significant improvement, both in absolute terms and in better prediction of the model uncertainty. In general, the most certain predictions are closest to the equivalence line and there is more parity. This indicates the model has assimilated some chemical knowledge. The improvement in uncertainty prediction can be seen more clearly in the confidence-error plot in Figure 3c. Including the learned reaction graph embedding shows a clear decrease in the mean error of predicted products as the most uncertain predictions are removed. The almost flat line observed when using fixed Magpie embeddings indicates that there is little calibration between error and uncertainty for the baseline model.

Figure 3. a. Ablation study parity plots for product stoichiometry prediction. An improvement in both correlation and uncertainty estimation is seen in incorporating the reaction graph framework. b. Threshold plot for product stoichiometry prediction. The cumulative proportion of correctly predicted stoichiometries is plotted against an L1 error threshold. A clear improvement is seen through using the learned reaction graph embedding compared to fixed Magpie features. No apparent improvement is observed through the inclusion of the action sequences. c. Confidence-error plots for stoichiometry prediction showing the mean L1 error in product stoichiometry prediction as the most uncertain predictions are removed. Uncertainty estimation is greatly improved through using the reaction graph embedding – more erroneous predictions are in general more uncertain.
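The confidence-error curves in Figure 3c can be produced by ranking predictions by their ensemble variance, as in this sketch (assuming per-reaction L1 errors and uncertainties as NumPy arrays; names are ours):

```python
import numpy as np

def confidence_error_curve(l1_errors, uncertainties, steps=20):
    """Mean L1 error over the retained set as the most uncertain
    predictions are progressively removed; a calibrated model's
    curve decreases as more uncertain points are discarded."""
    order = np.argsort(uncertainties)  # most certain predictions first
    errs = l1_errors[order]
    fractions = np.linspace(0.05, 1.0, steps)
    return [(1 - f, errs[: max(1, int(f * len(errs)))].mean())
            for f in fractions]        # (fraction removed, mean L1 error)
```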
Alchemy Mask
The results in the lower half of Table I are achieved through the inclusion of the alchemy mask, which effectively prevents the model from making predictions which are chemically unfeasible. This inductive bias provides the most notable increase in accuracy across the board for element prediction. This highlights the utility of incorporating physical principles into ML models.
Processing Action Sequences
The synthesis actions are crucial in determining the major product of an inorganic reaction; different procedures on the same precursors can lead to differing products due to the interplay between thermodynamics and kinetics. However, Table I shows the addition of the action sequences in the reaction embedding has a detrimental effect – if any – on model performance.

The level of detail of the action data used in the model can provide an explanation. Different product compositions and morphologies can be formed from even slight changes in temperature or pressure. Thus one cannot expect that using the generic, coarse terms to represent action sequences can lead to significant improvements in performance. The inaccuracies of the processing sequences in the dataset only serve to compound this effect. Thus a model that includes more granular processing procedures is required for practical applications.

Despite this, when the models were extended to the full dataset by including reactions with up to ten precursors, there was a significant improvement in accuracy through incorporating the processing action sequences. An L1 error of 0.1357 was achieved without the actions, and an L1 error of 0.1259 with the actions. This is likely due to the greater diversity of reactions in the extended dataset, meaning the actions are more important in providing contextual information.

Table I. Ablation performance table for reactions with up to three precursors. The subset accuracy and weighted F1 score are shown for the prediction of product elements. The null baseline result assumes all elements in the precursors are present in the product, and assumes that elements are present in their average amounts from the training data. Providing inductive bias to the model in the form of the alchemy mask improves accuracies across the board. The reaction graph framework shows a clear improvement over the Magpie baseline. Removing the action context does not have a notable effect on model accuracy.

Reaction Representation         | Alchemy Mask | Subset Accuracy | F1 Score | Mean L1 | Mean L2 | R
Magpie Baseline                 |              | 0.625           | 0.902    | –       | –       | –
Reaction Graph without Actions  |              | 0.773           | 0.950    | –       | –       | –
Complete Reaction Graph         |              | 0.770           | 0.948    | –       | –       | –
Null Baseline                   | ✓            | –               | –        | –       | –       | –
Magpie Baseline                 | ✓            | –               | –        | –       | –       | –
Reaction Graph without Actions  | ✓            | –               | –        | –       | –       | –
Complete Reaction Graph         | ✓            | –               | –        | –       | –       | –
CONCLUSION
In this work, a data-driven approach to inorganic reaction prediction has been investigated for the first time. A physically-motivated model has been developed which can predict major products of solid-state reactions from precursors and synthesis procedures. The key feature of this method is the generation of a learned reaction embedding which effectively encodes the reaction context for product prediction. Through an ablation study, this framework is shown to predict product stoichiometries more accurately than less physically-motivated models. Importantly, it also more reliably assesses the uncertainty in its predictions.

The learned reaction representations have been shown to provide explainability to novel predictions through analysing similarities with known reactions in the literature. These embeddings can also be transferred to further problems. One can envisage using them to predict the probability of success of a given synthesis procedure. This would further aid retrosynthesis; precursors and synthesis pathways which have the highest probability of success can be preferentially chosen. This would require supervision in the form of both successful and unsuccessful reactions [25].

This work provides a basis for evaluating a future retrosynthesis planner for inorganic materials. For practical use, however, further work is required to differentiate between morphologies of products, and more granular synthesis action steps with relevant conditions are needed. Access to greater quality and quantity of synthesis data and a concerted effort between ML practitioners and experimentalists will enable these advances.
METHODS

Data Processing
Reactions with non-stoichiometric or organic precursors and targets, and reactions with pure products or only one precursor, were removed from the dataset. This reduced the dataset size to 16,231 reactions with up to ten precursors, and 11,083 reactions for the ablation study with up to three precursors. This was randomly split into training and test sets with an 80:20 ratio.

Processing action sequences ranged from 0 to 16 steps in length, and were represented by 7 unique action types: {Dry Mixing, Mixing in Solution, Quenching, Shaping, Heating, Drying, Liquid Grinding}.

Precursor stoichiometries were extracted along with their molar ratios from the balanced chemical equation for each reaction. Where precursors or targets were themselves a mixture of different materials/phases, their stoichiometries were added, weighted by their amounts. Targets were extracted as 81-dimensional stoichiometric vectors. The set of elements present in the precursors was used to construct the alchemy mask.
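As an illustration of the mask construction, the sketch below builds the binary mask from precursor formulas using pymatgen; the ordered 81-element vocabulary `ELEMENTS` is an assumed variable, not part of the released dataset:

```python
import numpy as np
from pymatgen.core import Composition

def alchemy_mask(precursor_formulas, elements):
    # elements: assumed ordered list of the 81 symbols indexing target vectors.
    allowed = {el.symbol for f in precursor_formulas
               for el in Composition(f).elements}
    allowed.add("O")  # oxygen is assumed abundant and always permitted
    return np.array([1.0 if el in allowed else 0.0 for el in elements])

# e.g. alchemy_mask(["BaCO3", "TiO2"], ELEMENTS) permits only Ba, C, Ti and O.
```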
Implementation Details

The model was implemented in PyTorch [26], and Matminer was used for generating Magpie features from precursor stoichiometries [27]. An embedding dimension of 8 and a hidden state dimension of 32 were used for the processing actions autoencoder. This led to a sequence reconstruction accuracy of ∼ . Reconstruction accuracies of over 95% were achieved using a hidden dimension of 64, but this did not significantly improve product prediction accuracy.

The learnable precursor embedding size was set to 128, and 5 message-passing layers were used, each with 3 attention heads. The hidden layer dimension was chosen to be 256 for both f and g. The output network of the product element prediction model had hidden layer dimensions of 256, 512, 512 and 256 respectively, with ReLU activations. The threshold output probability for elemental presence was chosen by maximising accuracy on a validation set. The stoichiometry prediction model had hidden layer dimensions of 256, 256, 256, 256, 128, 128 and 64 respectively, with ReLU activations. These were averaged over 5 iterations. Skip connections were used to reduce the vanishing gradient effect [28].

Learning rates were chosen using a heuristic search where we choose a rate one order of magnitude lower than that which causes the training loss to diverge. A mini-batch size of 256 and the Adam optimiser were used to train the model [29]. Early stopping using an 80:20 train:validation split was used in all models to avoid over-fitting. The ensembles used to estimate the epistemic uncertainty in both the Magpie baseline and reaction graph models were constructed using 5 individual models trained on the same data with different initialisations.
Data availability
The exact data used in this work can be found at https://doi.org/10.6084/m9.figshare.9722159.v3. More recent versions of the synthesis dataset are released by the original authors at https://github.com/CederGroupHub/text-mined-synthesis_public.

Code availability
An open-source repository of the code used in this work is available at https://github.com/s-a-malik/inorg-synth-graph.

ACKNOWLEDGEMENTS
R.E.A.G. and A.A.L. acknowledge the support of the Winton Programme for the Physics of Sustainability.

∗ Correspondence email address: [email protected]

[1] V. D'Innocenzo, A. R. Srimath Kandada, M. De Bastiani, M. Gandini, and A. Petrozza, Journal of the American Chemical Society, 17730 (2014).
[2] K. Kalantari, I. Sterianou, S. Karimi, M. C. Ferrarelli, S. Miao, D. C. Sinclair, and I. M. Reaney, Advanced Functional Materials, 3737 (2011).
[3] Y. Gong, Z. Liu, A. R. Lupini, G. Shi, J. Lin, S. Najmaei, Z. Lin, A. L. Elías, A. Berkdemir, G. You, H. Terrones, M. Terrones, R. Vajtai, S. T. Pantelides, S. J. Pennycook, J. Lou, W. Zhou, and P. M. Ajayan, Nano Letters, 442 (2014).
[4] A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K. A. Persson, APL Materials, 011002 (2013).
[5] J. Saal, S. Kirklin, M. Aykol, B. Meredig, and C. Wolverton, JOM, 1501 (2013).
[6] M. Jansen, Advanced Materials, 3229 (2015).
[7] A. Ludwig, npj Computational Materials, 70 (2019).
[8] M. G. Kanatzidis, Inorganic Chemistry, 3158 (2017).
[9] M. Aykol, V. I. Hegde, L. Hung, S. Suram, P. Herring, C. Wolverton, and J. S. Hummelshøj, Nature Communications, 2018 (2019).
[10] Y. Kim, E. Kim, E. Antono, B. Meredig, and J. Ling, "Machine-learned metrics for predicting the likelihood of success in materials discovery," (2019), arXiv:1911.11201 [cond-mat.mtrl-sci].
[11] O. Kononova, H. Huo, T. He, Z. Rong, T. Botari, W. Sun, V. Tshitoyan, and G. Ceder, Scientific Data, 203 (2019).
[12] E. Kim, Z. Jensen, A. van Grootel, K. Huang, M. Staib, S. Mysore, H.-S. Chang, E. Strubell, A. McCallum, S. Jegelka, and E. Olivetti, Journal of Chemical Information and Modeling, 1194 (2020).
[13] L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton, npj Computational Materials, 16028 (2016).
[14] V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, K. A. Persson, G. Ceder, and A. Jain, Nature, 95 (2019).
[15] I. Sutskever, O. Vinyals, and Q. V. Le, CoRR abs/1409.3215 (2014), arXiv:1409.3215.
[16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," (2017), arXiv:1706.03762 [cs.CL].
[17] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," (2018), arXiv:1810.04805 [cs.CL].
[18] S. Hochreiter and J. Schmidhuber, Neural Comput., 1735 (1997).
[19] R. E. A. Goodall and A. A. Lee, "Predicting materials properties without crystal structure: Deep representation learning from stoichiometry," (2019), arXiv:1910.00617 [physics.comp-ph].
[20] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," (2017), arXiv:1710.10903 [stat.ML].
[21] R. Gover, A. Bryan, P. Burns, and J. Barker, Solid State Ionics, 1495 (2006).
[22] Y. Tokura, A. Urushibara, Y. Moritomo, T. Arima, A. Asamitsu, G. Kido, and N. Furukawa, Journal of the Physical Society of Japan, 3931 (1994).
[23] I. S. Debbebi, H. Omrani, W. Cheikhrouhou-Koubaa, and A. Cheikhrouhou, Journal of Physics and Chemistry of Solids, 67 (2018).
[24] H. Zhou, X. Zhang, B. Xie, Y. Xiao, C. Yang, Y. He, and Y. Zhao, Thin Solid Films, 338 (2006).
[25] P. Raccuglia, K. C. Elbert, P. D. F. Adler, C. Falk, M. B. Wenny, A. Mollo, M. Zeller, S. A. Friedler, J. Schrier, and A. J. Norquist, Nature, 73 (2016).
[26] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, in Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019) pp. 8024–8035.
[27] L. Ward, A. Dunn, A. Faghaninia, N. Zimmermann, S. Bajaj, Q. Wang, J. Montoya, J. Chen, K. Bystrom, M. Dylla, K. Chard, M. Asta, K. Persson, G. Snyder, I. Foster, and A. Jain, Computational Materials Science, 60 (2018).
[28] K. He, X. Zhang, S. Ren, and J. Sun, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016) pp. 770–778.
[29] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," (2014), arXiv:1412.6980 [cs.LG].