IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY: ACCEPTED FOR PUBLICATION
Critical Temperature Prediction for a Superconductor: A Variational Bayesian Neural Network Approach

Thanh Dung Le, Member, IEEE, Rita Noumeir, Member, IEEE, Huu Luong Quach, Ji Hyung Kim, Jung Ho Kim, and Ho Min Kim, Member, IEEE
Abstract—Much research in recent years has focused on using empirical machine learning approaches to extract useful insights on the structure-property relationships of superconductor materials. Notably, these approaches bring extreme benefits when superconductivity data often come from costly and arduous experimental work. However, such assessment cannot rely solely on a black-box machine learning model, which is not fully interpretable: it can be counter-intuitive to understand why the model gives an appropriate response to a set of input data for superconductivity characteristic analyses, e.g., critical temperature. The purpose of this study is to describe and examine an alternative approach for predicting the superconducting transition temperature Tc from the SuperCon database maintained by Japan's National Institute for Materials Science. We address a generative machine-learning framework called the Variational Bayesian Neural Network, using superconductors' chemical elements and formulas to predict Tc. In this context, the contribution of the paper is twofold. First, to improve interpretability, we adopt variational inference to approximate the distribution in the latent parameter space of the generative model. It statistically captures the mutual correlation of superconductor compounds and then gives an estimate of Tc. Second, a stochastic optimization algorithm, which embraces a statistical inference technique named the Monte Carlo sampler, is utilized to optimally approximate the proposed inference model and, ultimately, to determine and evaluate the predictive performance. The results are promising with respect to the standard evaluation metrics and agree with the existing models prevalent in the field. The R² value obtained is very close to that of the best model (0.94), whereas a considerable improvement is seen in the RMSE value (3.83 K). Notably, the proposed model is the first of its kind for predicting a superconductor's Tc.
Index Terms—Critical transition temperature, machine learning, Bayesian neural network, variational inference, stochastic optimization algorithm, high temperature superconducting (HTS).
I. INTRODUCTION

Manuscript received September 24, 2019; accepted January 29, 2020. Date of publication January 29, 2020; date of current version January 29, 2020. The author (T. D. Le) acknowledges the financial support of the Canada First Research Excellence Fund (CFREF) program through IVADO, and the Doctoral Scholarship for International Students from Le Fonds de Recherche du Quebec Nature et technologies (FRQNT). T. D. Le and R. Noumeir are with the Biomedical Information Processing Lab, Ecole de Technologie Superieure, Montreal, QC H3C 1K3, Canada (Corresponding email: [email protected]). H. L. Quach, J. H. Kim, and H. M. Kim are with the Applied Superconducting Lab, Jeju National University, Jeju-si 690-756, S. Korea (Email: {qhluong, jihkim, hmkim}@jejunu.ac.kr). J. H. Kim is with the Institute for Superconducting & Electronic Materials, Australian Institute of Innovative Materials, University of Wollongong, Wollongong NSW 2522, Australia (Email: [email protected]).

THE generality of machine learning (ML) in material science is increasingly being adopted to discover hidden trends in data and make predictions. Mainly, there are guidance and perspectives when applying ML techniques: a robust protocol to maintain both quantitatively and qualitatively predictive models [1]; impacts of ML technologies on materials, processes, and structures engineering that are likely transformational for advancing new solutions to the long-standing data-structure challenge [2]; and reliable and explainable ML models built from underrepresented materials data that provide both model-level and decision-level explanations [3].
Besides, there is also a wide range of ML applications in material data science, such as illustrative examples of a taxonomy of ML capabilities in soft matter and data-driven materials design engines [4].

In superconductivity, machine learning-guided iterative experimentation may outperform standard high-throughput screening for discovering breakthrough materials in high-temperature (high-Tc) superconductors, e.g., a new measure of machine learning model performance in high-Tc prediction obtained by improving cross-validation with a single neural network [5], and empirical analyses for critical current measurement through machine learning tools for classification and regression tasks on the SuperCon database [6]. These studies confirm that frameworks for making machine-based decisions and actions using ML analysis have come to dominate predictive modeling. However, there is growing concern that attaining autonomous prediction requires at least three concurrently operating technologies: i) making analyses by endorsing perception of information, ii) predicting how the sensed field changes over time, and iii) establishing a policy for a machine to take unsupervised action.

To address these challenges, we describe an alternative ML approach called the generative neural network model. Bayesian-based generative model prediction is advantageous for two important reasons: i) uncertainty is intrinsically described, which is useful for analysis and prediction, and ii) overfitting is avoided by natural penalization of overly complicated models. In this work, we develop a probabilistic high-Tc predictive model using Variational Bayesian Neural Network (VBNN) regression as provided by Drugowitsch [7] and build upon an efficient optimization algorithm for learning the optimal learnable parameters of the VBNN. In particular, we exploit the Stochastic Gradient Variational Bayes (SGVB) optimization algorithm introduced in [8].
The learning algorithm provides the probability that latent correlations of parameters in the superconductors' chemical characteristics inform the Tc prediction, instead of a fixed linear regression with uncertainty.

The rest of this paper is organized as follows. Section II discusses the strengths, weaknesses, and achievements of the related work, and Section III presents the mathematical underpinning of the VBNN model and the preliminary background of the inference model. Section IV provides a conceptual Tc prediction method by applying the VBNN model along with the prediction evaluation. In Section V, we numerically evaluate its performance. Finally, Section VI provides concluding remarks.

II. RELATED WORK
One of the major impediments to the industrial take-up of high-temperature superconductors is the paucity of comprehensive, reliable, and relevant performance data on commercially available wires, such as the wire performance database from the Robinson Research Institute [9] and the SuperCon database maintained by Japan's National Institute for Materials Science [10]. To address this, this article first reviews the data-analytic approaches to critical temperature prediction, followed by a review of machine learning-based prediction models. For each study, the present paper discusses the strengths, weaknesses, and achievements of the different approaches, so as to better motivate the technique proposed in this work.

Recently, an active collaboration of Center for Nanophysics and Advanced Materials members, together with researchers from the National Institute of Standards and Technology and Duke University, has been exploring methods by developing machine learning schemes to model the Tc of over 12,000 known superconductors, published in npj Computational Materials [6]. All machine learning models in this work are variants of the random forest method. Random forest is used because of several advantages: i) it can learn complicated non-linear dependencies from the data, ii) it is quite tolerant to heterogeneity in the training data, and iii) it can estimate the importance of each tree predictor, thus making the model more interpretable. The results show that a single regression model combined with a backward feature elimination process performed reasonably well; the model achieved a statistical measure R-squared (R²) ≈ 0.85.

The study [11] also applies random forest to predict the superconducting critical temperature based on features extracted from the superconductor's chemical formula. However, this study enhanced the random forest by adopting the XGBoost (eXtreme Gradient Boosting) approach.
Gradient boosted models create an ensemble of trees to predict a response. The trees are added sequentially to improve the model by accounting for the points which are difficult to predict. Gradient boosted models can thus handle the complex interactions and correlations among the features. Consequently, this regression model serves as a benchmark with 17.6 K and 0.74 for the out-of-sample root mean square error (RMSE) and R², respectively.

Earlier, the work [12] directly relates the lattice parameters to the Tc through a computational intelligence technique using a support vector machine (SVM) regressor. The success of the model paves a significant way for quick and accurate estimation of the Tc of doped YBCO superconductors, and eventually eases the usually demanding experimental procedures that involve the use of an expensive cryostat. Technically, the SVM employs a mapping function called a kernel function to map the non-linear regression to a high-dimensional feature space where linear regression is conducted. The optimal value is then found by tuning all the parameters during training of the SVM problem. The results of the modeling and simulations indicate that the developed approach was capable of estimating the Tc with a high degree of accuracy, as can be deduced from the high correlation coefficients of 96.65% and 95.75% during the training and testing periods of the model, respectively.

The recent exciting study [13] holds a promising technique to predict new high-temperature superconductors. The authors introduced a novel approach using deep learning and succeeded in making a new list of candidate superconductor materials. They proposed a new approach of "materials as images in the periodic table." Technically, elements and materials are presented as images in the periodic table because the number of electrons and their orbits determine the properties of the elements.
A deep convolutional neural network (CNN) is then adapted for images from the periodic table with four channels corresponding to the s-block, p-block, d-block, and f-block. The results confirmed an accuracy of 94%, a precision of 74%, a recall of 62%, and an F-score of 67% with the SuperCon data. The achieved R² of the critical temperature on the test data is 0.93.

To confirm the promise of the study [13], another study recently published in npj Computational Materials [14] develops an atom table CNN that learns the experimental properties directly from features constructed by itself. In particular, the CNN only requires the component information. Through data-enhanced technology, their model not only accurately predicts superconducting transition temperatures but also distinguishes superconductors from non-superconductors. Utilizing the trained model, they screened 20 compounds that are potential superconductors with high superconducting transition temperatures from the existing SuperCon database. Besides, from the learned features, they extract the properties of the elements and reproduce the chemical trends. On the test set, the RMSE and the coefficient of determination R² are 8.14 K and 0.97, respectively.

Although studies [6], [11], and [12] show significant achievements in predicting the Tc, each method still has limitations. For random forests, the main disadvantages are their complexity, greater demand for computational resources, and lower intuitiveness; the commonly used setups for random forests can quickly become inconsistent. In addition, the XGBoost model is more sensitive to overfitting if the data are noisy. Consequently, with both approaches, training generally takes longer because trees are built sequentially.
Besides the advantages of SVMs, an important practical question, which is not entirely solved, is the selection of the kernel function parameters; SVMs also suffer from high algorithmic complexity and the extensive memory requirements of the required quadratic programming.

To overcome the remaining challenges, this paper presents a conceptual illustration of a generative Bayesian-based model. The advantages of Bayesian model selection are as follows:
• It avoids the overfitting associated with ML by marginalizing over the model parameters instead of making point estimates of their values.
• Models can be compared directly on the training data, without the need for a validation set.
• It avoids the multiple training runs for each model associated with cross-validation.
Consequently, the proposed approach provides a more comprehensive and realistic picture of ML model performance in material-discovery applications. Mainly, it can outperform the deep learning approaches of the studies [13] and [14], which are claimed to fail because their function distributions have low cross-predictability with a descent algorithm [15].

III. MODEL AND PRELIMINARIES
A. Bayesian Linear Regression Model
Let us start with a simple linear regression function to approximate the true generating function:

$$y_i = w^T x_i + \epsilon_i \tag{1}$$

where the response $y_i$ is a linear function of the covariate $x_i \in \mathbb{R}^D$, is linear in the parameters $w$, and includes an additive noise term $\epsilon_i$. Collecting the $I$ responses into $Y, \epsilon \in \mathbb{R}^I$, we have:

$$Y = XW + \epsilon \tag{2}$$

where $X \in \mathbb{R}^{I \times D}$ and $W \in \mathbb{R}^{D \times N}$ are referred to as the design matrix and the weight matrix, respectively. The weight matrix is assumed to be the only parameter, $\theta = \{W\}$. Then, the empirical cost function is:

$$\tilde{C}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| y_i - W^T x_i \right\|^2 \tag{3}$$

The gradient is then calculated to find the minimum of the empirical cost function of Eq. 3:

$$\nabla \tilde{C}(\theta) = -\frac{2}{N} \sum_{i=1}^{N} \left( y_i - W^T x_i \right)^T x_i \tag{4}$$

We can use Eqs. 3 and 4 with an iterative optimization algorithm, such as gradient descent or stochastic gradient descent, to find the best $W$ that minimizes the empirical cost function, as in the study [12]. An even better option is to use a validation set that stops the optimization algorithm when the minimal validation cost is reached. However, these methods demand clarification of a linear network because:
• First, there is no guarantee that the real generating function $f$ is a linear function. If it is not, the linear regression model cannot be expected to approximate the true function well.
• Second, there is not much control over what expectations to measure for the given input data $x$; therefore, how well $x$ represents the input remains unclear.

Now, given a training set $D = \{X, Y\}$, we estimate $w$ so that the response $y^*$ to a new data point $x^*$ can be predicted by calculating the expectation $E[y^* | x^*] = w^T x^*$. To do that, we develop a probabilistic graphical model of the Bayesian linear regression model.
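As an aside, the gradient-descent fit described around Eqs. 3 and 4 can be sketched in a few lines of NumPy. The synthetic data, learning rate, and iteration count below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Illustrative synthetic data: y_i = w_true^T x_i + noise (Eq. 1).
rng = np.random.default_rng(0)
I, D = 200, 3
w_true = np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(I, D))                 # design matrix, I x D
y = X @ w_true + 0.1 * rng.normal(size=I)   # responses with small noise

# Gradient descent on the empirical mean-squared cost (Eq. 3),
# using its gradient -(2/I) * sum_i (y_i - w^T x_i) x_i (Eq. 4).
w = np.zeros(D)
lr = 0.1
for _ in range(500):
    grad = -(2.0 / I) * X.T @ (y - X @ w)
    w -= lr * grad

print(w)  # converges close to w_true
```

A held-out validation set, as the text suggests, would simply monitor the same cost on unseen rows of `X` and stop the loop when it stops improving.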
Our target is then the likelihood function for $w$ and the prior over $w$, which is given by [7]:

$$p(y \mid X, w) = \prod_{i=1}^{I} \mathcal{N}\left(y_i \,\middle|\, w^T x_i,\ \lambda^{-1}\right) \tag{5}$$

where $\lambda$ is the noise precision parameter and is assumed to be known for simplicity. Thus, the joint distribution over all the variables is given by the following factorization:

$$p(y, w \mid X) = p(y \mid X, w)\, p(w) \tag{6}$$

The whole learning procedure given by Eq. 6 is a process of searching for the best hypothesis over the entire space $H$ of hypotheses. It is assumed that each hypothesis corresponds to a possible function with a unique set of parameters and a unique functional form, and that each hypothesis only takes the input $x$ and produces the output $y$.

B. Variational Inference for Bayesian Neural Network
We can rewrite Eq. 6 with the single parameter $\theta$ for the weight matrix $W$. The posterior inference over $w$ given by Eq. 7 is then often intractable, especially because of the divisor:

$$p_\theta(w \mid x) = \frac{p_\theta(x \mid w)\, p(w)}{p_\theta(x)} = \frac{p_\theta(w, x)}{p_\theta(x)} = \frac{p_\theta(w, x)}{\int_w p_\theta(x, w)} \tag{7}$$

Therefore, we assume that there is a tractable family of distributions $Q$ that is similar to the posterior $p_\theta(w \mid x)$. Then, we try to find an approximate posterior using $q_\phi$ ($q_\phi \in Q$). Hence, the optimization objective must measure the similarity between $p_\theta$ and $q_\phi$. To capture this, we use the Kullback-Leibler (KL) divergence:

$$\mathrm{KL}(q_\phi \,\|\, p_\theta) = \int_w q_\phi(w \mid x) \log \frac{q_\phi(w \mid x)}{p_\theta(w \mid x)} \tag{8}$$

Because we cannot minimize the KL divergence directly, we isolate the intractable evidence term in it:

$$\mathrm{KL}(q_\phi \,\|\, p_\theta) = \left( \mathbb{E}_{q_\phi} \log \frac{q_\phi(w \mid x)}{p_\theta(w, x)} \right) + \log p_\theta(x) = -\mathcal{L}(x; \theta, \phi) + \log p_\theta(x) \tag{9}$$

Rearranging terms to express the isolated intractable evidence gives $\log p_\theta(x) = \mathrm{KL}(q_\phi \,\|\, p_\theta) + \mathcal{L}(x; \theta, \phi)$. Furthermore, because the KL divergence is non-negative, it follows that:

$$\log p_\theta(x) \geq \mathcal{L}(x; \theta, \phi), \quad \text{where} \quad \mathcal{L}(x; \theta, \phi) = -\mathbb{E}_{q_\phi} \log \frac{q_\phi(w \mid x)}{p_\theta(w, x)} \tag{10}$$

Eq. 10 is also called the Evidence Lower Bound (ELBO). Expanding the derived variational lower bound, we have:

$$\mathcal{L}(x; \theta, \phi) = -\mathbb{E}_{q_\phi} \left[ \log \frac{q_\phi(w \mid x)}{p_\theta(w, x)} \right] = \mathbb{E}_{q_\phi} \left[ \log p_\theta(x \mid w) + \log p(w) - \log q_\phi(w \mid x) \right]$$
C. Optimization
The objective is to optimize the ELBO for the derived inference model, which can be restated as:

$$\mathcal{L}(x; \theta, \phi) = \underbrace{\mathbb{E}_{q_\phi}\left[\log p_\theta(x \mid w)\right]}_{\text{reconstruction likelihood}} - \underbrace{\mathrm{KL}\left(q_\phi(w \mid x) \,\|\, p(w)\right)}_{\text{divergence from prior}} \tag{11}$$

Then, the gradients $\nabla_\theta \mathcal{L}$ and $\nabla_\phi \mathcal{L}$ need to be computed. To achieve that, we apply the Stochastic Gradient Variational Bayes (SGVB) approach given by [8]. Technically, the key to the SGVB estimator is a reparameterization trick, i.e., reparameterizing the random variable

$$w \sim q_\phi(w \mid x) = \mathcal{N}\left(w \,\middle|\, \mu_w(x; \phi),\ \sigma_w^2(x; \phi)\right)$$

as

$$w = w(\epsilon; x, \phi) = \epsilon\, \sigma_w(x; \phi) + \mu_w(x; \phi), \quad \epsilon \sim \mathcal{N}(0, I)$$

Then, the expectation can be written with respect to $\epsilon$:

$$\mathcal{L}(\phi, \theta) = \mathbb{E}_{w \sim q_\phi(w \mid x)}\left[\log p_\theta(x, w) - \log q_\phi(w \mid x)\right] = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[\log p_\theta(x, w(\epsilon; x, \phi)) - \log q_\phi(w(\epsilon; x, \phi) \mid x)\right]$$

Consequently, the gradient with respect to the variational parameter $\phi$ can be moved directly inside the expectation, enabling an unbiased, low-variance Monte Carlo estimator:

$$\nabla_\phi \mathcal{L}(\phi, \theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)} \nabla_\phi \left[\log p_\theta(x, w(\epsilon; x, \phi)) - \log q_\phi(w(\epsilon; x, \phi) \mid x)\right] \approx \frac{1}{k} \sum_{i=1}^{k} \nabla_\phi \left[\log p_\theta(x, w(\epsilon_i; x, \phi)) - \log q_\phi(w(\epsilon_i; x, \phi) \mid x)\right]$$

where $\epsilon_i \sim \mathcal{N}(0, I)$.
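The reparameterization trick above can be checked numerically with a minimal, self-contained sketch. The target expectation and its analytic gradient are illustrative choices for the demonstration, not part of the paper's model:

```python
import numpy as np

# Reparameterization sketch: estimate d/dmu E_{w ~ N(mu, sigma^2)}[w^2].
# Analytically E[w^2] = mu^2 + sigma^2, so the true gradient w.r.t. mu is 2*mu.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
K = 100_000

eps = rng.normal(size=K)    # eps ~ N(0, 1), independent of (mu, sigma)
w = mu + sigma * eps        # w = w(eps; mu, sigma), now differentiable in mu
grad_mu = np.mean(2.0 * w)  # Monte Carlo gradient, as in the k-sample sum above

print(grad_mu)  # close to 2 * mu = 3.0
```

In the VBNN itself, the same trick lets $\nabla_\phi$ pass through the sampling step of $w$, so that ordinary stochastic gradient descent can optimize the ELBO of Eq. 11.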
IV. CRITICAL TEMPERATURE PREDICTIVE MODEL
A. High-Tc Data
Although there are many public datasets available for superconductors [9], [10], the present study uses only the SuperCon database. We restate the refined data from the study [11] for significant reasons. First, the material investigated is standardized data for typical oxide high-Tc materials; all preparation and characterization are captured with i) a larger dataset, ii) a more substantial number of features from elemental properties, iii) freely available access for everyone, and iv) compatibility as a performance benchmark. Besides, we can assess the importance of the features, which are based on thermal conductivity, atomic radius, valence, electron affinity, and atomic mass, for the prediction accuracy of Tc.

Studies [6], [11]–[14] also create models to predict Tc from the SuperCon data. Our approach differs from those studies in the following ways: (i) we use a generative neural network, as illustrated in Fig. 1, in which probabilistic analyses and statistical learning theory are utilized to tune the learnable hyper-parameters, and (ii) most importantly, the model promises to discover rich structure (latent and distributional formation) in the superconductor's chemical formula data while generating a realistic data distribution from a latent code space. The nature of the relationship between the features and Tc can then be statistically inferred from the model.

Fig. 1. Probabilistic graphical model of the VBNN model to predict Tc.

B. High-Tc Prediction
The model defined in the previous Section III infers the parameters. The next target is to make predictions about new data. Consequently, the probability distribution of new data $y = T_c$, given its input features $x$ and our training data $D$, is defined as:

$$p(y \mid x, D) = \int_w p(y \mid x, w)\, p(w \mid D)$$

We have learned the approximation of $p_\theta(w \mid D)$ by the variational $q_\phi(w)$ in Eq. 11. Therefore, we can use Monte Carlo estimation to obtain an unbiased estimate of this integral by sampling from the variational posterior:

$$p(y \mid x, D) \simeq \frac{1}{M} \sum_{i=1}^{M} p(y \mid x, w_i), \quad w_i \sim q_\phi(w)$$

As a result, the prediction for new superconductor data is the mean of the predictive distribution:

$$\hat{y} = \mathbb{E}_{p(y \mid x, D)}[y] \simeq \frac{1}{M} \sum_{i=1}^{M} \mathbb{E}_{p(y \mid x, w_i)}[y] \tag{12}$$
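A minimal sketch of the Monte Carlo prediction in Eq. 12, assuming a factorized Gaussian variational posterior over a linear model's weights. The means, standard deviations, and feature vector below are made-up illustrations, not values learned from SuperCon:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 3, 5000

mu_w = np.array([2.0, -1.0, 0.5])     # assumed learned variational means
sigma_w = np.array([0.1, 0.2, 0.05])  # assumed learned variational std devs
x_new = np.array([1.0, 2.0, -1.0])    # features of a hypothetical new compound

# Draw M weight samples from q(w) and average the resulting predictions.
w_samples = mu_w + sigma_w * rng.normal(size=(M, D))  # shape (M, D)
preds = w_samples @ x_new                             # one prediction per sample
y_hat = preds.mean()                                  # predictive mean (Eq. 12)
y_std = preds.std()                                   # predictive uncertainty

print(y_hat)  # close to mu_w @ x_new = -0.5
```

The spread `y_std` is the kind of intrinsic uncertainty estimate that motivates the Bayesian treatment in the first place.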
C. Predictive Model Evaluation

The most common techniques for model validation are the RMSE, R², and log-likelihood. The RMSE is the square root of the predictive mean square error; a smaller RMSE means better predictive accuracy:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2} \tag{13}$$

R² values are commonly expressed as percentages from 0% to 100% (or as values ranging from 0 to 1). R² approximates how well the model's input can explain the observed variation:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{14}$$
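Both metrics are straightforward to implement; the sketch below follows Eqs. 13 and 14 directly (the toy critical temperatures are made up for the check):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error (Eq. 13)."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 (Eq. 14)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check with made-up critical temperatures (K).
y_true = np.array([10.0, 40.0, 92.0, 77.0])
y_pred = np.array([12.0, 38.0, 90.0, 80.0])
print(rmse(y_true, y_pred))       # ~2.29
print(r_squared(y_true, y_pred))  # ~0.995
```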
TABLE I
NUMERICAL RESULT COMPARISON

ML Approaches                                        R²     RMSE (K)
Random Forest [6]                                    0.85   N/A
Random Forest & XGBoost [11]                         0.74   17.6
Support Vector Machine [12]                          0.96   N/A
Convolutional Neural Network [13]                    0.93   N/A
Atom Table Convolutional Neural Network [14]         0.97   8.14
Variational Bayesian Neural Network (this work)      0.94   3.83
V. NUMERICAL RESULTS AND DISCUSSIONS
For the implementation, we use the Python-based Bayesian deep learning library [16]. The predictive model training and testing are then executed using TensorFlow on an NVIDIA Tesla K80 GPU. There is no need for a validation set to explore the effect of the VBNN on overcoming the overfitting challenge, so the dataset is divided solely into 30% and 70% for the test set and train set, respectively. Practically, it is problematic to determine a good network topology just from the number of inputs and outputs. Because accuracy is the main criterion for designing the VBNN, the hidden layers can be increased [17]. In general, neural network models improve with more epochs of training, and the accuracy remains stable as they converge [18]. Larger batch sizes result in faster progress in training but do not always converge as fast; in contrast, smaller batch sizes train more slowly but can converge faster [19]. Besides, averaging over a mini-batch of 10 produces a gradient that is a more reasonable approximation of the full batch-mode gradient. As a consequence, the experiment converged with 1000 epochs, a batch size of 10, and 100 hidden layers.

For reproducible analyses, predictions, and evaluations, the implementation code and results are available at the GitHub repository (https://github.com/ltdung/VBNN_HighTc).

The presented Bayesian regression approach can also be directly applied to predict the critical temperature of a superconductor, as shown in Table I. Our confidence scores have strong overall concordance with previous predictions (R² = 0.94). Besides, a significant improvement was obtained in the RMSE, at 3.83 K. This result is a striking illustration of the VBNN performance compared with other techniques. In short, to the knowledge of the authors, this generative approach for superconductor Tc prediction is the first of its kind.
Our results are encouraging; however, reproducibility through replicated experiments should be pursued in future investigations:
• First, an important question for future studies is to use the pre-trained VBNN predictive model to validate its performance on different superconductor datasets. A possible direction is customizing the "transfer learning" paradigm to take advantage of the optimized hyper-parameters from the VBNN.
• Second, future work should focus on exploring feasible compounds as new superconductors. It will be beneficial to have initial feedback on the correctness and efficiency of alternative compounds before conducting costly, effortful experiments in real practice.
VI. CONCLUSION
Material data science, specifically in superconductor exploration, is in the early stages of ML adoption. There is a growing number of single-use applications, but more intelligible models are yet to be seen. In this work, we developed a new probabilistic approach using a variational Bayesian neural network for estimating the Tc value of high-temperature superconductors. Our results are in general agreement with existing Tc predictive models. These preliminary results demonstrate the feasibility of using a generative neural network, which provides compelling, helpful evidence for understanding the underlying physics of superconductivity. This finding is promising and should be investigated with other advanced predictive models, which could eventually lead to the discovery of new superconductors in the future.

REFERENCES

[1] N. Wagner and J. M. Rondinelli, "Theory-guided machine learning in materials science," Frontiers in Materials, vol. 3, p. 28, 2016.
[2] D. M. Dimiduk et al., "Perspectives on the impact of machine learning, deep learning, and artificial intelligence on materials, processes, and structures engineering," Integrating Materials and Manufacturing Innovation, pp. 1–16, 2018.
[3] B. Kailkhura et al., "Reliable and explainable machine learning methods for accelerated material discovery," arXiv preprint arXiv:1901.02717, 2019.
[4] A. L. Ferguson, "Machine learning and data science in soft materials engineering," Journal of Physics: Condensed Matter, vol. 30, no. 4, p. 043002, 2017.
[5] B. Meredig et al., "Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery," Molecular Systems Design & Engineering, vol. 3, no. 5, pp. 819–825, 2018.
[6] V. Stanev et al., "Machine learning modeling of superconducting critical temperature," npj Computational Materials, vol. 4, no. 1, p. 29, 2018.
[7] J. Drugowitsch, "Variational Bayesian inference for linear and logistic regression," arXiv preprint arXiv:1310.5438, 2013.
[8] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
[9] S. C. Wimbush and N. M. Strickland, "A public database of high-temperature superconductor critical current data," IEEE Transactions on Applied Superconductivity, vol. 27, no. 4, pp. 1–5, 2016.
[10] Japan's National Institute for Materials Science, Superconducting Material Database (SuperCon), 2019 (accessed February 3, 2019). [Online]. Available: https://supercon.nims.go.jp/index_en.html
[11] K. Hamidieh, "A data-driven statistical model for predicting the critical temperature of a superconductor," Computational Materials Science, vol. 154, pp. 346–354, 2018.
[12] T. O. Owolabi et al., "Application of computational intelligence technique for estimating superconducting transition temperature of YBCO superconductors," Applied Soft Computing, vol. 43, pp. 143–149, 2016.
[13] T. Konno et al., "Deep learning of superconductors I: Estimation of critical temperature of superconductors toward the search for new materials," arXiv preprint arXiv:1812.01995, 2018.
[14] S. Zeng et al., "Atom table convolutional neural networks for an accurate prediction of compounds properties," npj Computational Materials, vol. 5, no. 1, pp. 1–7, 2019.
[15] E. Abbe and C. Sandon, "Provable limitations of deep learning," arXiv preprint arXiv:1812.06369, 2019.
[16] J. Shi et al., "ZhuSuan: A library for Bayesian deep learning," arXiv preprint arXiv:1709.05870, 2017.
[17] K. G. Sheela and S. N. Deepa, "Review on methods to fix number of hidden neurons in neural networks," Mathematical Problems in Engineering, vol. 2013, 2013.
[18] E. Hoffer, I. Hubara, and D. Soudry, "Train longer, generalize better: closing the generalization gap in large batch training of neural networks," in Advances in Neural Information Processing Systems, 2017, pp. 1731–1741.
[19] L. Chen et al., "The effect of network width on the performance of large-batch training," in