Three-dimensional convolutional neural network (3D-CNN) for heterogeneous material homogenization
Chengping Rao a, Yang Liu a,∗

a Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA
Abstract
Homogenization is a technique commonly used in multiscale computational science and engineering for predicting the collective response of heterogeneous materials and extracting effective mechanical properties. In this paper, a three-dimensional deep convolutional neural network (3D-CNN) is proposed to predict the effective material properties of representative volume elements (RVEs) with random spherical inclusions. A high-fidelity dataset generated by a computational homogenization approach is used for training the 3D-CNN models. The inference results of the trained networks on unseen data indicate that the network is capable of capturing the microstructural features of RVEs and produces accurate predictions of the effective stiffness and Poisson's ratio. The benefits of the 3D-CNN over conventional finite-element-based homogenization with regard to computational efficiency, uncertainty quantification and model transferability are discussed in sequence. We find that these salient features make the 3D-CNN approach a potentially suitable alternative for facilitating material design with fast product design iteration and efficient uncertainty quantification.
Keywords:
1. Introduction
The last few decades have seen tremendous applications of heterogeneous materials in the automotive industry and in civil, aerospace and mechanical engineering. These materials possess superior mechanical properties attributed to their unique architecture and complex microstructure. Most common among these materials are concrete, alloys, polymers, reinforced composites, etc. A primary assumption generally made for the computational modeling of composite materials is that these materials are periodic at the microscopic scale and that the periodic microstructures can be approximated by representative volume elements (RVEs). To develop composite materials with unusual combinations of properties, it is crucial to understand the effects of various characteristics of the RVE (microstructure, constituent phases, volume fraction, etc.) on the macroscopic material properties.

For most composite design problems, effective material properties are used instead of taking all the constituents and microstructure into consideration. A lot of effort has been devoted to developing mathematical and/or numerical approaches for calculating the effective/homogenized material properties. The homogenization theory, which was originally developed to study partial differential equations (PDEs) with rapidly oscillating coefficients [1], has been widely used to describe the mechanics of the periodic microstructure of composites. Numerous homogenization

∗ Corresponding author. Tel: +1 617-373-8560
Email addresses: [email protected] (Chengping Rao), [email protected] (Yang Liu)
Preprint submitted to Computational Materials Science, February 19, 2020

approaches have been developed to calculate effective properties, which can subsequently be used for macroscopic structural analysis. These approaches can be classified into three categories [2]: (1) analytical methods, e.g., the Voigt and Reuss models [3, 4]; (2) semi-analytical methods, e.g., the generalized method of cells (GMC) [5], the self-consistent scheme (SCS) [6, 7] and the Mori-Tanaka method [8]; (3) numerical methods, e.g., finite element (FE) [9–14], boundary element (BE) [15, 16] and fast Fourier transform (FFT) [17, 18] methods. Each of the aforementioned approaches has its pros and cons. For example, the Voigt and Reuss models provide quick but rough upper and lower bounds for various properties of a heterogeneous material; however, the gap between the bounds grows with the volume fraction (VF) of inclusions and the degree of phase contrast [19]. Although the numerical methods involve complicated discretizations and expensive computations, they offer the possibility to handle the homogenization of materials with arbitrary microstructures and constitutive models. These methods have been shown to be effective for modeling multiscale material behavior in both linear [14, 15, 20] and nonlinear [9–12, 20–23] problems given properly defined material constituents and microstructure. However, when it comes to the iterative computational design of composites with desired properties, these numerical approaches are not suitable owing to their huge computational cost [24] and the high-dimensional sample space [25–27].

With the recent prevalence of data science, many machine learning (ML) approaches have been applied to material modeling, analysis and design. A novel framework named materials knowledge systems (MKS) [28–30] was formulated to exploit the merits of both analytical and numerical approaches.
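As a concrete illustration of the analytical bounds mentioned above, the Voigt and Reuss estimates for a two-phase composite are simply the volume-weighted arithmetic and harmonic means of the constituent moduli. A minimal sketch (the constituent moduli used here are illustrative placeholders):

```python
def voigt_reuss_bounds(E_m, E_i, vf):
    """Voigt (upper) and Reuss (lower) bounds on the effective Young's
    modulus of a two-phase composite; vf is the inclusion volume fraction."""
    E_voigt = (1.0 - vf) * E_m + vf * E_i          # rule of mixtures (upper bound)
    E_reuss = 1.0 / ((1.0 - vf) / E_m + vf / E_i)  # inverse rule of mixtures (lower bound)
    return E_voigt, E_reuss

# Illustrative matrix/inclusion moduli (GPa) at 20% inclusion VF
upper, lower = voigt_reuss_bounds(E_m=68.9, E_i=379.2, vf=0.20)
```

Evaluating the gap `upper - lower` for increasing `vf` or phase contrast `E_i / E_m` reproduces the limitation cited above: the bounds become progressively less informative.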
MKS is theoretically rooted in statistical continuum mechanics [31], in which the structure-property linkage of the material is expressed as a polynomial series sum. Each term of the series is a product of local microstructure-related statistics and the corresponding physics-related (or influence) coefficients [28], which reflect the underlying knowledge of the localization relationship. The core of MKS is employing the discrete Fourier transform (DFT) to calibrate these coefficients to the results obtained from finite element analysis (FEA). This framework is characterized by computational efficiency, its data-driven nature and remarkable accuracy in a variety of works [28–30, 32, 33]. There are also other applications of ML approaches in computational materials science and mechanics. Fritzen and Kunc [24] proposed a two-stage data-driven homogenization approach for nonlinear solids. Lookman et al. [26] employed an active learning approach to navigate the search space for identifying candidates for guiding experiments or computations, where a surrogate model and a utility function are used for selecting among the unexplored data.

Traditional ML techniques rely largely on feature engineering, which is time-consuming and requires expert knowledge [34]. Deep learning (DL) approaches have been developed to address this problem. Typical DL approaches, such as fully connected neural networks (FC-NN), convolutional neural networks (CNN) and long short-term memory (LSTM), can automatically find the most salient features to be learned. These approaches have demonstrated tremendous success in a variety of applications such as speech recognition, computer vision (CV) and natural language processing (NLP). They have turned out to excel at discovering intricate structures within high-dimensional data [34].
Some of the recent applications of DL approaches in materials science include material classification [35, 36], defect classification [37–39], microstructure identification [40, 41], microstructure reconstruction [42, 43], composite strength prediction [44], etc. In this paper, we are mostly concerned with works employing DL to address multiscale problems of composites, particularly in the context of homogenization. For example, Lu et al. [45] adopted neural networks (NN) to establish a surrogate model for electric conduction homogenization. By substituting the RVE calculations with the data-driven model in multiscale modeling, a drastic saving of computational cost was achieved compared with the FE method [9]. Le et al. [46] proposed a decoupled computational homogenization approach for nonlinear elastic materials using NN to approximate the effective potential. Li et al. [42] employed the transfer learning idea on CNNs for microstructure reconstruction. Bhattacharjee and Matouš [47] performed both homogenization and localization on heterogeneous hyperelastic materials using a digital database and a manifold-based nonlinear reduced order model (MNROM), in which the mapping between the macroscopic loading conditions and the reduced space is realized through NN. Yang et al. [48] applied generative adversarial networks (GAN) to generate microstructures with desired material properties. Cang et al. [49] implemented a convolutional deep belief network (CDBN) to automate a two-way conversion between microstructures and their lower-dimensional feature representations. Bostanabad et al. [50] adopted a supervised learning approach to characterize and reconstruct stochastic microstructures.

Most of the above studies are image-based and perform representation learning within a 2D space. To fully capture the salient features of the microstructure, the 3D geometry should be considered. Very recently, Yang et al.
[51] showed the potential of three-dimensional CNNs (3D-CNN) for effective elastic modulus homogenization of composites and demonstrated their advantages over traditional sophisticated physics-inspired approaches. In this work, we leverage the capability of the 3D-CNN and design a network architecture for predicting the effective material properties of composites with complex heterogeneous microstructure. In particular, we consider composite materials whose microstructure can be modeled as a two-phase (matrix/inclusion) representative volume element (RVE) with randomly distributed inclusions. A diverse group of RVEs, or virtual experiment samples, has been created with different inclusion VFs and spatial distributions, so that the sample space is large enough to include the intrinsic features of the material. Finite element analysis is then performed for each of the samples to obtain the effective moduli through linear homogenization. The geometric information of the RVEs has been pre-processed onto a structured (Euclidean) grid that the 3D-CNN can accept. The networks are then trained, validated and tested on the synthetic data. The salient features of the proposed 3D-CNN approach include: (1) it provides an end-to-end solution for predicting the effective material properties of composites with high efficiency and good accuracy given the geometric information of the corresponding RVEs; (2) it is able to reproduce the probability distribution of the material properties for inputs characterized by uncertainty; and (3) its transferability makes it extremely convenient for adding supplementary data or training a model on new datasets that come from different microstructure configurations.
It is worth noting that the proposed 3D-CNN approach is particularly advantageous for heterogeneous materials with multiple constituents and extremely complex microstructure, since it has demonstrated an extraordinary ability to handle high-dimensional inputs [51–54].

The rest of the paper is organized as follows. Section 2 describes the proposed methodology. Specifically, the generation of the training dataset (based on 2000 RVEs) is presented in Section 2.1, along with the pre-processing procedures: the conversion of the raw data into the input format of the 3D-CNN model, the computational homogenization approach used to obtain the labels, and the rescaling of the labels. In Section 2.2, the basic concepts and mathematical operations involved in the 3D-CNN are briefly introduced. Section 3 presents the numerical results. We first conduct a series of parametric tests on the hyperparameters of the 3D-CNN to find an optimal network architecture. Then a comparison between the 3D-CNN prediction and the FEA result is made with regard to accuracy and efficiency in Section 3.2, and the benefits of the 3D-CNN approach over traditional FEM are discussed. Uncertainty quantification (UQ) is conducted in Section 3.3 to evaluate the performance of the 3D-CNN model on inputs with uncertainty. In Section 3.4, the transferability of the proposed 3D-CNN model to a dataset representing a different type of composite microstructure is investigated. Section 4 is devoted to the conclusions of the paper and an outlook on future work.

Figure 1: (a) Geometry of RVE and (b) generated phase voxel (point cloud).
2. Methodologies
In the present study, we consider particle-reinforced composites, e.g., metal matrix composites, whose microstructure can be represented by a parametric two-phase RVE model with a matrix phase and a particle phase. We generate 2000 RVE samples with the volume fraction (VF) of inclusions ranging from 2% to 28% to establish the training data (see Fig. 1(a)). The radius of each spherical inclusion follows a uniform distribution with a lower bound of 0.05 mm. Each RVE is sampled on a 101 × 101 × 101 Cartesian grid, and a binary phase value p is assigned to the voxel with coordinate (x, y, z), namely,

\[ p(x, y, z) = \begin{cases} 1, & \text{if } \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} < r_i, \ \exists\, i \in \{1, 2, \dots, n\} \\ 0, & \text{otherwise} \end{cases} \tag{1} \]

where n is the total number of inclusions; x_i, y_i, z_i and r_i are the coordinates of the center and the radius of the i-th spherical inclusion, respectively. It is noted that the grid size of 101 (voxel length of 0.01 mm) is selected in order to cover all the microstructural details within the RVEs, since the minimum radius of the spherical inclusions is 0.05 mm.

Figure 2: Distribution of the number of samples with respect to the inclusion volume fraction.

Table 1: Material properties for RVE with spherical inclusions.

Material     Young's modulus (GPa)    Poisson's ratio
Matrix       68.9                     0.33
Inclusion    379.2                    0.21

The deep learning method falls into the category of supervised learning, in which the training data need to be labelled. In this paper, linear elastic materials are considered for both the matrix and inclusion phases. The material properties of each phase used in this study are given in Table 1. Since the considered composite is assumed to be orthotropic, its constitutive tensor has 9 independent variables, from which the following vector of effective material properties can be obtained:

\[ \mathbf{y} = \begin{bmatrix} E_1 & E_2 & E_3 & G_{23} & G_{31} & G_{12} & \nu_{12} & \nu_{13} & \nu_{21} & \nu_{23} & \nu_{31} & \nu_{32} \end{bmatrix}^T \tag{2} \]

where y denotes the label for each RVE sample; the E's, G's and ν's denote the effective elastic moduli, shear moduli and Poisson's ratios, respectively, along different directions. The computational homogenization is conducted via FEM, based on the framework of classical mathematical homogenization theory [20, 56]. Specifically, the homogenized constitutive tensor can be calculated by averaging \( \Sigma^{mn}_{ij}(\xi) \) over the entire volume \( \Theta \) of the RVE, expressed as

\[ L_{ijmn} = \frac{1}{|\Theta|} \int_{\xi \in \Theta} \Sigma^{mn}_{ij}(\xi)\, d\xi \tag{3} \]

in which \( \Sigma^{mn}_{ij}(\xi) \) is the stress influence function with regard to the fine-scale coordinate \( \xi \). It can be interpreted as the fine-scale stress induced by a unit overall strain \( \epsilon^{c}_{mn} \). The implementation of numerical homogenization is achieved by solving an RVE (or unit cell) problem under periodic boundary conditions (PBCs) and unit thermal strain [20].
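The voxelization rule of Eq. (1) can be realized directly on the grid. The sketch below rasterizes a set of spherical inclusions into a binary phase array; the inclusion list here is illustrative, not taken from the paper's RVE generator:

```python
import numpy as np

def phase_voxels(inclusions, grid=101, length=1.01):
    """Binary phase field per Eq. (1): a voxel takes value 1 if its center
    lies inside any spherical inclusion (x_i, y_i, z_i, r_i), else 0."""
    h = length / grid                      # voxel edge length (0.01 mm for a 101^3 grid)
    coords = (np.arange(grid) + 0.5) * h   # voxel-center coordinates along one axis
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    p = np.zeros((grid, grid, grid), dtype=np.uint8)
    for (xi, yi, zi, ri) in inclusions:
        # Eq. (1): inside-sphere test, OR-ed over all inclusions
        p |= ((x - xi) ** 2 + (y - yi) ** 2 + (z - zi) ** 2 < ri ** 2).astype(np.uint8)
    return p

# Illustrative RVE with two inclusions of radius 0.05 mm (the paper's minimum radius)
voxels = phase_voxels([(0.3, 0.3, 0.3, 0.05), (0.7, 0.6, 0.5, 0.05)])
vf = voxels.mean()  # inclusion volume fraction of the rasterized RVE
```

The voxel mean directly estimates the inclusion VF, which is how a rasterized sample can be binned into the VF distribution of Fig. 2.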
The components of the constitutive tensor can then be obtained by averaging the stress field over the volume, given by

\[ L_{ijmn} = \frac{1}{|\Theta|} \int_{\xi \in \Theta} \sigma^{mn}_{ij}(\xi)\, d\xi \tag{4} \]

The constitutive tensor C can be represented in Voigt notation, written as

\[ \begin{bmatrix} \sigma_{xx} \\ \sigma_{yy} \\ \sigma_{zz} \\ \sigma_{yz} \\ \sigma_{zx} \\ \sigma_{xy} \end{bmatrix} = \underbrace{\begin{bmatrix} C_{11} & C_{12} & C_{13} & 0 & 0 & 0 \\ C_{12} & C_{22} & C_{23} & 0 & 0 & 0 \\ C_{13} & C_{23} & C_{33} & 0 & 0 & 0 \\ 0 & 0 & 0 & C_{44} & 0 & 0 \\ 0 & 0 & 0 & 0 & C_{55} & 0 \\ 0 & 0 & 0 & 0 & 0 & C_{66} \end{bmatrix}}_{\mathbf{C}} \begin{bmatrix} \epsilon_{xx} \\ \epsilon_{yy} \\ \epsilon_{zz} \\ \gamma_{yz} \\ \gamma_{zx} \\ \gamma_{xy} \end{bmatrix} \tag{5} \]

The inverse of C yields the compliance matrix S shown below, from which the vector of effective material properties can be calculated:

\[ \mathbf{S} = \mathbf{C}^{-1} = \begin{bmatrix} \frac{1}{E_1} & -\frac{\nu_{21}}{E_2} & -\frac{\nu_{31}}{E_3} & 0 & 0 & 0 \\ -\frac{\nu_{12}}{E_1} & \frac{1}{E_2} & -\frac{\nu_{32}}{E_3} & 0 & 0 & 0 \\ -\frac{\nu_{13}}{E_1} & -\frac{\nu_{23}}{E_2} & \frac{1}{E_3} & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{G_{23}} & 0 & 0 \\ 0 & 0 & 0 & 0 & \frac{1}{G_{31}} & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{G_{12}} \end{bmatrix} \tag{6} \]

The entire dataset is randomly divided into training, validation and testing sets with a ratio of 1400:300:300. The training set is used for learning the parameters (i.e., weights and biases) of the 3D-CNN (see Section 2.2), while the validation set is used to tune the hyperparameters (i.e., the architecture) of the 3D-CNN. The validation set is also adopted as a regularizer via early stopping, i.e., training is stopped when the loss function on the validation set increases, as this is a sign of overfitting to the training dataset [57]. The testing set, which is unseen during the training process, serves to confirm and evaluate the actual predictive power of the trained deep learning model.

Since the RVEs in this paper are generated artificially, we can directly extract the microstructure information from the formatted data. However, how to obtain the phase information of samples from field measurements is an issue of interest. Nondestructive imaging techniques such as X-ray micro-tomography [58–60], 3-D atom probe [61] and automated serial sectioning [62] have made it possible to capture 3D material microstructures. These imaging techniques are characterized by high resolution. For example, synchrotron radiation micro-tomography is able to sample
Figure 3: Proposed 3D-CNN architecture for effective properties prediction of heterogeneous materials (Conv + ReLU layers with 5×5×5 filters of 16 and 32 channels, 2×2×2 pooling, flattening, and fully connected layers mapping to the 12 effective properties).
microstructure with a resolution of 2048 voxels in each dimension [63]. Therefore, it will be promising to incorporate field measurement techniques into the current framework with appropriate down-sampling of the microstructure data. Nevertheless, this is beyond the scope of the current study.

Figure 4: Convolution operation in the 3D-CNN model (a 5×5×5 filter scans the 21×21×21 phase voxels with a stride of 3, producing a 6×6×6 feature map via dot products).
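The label-extraction step of Eqs. (2)–(6), i.e., recovering the effective engineering constants from a homogenized stiffness tensor, can be sketched as follows. The orthotropic C passed in would come from the FEA homogenization of Eq. (4); the isotropic test tensor used in practice below is only a placeholder:

```python
import numpy as np

def engineering_constants(C):
    """Effective engineering constants from a 6x6 orthotropic stiffness
    matrix C (Voigt notation), via the compliance matrix S = C^-1 of Eq. (6)."""
    S = np.linalg.inv(C)
    E = [1.0 / S[i, i] for i in range(3)]       # E1, E2, E3 from diagonal of S
    G = [1.0 / S[i, i] for i in (3, 4, 5)]      # G23, G31, G12 from shear block
    # Off-diagonals of Eq. (6): S[j, i] = -nu_ij / E_i  =>  nu_ij = -S[j, i] * E_i
    nu = [-S[j, i] * E[i] for (i, j) in
          [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]]
    return E, G, nu
```

Concatenating the three returned lists reproduces the 12-component label vector y of Eq. (2) for one RVE sample.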
The convolutional neural network (CNN or ConvNet) was originally proposed to solve computer vision problems. LeCun et al. [64] designed one of the very first CNNs to successfully recognize handwritten digits in the 1990s. The applications of CNNs were then limited by the less powerful computational hardware of that time. In recent years, the CNN approach has been revived owing to huge advancements in computational hardware such as general-purpose graphics processing units (GPUs). The CNN differs from classical FC-NNs by its weight-sharing mechanism. In this study, we propose a 3D-CNN architecture (see Fig. 3) for inferring homogenized/effective material properties (e.g., elastic moduli, shear moduli and Poisson's ratios) from given microstructure configurations (e.g., the discretized distribution of material phases).

The 3D-CNN takes the preprocessed phase voxels as the input. The subsequent convolutional layers, with 3D convolution filters and pooling operations, serve as the critical components of the CNN. As indicated in Fig. 4, the 3D filter scans over the phase voxels and applies the convolution operation (tensor dot product) to produce the feature map. The weights and biases of each filter are trained to extract the salient features from the input. Stride, padding and filter size are a few common hyperparameters defining convolutional operations. Stride denotes the step size by which the filters move each time; for instance, a stride length of 1 means the filters scan the volume voxel by voxel. To preserve the spatial size of the output, it is convenient to pad the input with zero-value voxels. For example, the input and output sizes in Fig. 4 will be identical (21 × 21 × 21) if the convolution operations are conducted with a stride of 1 and 2-layer zero padding. Pooling layers are usually added between successive convolutional layers in the CNN; they progressively reduce the spatial size of the data by down-sampling the voxel values. Pooling operations may compute the maximum or average value within a volume. Fig. 5 demonstrates how the max-pooling operation works with a volume size of 2 × 2 × 2.

Figure 5: Max pooling operation in the 3D-CNN model: (a) before max pooling and (b) after max pooling.

The activation layers are employed to introduce nonlinearity into the CNN. An activation takes a single number and applies a fixed mathematical function. Some typical activation functions are the Rectified Linear Unit (ReLU) f(x) = max(0, x), the Sigmoid function f(x) = 1/(1 + e^{-x}) and the tanh function f(x) = tanh(x). Among these nonlinear functions, ReLU (see Fig. 6) is preferred and thus selected owing to its cheap arithmetic operation and its excellent convergence properties under the stochastic gradient descent (SGD) algorithm compared with the Sigmoid or tanh functions. The output value γ at position (x, y, z) on the j-th feature map in the i-th 3D convolutional layer can be written as [52]

\[ \gamma^{(i)}_{j,xyz} = \mathrm{ReLU}\left( b^{(i)}_{j} + \sum_{m=1}^{M^{(i-1)}} \sum_{p=0}^{P^{(i)}-1} \sum_{q=0}^{Q^{(i)}-1} \sum_{r=0}^{R^{(i)}-1} w^{(i)}_{jm,pqr}\, \gamma^{(i-1)}_{m,(x+p)(y+q)(z+r)} \right) \tag{7} \]

where ReLU(·) denotes the element-wise ReLU function; \( b^{(i)}_{j} \) is the common bias for the j-th feature map; \( w^{(i)}_{jm,pqr} \) is the (p, q, r)-th value of the 3D filter for the j-th feature map at the i-th layer associated with the m-th feature map in the (i−1)-th layer; \( M^{(i-1)} \) is the number of feature maps at the (i−1)-th layer; \( P^{(i)} \), \( Q^{(i)} \) and \( R^{(i)} \) denote the size of the 3D filter at the i-th layer. In this paper, a constant filter size is used throughout the convolutional layers.

FC layers are employed at the end of the 3D-CNN, where neurons between two neighboring
Figure 6: Rectified linear unit (ReLU) function, f(x) = max(0, x), as the activation function.

layers are interconnected. FC layers take the flattened tensor from the previous hidden layer as the input and map it to the desired output, which is exactly the vector of effective material properties with length 12, as shown in Eq. (2). The connection between two adjacent layers, here from the (i−1)-th to the i-th, can be expressed concisely in the form of tensor operations, given by

\[ \gamma^{(i)} = \sigma\left( \mathbf{W}^{(i)} \gamma^{(i-1)} + \mathbf{b}^{(i)} \right) \tag{8} \]

where \( \gamma^{(i-1)} \) and \( \gamma^{(i)} \) are the input and output of the i-th layer; σ(·) denotes the Sigmoid activation function acting element-wise; \( \mathbf{W}^{(i)} \) and \( \mathbf{b}^{(i)} \) are the weight matrix and bias vector between the i-th and the (i−1)-th FC layers. The weights and biases in the FC layers are also trainable parameters of the 3D-CNN. The mean square error (MSE) between the 3D-CNN's prediction and the ground truth of the training dataset is adopted as the loss function, given by

\[ \mathcal{L}(\mathbf{W}, \mathbf{b}\,|\,\mathcal{D}) = \frac{1}{n} \sum_{k=1}^{n} \sum_{l=1}^{12} \left( y^{\mathrm{truth}}_{kl} - y^{\mathrm{pred}}_{kl} \right)^2 \tag{9} \]

where \( \mathcal{D} \) denotes the training dataset {x_k, y_k}, n denotes the total number of samples, and l denotes the index of the component of the effective properties vector. The optimal parameters {W*, b*} can be obtained by minimizing the loss function, namely,

\[ \{\mathbf{W}^{*}, \mathbf{b}^{*}\} = \underset{\{\mathbf{W}, \mathbf{b}\}}{\mathrm{argmin}}\ \mathcal{L}(\mathbf{W}, \mathbf{b}\,|\,\mathcal{D}) \tag{10} \]

A common issue facing DNN-based approaches is mitigating the overfitting brought about by their extraordinary approximation ability. Several treatments are considered in this paper. Firstly, it is noted that there is a scale difference between the outputs of the elastic (or shear) moduli and the Poisson's ratios, which might cause problems for the optimization. For example, an output variable with a large range of values could result in large error gradients, causing weight values to change dramatically and making the learning process unstable [65]. Therefore, label rescaling is employed here to address this problem.
The elastic (or shear) moduli and Poisson's ratios are scaled separately into the range of 0 to 1 in a min-max scaling manner, i.e.,

\[ \bar{y} = \frac{y - \min(y)}{\max(y) - \min(y)} \tag{11} \]

where y denotes the output component vector and \( \bar{y} \) is the corresponding scaled output. In addition to label rescaling, early stopping [66] and sample shuffling during training are adopted as regularizers to alleviate overfitting.

In this paper, a filter size of 5 × 5 × 5, a stride length of 1 and no padding are configured for the convolutional layers. Max pooling with size 2 × 2 × 2 is applied between successive convolutional layers (see Fig. 3).
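The architecture of Fig. 3 can be sketched in Keras, the API the networks are implemented with (see Section 3). The FC widths and the optimizer below are assumptions for illustration; Section 3.1 compares several width/depth choices:

```python
# Sketch of the 3D-CNN of Fig. 3 in Keras (FC widths and optimizer are assumed).
from tensorflow import keras
from tensorflow.keras import layers

def build_3dcnn(grid=101):
    model = keras.Sequential([
        keras.Input(shape=(grid, grid, grid, 1)),   # binary phase voxels
        layers.Conv3D(16, 5, activation="relu"),    # 16 filters of size 5x5x5, stride 1
        layers.MaxPooling3D(2),                     # 2x2x2 max pooling
        layers.Conv3D(32, 5, activation="relu"),    # 32 filters of size 5x5x5
        layers.MaxPooling3D(2),
        layers.Flatten(),
        layers.Dense(32, activation="sigmoid"),     # FC layer, Eq. (8) (width assumed)
        layers.Dense(12, activation="sigmoid"),     # 12 rescaled properties, Eq. (2)
    ])
    model.compile(optimizer="adam", loss="mse")     # MSE loss, Eq. (9)
    return model
```

The sigmoid output range matches the min-max rescaled labels of Eq. (11); predictions are mapped back to physical units by inverting that scaling.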
3. Results
In this section, the performance of the proposed 3D-CNN for heterogeneous material homogenization is evaluated. A series of parametric tests on the network hyperparameters (e.g., filter size, depth, width) of the 3D-CNN is conducted to find a suitable architecture for the current application. Then the trained 3D-CNN is used to predict the effective properties on the testing dataset of 300 RVEs. The performance of the 3D-CNN is discussed based on a comparison between the model inference and the results produced by traditional FEA. Since the randomness of the inclusion distribution is a significant aspect of naturally occurring heterogeneous materials, uncertainty quantification is conducted on an independent dataset that imitates input with uncertainty. Finally, the transferability of the trained 3D-CNN model to a new dataset (for RVEs with different inclusion shapes) is examined. The proposed 3D-CNN architecture is implemented with the high-level neural networks API Keras [68] using Python 3.7. Our networks are trained on a platform equipped with an NVIDIA GeForce GTX 1080 Ti GPU and an Intel Core i7 CPU @ 3.70 GHz.
A typical CNN involves dozens of hyperparameters that control the learning process of the network. These include the number of filters, filter size, learning rate, number of hidden layers and batch size, to name a few. The huge sample space makes it nearly impossible to find an optimal combination of hyperparameters. Therefore, the hyperparameters are usually searched in a trial-and-error manner within a small sample space. Fortunately, some rules of thumb for selecting the hyperparameters can be applied here. For example, the number of filters in a convolutional layer should reflect the richness of the characteristic features within the input; it usually depends on the number of samples and the complexity of the task [69]. The number of FC layers and neurons directly determines the total number of parameters (weights and biases) and thus affects the representational power of the network [70]. Therefore, it is natural to select the hyperparameter combination based on the underlying physical and mathematical interpretation of the "knowledge" to be learned. In this section, we evaluate different 3D-CNN architectures with varying numbers of hidden layers and filters. The MSE on the validation dataset, defined in Eq. (9), is used to measure the performance of each 3D-CNN architecture.

Table 2: Performance comparison of various 3D-CNN architectures.
No.   Model description                                        MSE
1     Conv(16,5) + Conv(16,5) + Conv(32,5) + FC(32 × 16)       2.82 × 10^{-…}
2     … + FC(… × 32)                                           2.79 × 10^{-…}
3     … + FC(… × 64)                                           2.89 × 10^{-…}
4     … + FC(… × 32)                                           6.33 × 10^{-…}
5     … + FC(… × 32)                                           3.61 × 10^{-…}
6     … + FC(… × 32)                                           3.44 × 10^{-…}
7     … + FC(… × … × 32)                                       2.86 × 10^{-…}

As mentioned in Section 2.1, the Cartesian grid used to sample the RVE is of size 101 × 101 × 101 so that the smallest inclusion, with a radius of 0.05 mm, can be captured. In our design of the 3D-CNN architecture, we select a fixed filter size of 5 in all three dimensions so that it is identical to the size of the smallest inclusion. The batch size during training is set to 25 according to the memory space available on the hardware. The trained model with the best performance, i.e., the lowest MSE, for each architecture over 1000 epochs is saved for later inference; this is the aforementioned early-stopping technique. Table 2 provides the configuration of each 3D-CNN architecture. The convolutional and fully connected layers are denoted by Conv(·) and FC(·), respectively. The values within the brackets of Conv(·) indicate the filter number and filter size, while the values within the brackets of FC(·) represent the number of neurons (width) in each layer. For example, Conv(32, 5) means the convolutional layer has 32 filters of size 5 × 5 × 5.

In this part, the performance of the trained 3D-CNN model is evaluated on the validation dataset, which consists of 300 RVEs with the same VF range (2%–28%). The prediction and

Figure 7: Visualization of the input slice and the feature map slices: (a) input, (b) Conv-1, (c) Conv-2, (d) Conv-3.
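The "save the best model over 1000 epochs" protocol described above maps directly onto Keras callbacks. A sketch, where the checkpoint file name and the patience value are placeholders rather than the paper's settings:

```python
from tensorflow import keras

# Hypothetical callback setup mirroring the training protocol above:
# keep the weights with the lowest validation MSE seen during training.
callbacks = [
    keras.callbacks.ModelCheckpoint(
        "best_3dcnn.h5",                  # placeholder path for the best weights
        monitor="val_loss",               # validation MSE, Eq. (9)
        save_best_only=True,
    ),
    keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=50,                      # assumed patience; the paper trains 1000 epochs
        restore_best_weights=True,
    ),
]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=1000, batch_size=25, callbacks=callbacks)
```

Monitoring the validation loss rather than the training loss is what makes the checkpoint act as the regularizer described in Section 2.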
Figure 8: Comparison between the 3D-CNN prediction and the ground truth (FEA).

ground truth (obtained through FEA) for the effective properties of each RVE sample are shown as scatter plots in Fig. 8. Since the baseline is given as a red line, we can see that the trained model gives accurate predictions for the 12 components of Young's modulus, shear modulus and Poisson's ratio. It is also observed that the predictions on the samples with low VFs, e.g., the bottom-left part of the scatter plots for the moduli (the E's and G's) and the upper-right part for the Poisson's ratios (the ν's), perform as well as their high-VF counterparts, even though larger randomness is present for RVEs with low VFs. Recall that, in Section 2.1, an exponential distribution of the sample number against VF was imposed while generating the datasets. As a result, the number of low-VF samples is much greater than the number of high-VF samples, which alleviates the issue of low-VF-induced uncertainty. To measure the prediction performance quantitatively, we calculate the mean absolute relative error (MARE) for each component, defined as

\[ \mathrm{MARE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{y}_i - y_i|}{|y_i|} \tag{12} \]

where \( \hat{y}_i \) and \( y_i \) are the prediction and the ground truth of the component for the i-th test sample. The results are summarized in Table 3. It is seen that the MAREs for all 12 components are below 0.55%.

The efficiency of the proposed 3D-CNN approach is also evaluated by contrasting the computational time of 3D-CNN inference against that of finite element analysis (FEA), as shown in Fig. 9. Note that inference is defined as the prediction operation on new input data by the trained 3D-CNN model.

Figure 9: Comparison of the computational time per RVE for 3D-CNN prediction and FEA.

Table 3: MARE on the testing dataset.

Component   E_1    E_2    E_3    G_23   G_31   G_12   ν_12   ν_13   ν_21   ν_23   ν_31   ν_32
MARE (%)    0.45   0.42   0.47   0.48   0.50   0.53   0.22   0.23   0.24   0.22   0.25   0.22

It is well known that GPU parallelization has been highly exploited for deep learning models, in the context of both network training and inference. However, to make the comparison fair, we also collect the averaged CPU time consumed by the 3D-CNN by performing inference on the CPU. The hardware configuration is given at the beginning of Section 3. It is noted that the CPU time of FEA depends largely on the number of discrete elements in the RVE. In our test, the number of tetrahedral elements in the discretized RVEs increases from 7705 for VF = 2.13% to 26136 for VF = 28.22% to maintain a reliable discretization. We collect the averaged computational time over 10 different RVEs for each fixed VF. For the 3D-CNN inference, however, the computational time is theoretically independent of the VF since all the RVEs are sampled with 101 × 101 ×
101 voxels. We collect the computational time for 300 RVEs with all VFs covered. It is seen from Fig. 9 that the GPU-based 3D-CNN inference provides a 25× speedup for the low-VF samples and up to a 50× speedup for the highest VF. Even on the CPU, the 3D-CNN beats the traditional FEA for VFs greater than 12%.

Another aspect that cannot be neglected is the computational time for training the 3D-CNN model. For the training dataset with 1400 RVEs considered in this paper, it takes about 35 hours on the GPU to achieve a desirable trained model. Nevertheless, the high computational demand for training is a one-off cost: once the model is trained, inference can be conducted on any upcoming new RVEs that fall into the ensemble. Even if a new RVE comes from another type of composite, the transferability of the trained 3D-CNN, discussed in Section 3.4, will largely reduce the time expense. We will verify that transfer learning makes the 3D-CNN extremely convenient for adding supplementary data or training a model on new datasets to account for new scenarios and enhance the generalizability of the trained model.

Figure 10: Distributions of the effective properties from the 3D-CNN prediction and the FEA result for the dataset with VF = 7%: (a)–(c) the E's, (d)–(f) the G's, (g)–(l) the ν's.

Table 4: VF parameters used for uncertainty quantification.

Mean (µ, %)   SD (σ, %)   Number of RVE samples
7             0.7         200
14            0.7         200
21            0.7         200

Modelling of natural composites is usually characterized by uncertainty. The uncertainty may come from measurement error, microstructural randomness, the mixture of materials and other natural (or artificial) sources.
Predicting the effective properties in a probabilistic/statistical sense, such as obtaining the mean value and standard deviation (SD), would provide a better reference for engineering and material design.

Strictly speaking, the output of a trained 3D-CNN is deterministic for a given input. Therefore, the uncertainty of the 3D-CNN output is largely determined by the variance of the input. To verify that our 3D-CNN model is capable of preserving the uncertainty of the effective properties for the particle-reinforced composite, we manually introduce uncertainty into the dataset, which is then evaluated in the framework of Monte Carlo simulation. In particular, we generate groups of RVE samples whose VF follows Gaussian distributions (mean of 7%, 14% and 21% for the three configurations, with an identical standard deviation of 0.7%). In each configuration, 200 RVEs are generated. The details of the uncertainty quantification (UQ) dataset are listed in Table 4.

Fig. 10 presents the predicted distributions of the modulus and Poisson's ratio components in comparison with the reference ground truth. The histograms are fitted by Gaussian distributions whose mean and standard deviation parameters are also listed. It can be seen that the trained 3D-CNN model closely reproduces the distributions of the FEA reference.
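The Monte Carlo procedure described above can be sketched as follows. Here `effective_modulus` is a hypothetical analytical surrogate standing in for the trained 3D-CNN, since the point is the sampling-and-statistics loop rather than the network itself; all parameter values other than the Gaussian VF parameters from Table 4 are illustrative.

```python
import random
import statistics

def effective_modulus(vf):
    # Hypothetical stand-in for the trained 3D-CNN: maps a volume
    # fraction (%) to an effective stiffness (arbitrary linear surrogate).
    return 70.0 + 2.5 * vf

def monte_carlo_uq(mean_vf, sd_vf, n_samples=200, seed=0):
    # Sample RVE volume fractions from a Gaussian distribution,
    # evaluate the surrogate on each sample, and report the mean
    # and standard deviation of the predicted property.
    rng = random.Random(seed)
    preds = [effective_modulus(rng.gauss(mean_vf, sd_vf))
             for _ in range(n_samples)]
    return statistics.mean(preds), statistics.stdev(preds)

# One configuration from Table 4: mean VF = 7%, SD = 0.7%, 200 samples.
mu, sd = monte_carlo_uq(mean_vf=7.0, sd_vf=0.7)
```

Because the surrogate is linear, the output SD is simply the input SD scaled by the slope; for the real 3D-CNN the output distribution is obtained empirically from the histogram, as in Fig. 10.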
Figure 11: Distribution of predicted effective properties for RVEs with mean VF of 7%, 14% and 21%: (a–c) E, (d–f) G, (g–l) ν.

A major assumption required by many DL approaches is that the training data and future data come from the same generator or source; in other words, they must lie in the same feature space and follow the same distribution [73]. In many real-world applications, this assumption may not hold. In such cases, if the knowledge learned by the DL model can be transferred, the effort of retraining the model on new datasets is largely reduced. Transferability refers to the ease of carrying the knowledge learned by a trained model over to a different but related problem. Transfer learning is usually achieved by transferring the pre-trained model into a new model with additional trainable parameters that are fitted to the new dataset of interest (e.g., adding layers to the trained network while fixing the transferred network parameters from the original model). The need for transfer learning arises when the acquired data can easily become outdated, or when the target data is intractable (or costly) to obtain but a less rich dataset is available.

Figure 12: Sampled phase voxels for RVEs with ellipsoidal inclusions: (a) VF = 6.26%, (b) VF = 14.67%, (c) VF = 20.79%.
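Phase-voxel inputs like those in Fig. 12 can be built by rasterizing an inclusion onto a binary grid and computing its volume fraction. A minimal sketch follows; the grid resolution, ellipsoid center and radii are illustrative and this is not the paper's HRSA generator, which additionally handles random placement and overlap checks.

```python
def ellipsoid_voxels(n, center, radii):
    # Mark grid voxels whose centers fall inside an axis-aligned
    # ellipsoid on a unit cube discretized into n^3 voxels.
    cx, cy, cz = center
    a, b, c = radii
    grid = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Voxel-center coordinates in the unit cube.
                x, y, z = (i + 0.5) / n, (j + 0.5) / n, (k + 0.5) / n
                if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 \
                        + ((z - cz) / c) ** 2 <= 1.0:
                    grid[i][j][k] = 1
    return grid

def volume_fraction(grid):
    # VF = fraction of voxels labeled as inclusion phase.
    n = len(grid)
    filled = sum(v for plane in grid for row in plane for v in row)
    return filled / n ** 3

vox = ellipsoid_voxels(n=32, center=(0.5, 0.5, 0.5), radii=(0.3, 0.2, 0.15))
vf = volume_fraction(vox)  # close to (4/3)*pi*0.3*0.2*0.15 ~ 0.0377
```

At the paper's 101³ resolution the voxelized VF converges toward the analytical ellipsoid volume; the coarser 32³ grid here keeps the example fast.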
Figure 13: Comparison of the learning curves (log(loss) vs. epoch) for transfer learning and training from scratch.
To examine the transferability of the previously trained 3D-CNN model, we consider a new dataset of RVEs with ellipsoidal inclusions. The major and minor radii of the ellipsoids are randomly and independently generated within a fixed interval. The overall range of the VF is the same as in the previous dataset (i.e., 2%–28%). Following a procedure similar to that in Section 2.1, a much smaller dataset with only 320 samples is generated, with the sample number an exponential function of the VF. The entire dataset is divided into training, validation and testing sets with the ratio 200:60:60. We transfer the trained 3D-CNN model with the Case 2 architecture in Table 2, and establish a new 3D-CNN by adding one additional convolution layer before flattening, i.e., Conv(32, 5), and activating the trainable parameters in the last FC layer (see Fig. 3). In this way we generalize the 3D-CNN trained on RVEs with spherical inclusions to the case of ellipsoidal inclusions (see Fig. 12 for examples). The transfer learning (TL) model fine-tuned with the new dataset is compared with the model trained from scratch (TS) with regard to the learning curve and prediction performance.
Figure 14: Comparison of the prediction performance of the trained-from-scratch (TS) and transfer learning (TL) models.
The learning curves for both cases (TL vs. TS) are shown in Fig. 13, where the x-axis denotes the epoch and the y-axis the loss function value. It can be seen that the initial loss is much lower for the TL model, which indicates that the model transferred from spherical inclusions already captures the latent features of RVEs with ellipsoidal inclusions well. The asymptote of the TS convergence curve is much higher than that of the TL model. Given the small training dataset, the TL model converges much faster, taking only dozens of epochs for the loss to decrease to a level close to that of our best model discussed in Section 3.1. This demonstrates that we can successfully transfer the knowledge of, as well as fine-tune, a pre-trained 3D-CNN model to achieve good accuracy at a particularly low training expense. Transfer learning might therefore help overcome problems such as lack of data and the high computational cost of training a large model; these challenges are especially critical for field measurements, where rich RVE data are costly to obtain. The prediction performance of the TL and TS models is compared in Fig. 14. It is evident that the TL model outperforms the TS model in both the bias and the variance of the predicted effective properties. The averaged MARE over all components is 0.43% for the TL model and 1.36% for the TS model.
4. Conclusions
In this paper, a 3D-CNN approach is proposed for determining the effective/homogenized properties of heterogeneous materials. In particular, we consider RVEs reinforced by randomly distributed particle inclusions (e.g., spherical and ellipsoidal inclusions). The geometries of the RVEs are generated using the Hierarchical Random Sequential Adsorption (HRSA) algorithm [55] and labeled for training the 3D-CNN model via FEA-based linear homogenization. The proposed 3D-CNN architecture consists of multiple hidden 3D convolution layers, pooling operations, flattening and FC layers. A parametric study of the network hyperparameters has been conducted to determine the optimal network architecture with the best inference performance. The proposed approach was tested in a series of numerical experiments in terms of inference accuracy, computational efficiency, uncertainty quantification (UQ) ability and transferability. The results show the promising potential of the proposed approach to advance the efficient design and analysis of heterogeneous composite materials composed of representative microstructures.

It is worth mentioning that the comparison with the FEA results shows that the 3D-CNN model can reproduce the effective material properties with high accuracy (e.g., a maximum prediction error of around 0.5%). The 3D-CNN also demonstrates advantages over traditional FEA in computational efficiency for model inference, achieving a speedup of 25× to 50× on the GPU.
In addition, the UQ study verifies that the trained 3D-CNN is capable of accurately predicting the probabilistic distributions of the effective material properties, in the framework of Monte Carlo simulation, when uncertain inputs are provided.

In summary, the proposed 3D-CNN offers the following benefits: (1) it provides an end-to-end solution for predicting the effective material properties from 3D phase voxels, which can be obtained via parametric modeling or advanced imaging techniques such as X-ray micro-tomography and 3D atom probe; (2) it reproduces the effective properties with high accuracy and computational efficiency, which would empower faster product design iteration or design optimization for composite materials; (3) the 3D-CNN model preserves the probabilistic distribution of the effective material properties for inputs with uncertainty, which makes it a promising approach for probabilistic engineering design; (4) the knowledge learned by the 3D-CNN model can easily be transferred to a different type of composite at very low training expense, and good prediction performance can still be achieved even on a small new dataset with the help of transfer learning; this characteristic becomes significant when RVE data are costly to obtain.

Nevertheless, some issues of interest regarding the 3D-CNN model remain to be studied in the future, for example: (1) investigating the universality of transfer learning for other heterogeneous materials such as fiber-reinforced or polymer composites; (2) extending the current 3D-CNN to model composites with nonlinear material properties (to this end, the load condition on each RVE must be considered as part of the network input); (3) applying the trained model, or retraining a generative model, for microstructure generation with desired effective properties [43, 48].

Acknowledgement
The authors would like to thank Dr. Hao Sun and Dr. Ruiyang Zhang, from the Department ofCivil and Environmental Engineering at Northeastern University, for their constructive suggestionsand comments on designing the proposed network.
Data Availability
The datasets and computer codes are available upon request from the authors.
References

[1] U. Hornung, Homogenization and porous media, Vol. 6, Springer Science & Business Media, 2012.
[2] J. Aboudi, S. M. Arnold, B. A. Bednarcyk, Micromechanics of composite materials: a generalized multiscale analysis approach, Butterworth-Heinemann, 2012.
[3] W. Voigt, Ueber die Beziehung zwischen den beiden Elasticitätsconstanten isotroper Körper, Annalen der Physik 274 (12) (1889) 573–587.
[4] A. Reuß, Berechnung der Fließgrenze von Mischkristallen auf Grund der Plastizitätsbedingung für Einkristalle, ZAMM - Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik 9 (1) (1929) 49–58.
[5] J. Aboudi, The generalized method of cells and high-fidelity generalized method of cells micromechanical models: a review, Mechanics of Advanced Materials and Structures 11 (4-5) (2004) 329–366.
[6] Z. Hashin, Assessment of the self consistent scheme approximation: conductivity of particulate composites, Journal of Composite Materials 2 (3) (1968) 284–300.
[7] M. Berveiller, A. Zaoui, An extension of the self-consistent scheme to plastically-flowing polycrystals, Journal of the Mechanics and Physics of Solids 26 (5-6) (1978) 325–344.
[8] T. Mori, K. Tanaka, Average stress in matrix and average elastic energy of materials with misfitting inclusions, Acta Metallurgica 21 (5) (1973) 571–574.
[9] F. Feyel, Multiscale FE2 elastoviscoplastic analysis of composite structures, Computational Materials Science 16 (1-4) (1999) 344–354.
[10] F. Feyel, J.-L. Chaboche, FE2 multiscale approach for modelling the elastoviscoplastic behaviour of long fibre SiC/Ti composite materials, Computer Methods in Applied Mechanics and Engineering 183 (3-4) (2000) 309–330.
[11] F. Feyel, A multilevel finite element method (FE2) to describe the response of highly non-linear structures using generalized continua, Computer Methods in Applied Mechanics and Engineering 192 (28-30) (2003) 3233–3244.
[12] C. Miehe, J. Schotte, M. Lambrecht, Homogenization of inelastic solid materials at finite strains based on incremental minimization principles. Application to the texture analysis of polycrystals, Journal of the Mechanics and Physics of Solids 50 (10) (2002) 2123–2167.
[13] R. Smit, W. Brekelmans, H. Meijer, Prediction of the mechanical behavior of nonlinear heterogeneous systems by multi-level finite element modeling, Computer Methods in Applied Mechanics and Engineering 155 (1-2) (1998) 181–192.
[14] K. Terada, N. Kikuchi, A class of general algorithms for multi-scale analyses of heterogeneous media, Computer Methods in Applied Mechanics and Engineering 190 (40-41) (2001) 5427–5464.
[15] M. Kamiński, Boundary element method homogenization of the periodic linear elastic fiber composites, Engineering Analysis with Boundary Elements 23 (10) (1999) 815–823.
[16] H. Okada, Y. Fukui, N. Kumazawa, Homogenization method for heterogeneous material based on boundary element method, Computers & Structures 79 (20-21) (2001) 1987–2007.
[17] S.-B. Lee, R. Lebensohn, A. D. Rollett, Modeling the viscoplastic micromechanical response of two-phase materials using fast Fourier transforms, International Journal of Plasticity 27 (5) (2011) 707–727.
[18] P. Eisenlohr, M. Diehl, R. A. Lebensohn, F. Roters, A spectral method solution to crystal elasto-viscoplasticity at finite strains, International Journal of Plasticity 46 (2013) 37–53.
[19] P. Kanouté, D. Boso, J. Chaboche, B. Schrefler, Multiscale methods for composites: a review, Archives of Computational Methods in Engineering 16 (1) (2009) 31–75.
[20] Z. Yuan, J. Fish, Toward realization of computational homogenization in practice, International Journal for Numerical Methods in Engineering 73 (3) (2008) 361–380.
[21] M. Hain, P. Wriggers, Numerical homogenization of hardened cement paste, Computational Mechanics 42 (2) (2008) 197–212.
[22] Y. Liu, V. Filonova, N. Hu, Z. Yuan, J. Fish, Z. Yuan, T. Belytschko, A regularized phenomenological multiscale damage model, International Journal for Numerical Methods in Engineering 99 (12) (2014) 867–887.
[23] Y. Liu, W. Sun, Z. Yuan, J. Fish, A nonlocal multiscale discrete-continuum model for predicting mechanical behavior of granular materials, International Journal for Numerical Methods in Engineering 106 (2) (2016) 129–160.
[24] F. Fritzen, O. Kunc, Two-stage data-driven homogenization for nonlinear solids using a reduced order model, European Journal of Mechanics - A/Solids 69 (2018) 201–220.
[25] G. B. Olson, Computational design of hierarchically structured materials, Science 277 (5330) (1997) 1237–1242.
[26] T. Lookman, P. V. Balachandran, D. Xue, R. Yuan, Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design, npj Computational Materials 5 (1) (2019) 21.
[27] D. Fujii, B. Chen, N. Kikuchi, Composite material design of two-dimensional structures using the homogenization design method, International Journal for Numerical Methods in Engineering 50 (9) (2001) 2031–2051.
[28] G. Landi, S. R. Niezgoda, S. R. Kalidindi, Multi-scale modeling of elastic response of three-dimensional voxel-based microstructure datasets using novel DFT-based knowledge systems, Acta Materialia 58 (7) (2010) 2716–2725.
[29] T. Fast, S. R. Niezgoda, S. R. Kalidindi, A new framework for computationally efficient structure–structure evolution linkages to facilitate high-fidelity scale bridging in multi-scale materials models, Acta Materialia 59 (2)
[55] M. Bailakanavar, Y. Liu, J. Fish, Y. Zheng, Automated modeling of random inclusion composites, Engineering with Computers 30 (4) (2014) 609–625.
[56] J. Guedes, N. Kikuchi, Preprocessing and postprocessing for materials based on the homogenization method with adaptive finite element methods, Computer Methods in Applied Mechanics and Engineering 83 (2) (1990) 143–198.
[57] B. D. Ripley, N. Hjort, Pattern recognition and neural networks, Cambridge University Press, 1996.
[58] A. Stienon, A. Fazekas, J.-Y. Buffiere, A. Vincent, P. Daguier, F. Merchi, A new methodology based on X-ray micro-tomography to estimate stress concentrations around inclusions in high strength steels, Materials Science and Engineering: A 513 (2009) 376–383.
[59] H. Proudhon, J.-Y. Buffière, S. Fouvry, Three-dimensional study of a fretting crack using synchrotron X-ray micro-tomography, Engineering Fracture Mechanics 74 (5) (2007) 782–793.
[60] A. Karakoç, J. Paltakari, E. Taciroglu, Data-driven computational homogenization method based on Euclidean bipartite matching, Journal of Engineering Mechanics 146 (2) (2020) 04019132.
[61] T. F. Kelly, M. K. Miller, Atom probe tomography, Review of Scientific Instruments 78 (3) (2007) 031101.
[62] J. E. Spowart, Automated serial sectioning for 3-D analysis of microstructures, Scripta Materialia 55 (1) (2006) 5–10.
[63] O. Betz, U. Wegst, D. Weide, M. Heethoff, L. Helfen, W.-K. Lee, P. Cloetens, Imaging applications of synchrotron X-ray phase-contrast microtomography in biological morphology and biomaterials science. I. General aspects of the technique and its advantages in the analysis of millimetre-sized arthropod structure, Journal of Microscopy 227 (1) (2007) 51–71.
[64] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (4) (1989) 541–551.
[65] C. M. Bishop, et al., Neural networks for pattern recognition, Oxford University Press, 1995.
[66] F. Girosi, M. Jones, T. Poggio, Regularization theory and neural networks architectures, Neural Computation 7 (2) (1995) 219–269.
[67] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
[68] F. Chollet, et al., Keras, https://github.com/fchollet/keras (2015).
[69] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[70] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (4) (1989) 303–314.
[71] X. Du, W. Chen, Efficient uncertainty analysis methods for multidisciplinary robust design, AIAA Journal 40 (3) (2002) 545–552.
[72] C. Chen, D. Duhamel, C. Soize, Probabilistic approach for model and data uncertainties and its experimental identification in structural dynamics: case of composite sandwich panels, Journal of Sound and Vibration 294 (1-2) (2006) 64–81.
[73] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (10) (2009) 1345–1359.