Digital representation and quantification of discrete dislocation networks
Andreas E. Robertson (a) and Surya R. Kalidindi (a,b) ∗ †
(a) George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
(b) School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA
January 12, 2021
Abstract
Dislocation networks and their evolution are known to control the mechanical properties of metal samples. However, the lack of computationally efficient and statistically rigorous descriptors for such defect systems has hindered the development and adoption of rational protocols for the optimal design of these material systems. This study presents a framework for the rigorous statistical quantification and low dimensional representation of dislocation networks using the formalism of 2-point spatial correlations (also called 2-point statistics) along with Principal Component Analysis (PCA). The usefulness of this basic framework for comparing and observing dislocation networks is exemplified and discussed with suitable examples.

∗ Email: [email protected]
† Corresponding Author

Introduction
Materials exhibit rich features over a hierarchy of length scales, spanning from the atomistic to the macroscale [1, 2, 3], which collectively control their bulk properties. For example, a material's bulk plastic properties are strongly influenced by mesoscale features (e.g., crystallographic texture, grain size and shape distributions) [3, 1, 4, 5] as well as crystal defects at the lower length scales (e.g., dislocations) [1, 6, 7, 8]. In order to attain the improved combinations of properties demanded by advanced technologies, it is necessary to engineer these desired salient features in the context of the multiscale material structure. Rigorous approaches to address these materials design challenges are critically dependent on access to high-fidelity materials knowledge expressed as salient process-structure-property (PSP) linkages. This materials knowledge base is traditionally accessed via computationally expensive physics-based simulations (e.g., crystal plasticity finite element models [3], phase field simulations [9, 10]) or by experimental observations. Because successful materials design efforts typically require repeated access to the underlying knowledge base, the high cost of these methods makes it impractical to deploy them directly in current design protocols. At the mesoscale, it has been shown that the challenges described above can be addressed using the data-centric Materials Knowledge System (MKS) framework [11, 12]. This framework provides systematic workflows to mine desired high-value materials knowledge, contained in all available experimental and simulation data [13], into low-computational cost surrogate models. Recent advances in physics-based simulation tools (e.g., three-dimensional (3D) Discrete Dislocation Dynamics (DDD) [6, 7]) and characterization techniques (e.g., Transmission Electron Microscopy (TEM) [14, 15, 16]) for studying dislocation networks have demonstrated the capacity for generating and aggregating similarly large datasets.
However, comparable frameworks to the MKS for addressing materials design needs at the dislocation length scales do not yet exist. As a result, new research avenues have opened for the development and application of suitable data science approaches for studying and extracting high value knowledge from dislocation datasets.

Quantitative microstructural statistics form the basis for materials data science methods [12, 11, 17]. As a result, the identification and adoption of salient statistics is an essential first step in the rigorous study of dislocation networks. Such descriptors should be efficient to calculate, easy to visualize and compare, and capture the spatial arrangement of important dislocation features [12]. This final requirement extends from the close relationship between the spatial arrangement of dislocations and their bulk behavior and/or properties [18, 19]. In the general study of dislocations, several potential candidate statistics are commonly used. The bulk dislocation density is the most popular because it is easy to calculate and compare. Additionally, it can be used to predict bulk properties, such as the Critical Resolved Shear Strength (CRSS), with modest success [20]. Other, less commonly used, measures include fractal analysis [21] and spatial correlations [22, 23, 24, 25, 26, 27]. Generally speaking, the bulk dislocation density is insufficiently descriptive because it does not describe the spatial arrangement of the network. On the other hand, fractal statistics are overly specialized: they are mainly meant to represent cell-forming dislocation networks.

In contrast, several studies have illustrated the promise of spatial correlations in effectively describing a network's salient attributes [22, 23, 24, 25]. However, these previous investigations also highlighted some of the current deficiencies.
Primarily, the complex functional form and high dimensional nature of the correlations largely preclude meaningful visualization and analysis without significant simplifications (e.g., Deng and El-Azab [22]). They remain prohibitively high dimensional for tasks such as classification of large datasets and extraction of high fidelity PSP linkages. Secondly, these earlier efforts have not yet proposed an efficient computational framework for extracting network statistics from discrete dislocation networks (such as those generated in DDD simulations).

The primary goal of this paper is to propose and demonstrate a computational framework for the extraction of low-dimensional salient spatial statistics from discrete dislocation networks. The central tasks in this development include the establishment of a generalized mathematical framework for the spatial correlations of dislocation networks, the construction of a systematic and efficient scheme for their computation, and the extraction of salient, low dimensional statistics from the high dimensional spatial correlations. This paper presents a framework addressing these gaps, and demonstrates its merits through simple examples.
N-point spatial correlations (also referred to as n-point statistics) [11, 17, 12, 28, 13] offer the most versatile and comprehensive framework for the rigorous statistical quantification of the complex hierarchical internal structures encountered in most advanced materials. Indeed, these concepts have been employed successfully to study material structures at the mesoscale in a broad range of applications, from classification of microstructures [29, 30, 31] to the extraction of high-fidelity, low-computational cost PSP linkages [10, 32, 33, 34] calibrated to physics-based simulations or experimental observations. They are also beginning to be employed on atomistic datasets [35].

In the mesoscale theories, n-point statistics were first encountered in Kröner's statistical treatment of the effective stiffness tensor of a two-phase composite microstructure [36]. This treatment proposed a quantitative relationship between the n-point statistics of the local elastic stiffness and the effective elastic stiffness. This approach was subsequently generalized to a broad range of the microstructure's effective properties [11, 17]. These statistical theories formulate this connection by interpreting the specific spatial arrangement of microstructural features as a sampled instance from a stationary, spatially resolved, ergodic random process [37, 22, 38, 11]. Within this statistical interpretation, a microstructure with multiple local states (such as a two-phase composite) is generated from multiple co-dependent random processes. A natural conclusion of this interpretation is that the effective properties are dependent on the random process, not the observed microstructure instantiation. Because such a random process is completely characterized by its n-point statistics, the desired connection is established naturally [39, 38].

In practice, a microstructure's set of 2-point statistics is often a sufficient descriptor [10, 34, 37, 27].
For a simple illustration of their definition, let us consider a spatially resolved H-phase microstructure random process, M^h(x). Here, h serves as an integer index from 1 to H, indexing the random process for each discrete material local state. For two arbitrary states, β and γ, the corresponding 2-point statistics are defined by

    {}^{M}f^{\beta\gamma}(\tau) = E\left[ M^{\beta}(x + \tau)\, M^{\gamma}(x) \right]    (1)

Because the processes are assumed to be stationary, the 2-point statistics are spatially resolved functions of the difference vector between the two points, τ. As a result, the 2-point statistics are defined by the expectation of the processes at points spatially separated by τ. Because microstructure random processes are assumed to be ergodic, the expectation in Eq. (1) is equal to the spatial average of the product ⟨m^β(x + τ) m^γ(x)⟩ over a sampled instance, m^h(x). Mathematically, this is expressed as [39]

    {}^{M}f^{\beta\gamma}(\tau) = \left\langle m^{\beta}(x + \tau)\, m^{\gamma}(x) \right\rangle = \frac{1}{X} \int_{X} m^{\beta}(x + \tau)\, m^{\gamma}(x)\, dx    (2)

At the mesoscale, Kalidindi et al. refer to m^h(x) as the microstructure function [38, 11]. It mathematically represents the sampled instance from the microstructure random process, M^h(x). In Eq. (2), X denotes the volume of the representative volume element (RVE).

In order to simplify the computation defined in Eq. (2), Kalidindi et al. define a discrete microstructure function, m^h_s, by voxelizing the spatial domain [11, 12]. The discrete microstructure function takes an average value for each voxel, indexed by s. The corresponding discrete 2-point statistics are defined by

    {}^{M}f^{\beta\gamma}_{t} = \frac{1}{S} \sum_{s=1}^{S} m^{\beta}_{s+t}\, m^{\gamma}_{s}    (3)

In this discrete form, the functional dependence on τ is replaced by a discrete index of the voxel separation, t [11]. Instead of averaging over the domain volume, X, the discrete spatial statistics are averaged over the total number of feasible samples, S. For periodic microstructures, S is the same as the number of voxels.
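As an illustration of Eq. (3) (not part of the original derivation), the discrete, periodic 2-point statistics can be computed in a few lines of NumPy; the function names and the small test field below are our own, and the FFT-based variant anticipates the convolution-theorem form discussed later in the text.

```python
import numpy as np

def two_point_stats_direct(m_beta, m_gamma):
    """Discrete periodic 2-point statistics, Eq. (3):
    f_t = (1/S) * sum_s m_beta[s + t] * m_gamma[s],
    with the shift s + t taken periodically."""
    S = m_beta.size
    f = np.empty(m_beta.shape, dtype=float)
    axes = tuple(range(m_beta.ndim))
    for t in np.ndindex(m_beta.shape):
        # np.roll with a negative shift places m_beta[s + t] at index s.
        shifted = np.roll(m_beta, tuple(-ti for ti in t), axis=axes)
        f[t] = np.sum(shifted * m_gamma) / S
    return f

def two_point_stats_fft(m_beta, m_gamma):
    """The same statistics via the cross-correlation theorem (cf. Eq. (5))."""
    S = m_beta.size
    F = np.fft.fftn(m_beta) * np.conj(np.fft.fftn(m_gamma))
    return np.real(np.fft.ifftn(F)) / S
```

For the 1D field m = [1, 0, 1, 0], both routines return the auto-correlation [0.5, 0, 0.5, 0], reflecting the period-2 arrangement of the nonzero voxels.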
These concepts can be extended to higher-order spatial correlations (i.e., beyond 2-point statistics in the framework of n-point statistics). Because of their expected causal relationship with the microstructure's effective properties, n-point statistics serve as natural structural descriptors within the PSP framework [11].

The considerations described above are also applicable at the scale of discrete dislocations. With a suitable selection of local state descriptors, the spatial arrangement within a dislocation network can also be treated as a random process, and can be rigorously quantified using the formalism of n-point statistics [25, 37, 22, 26, 27]. Furthermore, these measures can be quantitatively correlated with any of the effective (homogenized) properties defined for dislocation networks [40, 41].

Previous studies have explored different mathematical representations of the dislocation networks for capturing the spatial distributions of both individual dislocations and suitably grouped ensembles of dislocations [26, 27, 22, 37, 42, 43, 44, 45, 46, 47]. These definitions trade off complexity for accuracy in their representation. For example, the spatially resolved dislocation density, ρ(x), provides a simple and easy to understand description (the total dislocation line length per unit volume, defined in the limit of a small volume in the neighborhood of the spatial location, x). A more accurate descriptor can be defined by further subdividing the density onto each slip system at the spatial location, ρ^k(x). Here, k indexes the slip system.

Recently, the following spatially resolved vector density representation has gained prominence [46, 37, 48, 26, 43]:

    \rho^{k}(x) = \rho^{k}_{b}(x)\,\hat{b} + \rho^{k}_{t}(x)\,\hat{t} + \rho^{k}_{n}(x)\,\hat{n}, \qquad k = 0, 1, 2, \ldots, K    (4)

For each slip system, the vector local state of the dislocation network is defined using an orthobasis comprised of the screw (or Burgers) direction, b̂, the edge direction, t̂, and the out-of-plane direction, n̂.
The corresponding components, ρ^k_i(x), reflect the geometrically necessary dislocation (GND) densities along each direction. The GND density is defined as the net dislocation length per unit volume [46]. It has been shown that this descriptor contains sufficient information to calculate the stress fields induced by the dislocation network [48, 49, 7] and to formulate closed field equations for the network's evolution [48, 7]. Additionally, the spatial correlations of this vector density field have been used to define the source terms controlling the network's evolution [41]. This representation does have some limitations because it reflects only the net dislocation content at the spatial location x. As an example, consider a small volume in the neighborhood of x containing two dislocations on the same slip system and having equivalent length but opposite vector direction (or "sense"); the GND density would be zero. Clearly, the selection of the neighborhood size implied in this definition needs careful consideration.

As discussed earlier, the growth of data-scientific techniques to study, compare, and extract high-fidelity knowledge from discrete dislocation data is limited by a lack of efficient tools for extracting quantitative statistical descriptors of the observed networks and methods for easily representing them. We leverage several tools from data science, computer graphics, and digital signal processing to overcome these obstacles.
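To make the cancellation effect concrete, the sketch below (our own construction, with a hypothetical FCC-type slip-system frame) computes the net line vector of a set of segments resolved on the {b̂, t̂, n̂} orthobasis of Eq. (4); a dipole of equal-length segments with opposite sense on the same system yields a zero net vector even though the total line length is nonzero.

```python
import numpy as np

# Hypothetical slip-system frame: the Burgers (screw) direction b_hat lies
# in the slip plane whose normal is n_hat; t_hat completes the orthobasis.
b_hat = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
n_hat = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
t_hat = np.cross(n_hat, b_hat)  # edge direction

def net_line_vector(segments):
    """Net line vector of directed segments, resolved on {b, t, n}.

    segments: iterable of (start, end) points in crystal coordinates.
    A zero result despite nonzero total line length signals content that
    the net (GND-like) description cannot see, as discussed in the text."""
    net = sum(np.asarray(e, float) - np.asarray(s, float) for s, e in segments)
    return np.array([net @ b_hat, net @ t_hat, net @ n_hat])

# A dipole: two equal-length dislocations of opposite sense.
dipole = [((0, 0, 0), (1, -1, 0)), ((5, 5, 0), (4, 6, 0))]
```

Here `net_line_vector(dipole)` vanishes while the summed segment lengths equal 2√2, illustrating why the neighborhood size used to define the GND density needs care.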
For voxel-based descriptions of a dislocation network, it becomes necessary to calculate the line length contained within each voxel. The voxelization of straight lines is an area of extensive research in computer graphics. Siddon's algorithm, used extensively in medical imaging, traces the voxels crossed by a 3D straight line and records the length traversed within each voxel [50]. Additionally, Siddon's algorithm exhibits a computational complexity of O(N), making it efficient for the voxelization of large datasets, such as those generated by DDD. In this work, the original algorithm is also suitably modified to calculate vector line density without significantly increasing its computational cost.
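A minimal, unoptimized sketch of this idea (our own, not Siddon's incremental formulation) computes the crossings of a segment with the grid planes parametrically and accumulates the traversed length in each voxel:

```python
import numpy as np
from collections import defaultdict

def segment_voxel_lengths(p0, p1, voxel_size):
    """Length of the segment p0 -> p1 contained in each axis-aligned voxel.

    Voxels are indexed by floor(coordinate / voxel_size). Returns a dict
    mapping voxel index tuples to traversed lengths. This is the parametric
    idea behind Siddon's algorithm, without its incremental optimizations."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    d = p1 - p0
    total = np.linalg.norm(d)
    # Parameter values in [0, 1] at which the segment crosses a grid plane.
    ts = {0.0, 1.0}
    for axis in range(3):
        if d[axis] != 0.0:
            lo, hi = sorted((p0[axis], p1[axis]))
            k = np.floor(lo / voxel_size) + 1.0
            while k * voxel_size < hi:
                ts.add((k * voxel_size - p0[axis]) / d[axis])
                k += 1.0
    ts = sorted(ts)
    lengths = defaultdict(float)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        mid = p0 + 0.5 * (t0 + t1) * d  # interval midpoint identifies the voxel
        voxel = tuple(int(np.floor(c / voxel_size)) for c in mid)
        lengths[voxel] += (t1 - t0) * total
    return dict(lengths)
```

A segment from (0.5, 0.5, 0.5) to (2.5, 0.5, 0.5) on a unit grid yields the per-voxel lengths {(0,0,0): 0.5, (1,0,0): 1.0, (2,0,0): 0.5}. For vector local states, each per-voxel length would simply be multiplied by the segment's direction cosines, matching the modification mentioned above.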
Previous research by Kalidindi et al. has extensively documented that voxelized representations of material structures facilitate computational efficiency gains throughout the learning of PSP linkages by application of the algorithms and techniques of digital signal processing [11]. For collections of large datasets, such efficiency is paramount. A significant benefit of voxelization observed at the mesoscale is that 2-point statistics can be efficiently calculated using the Fast Fourier Transform, Eq. (5) [11].

    {}^{M}f^{\beta\gamma}_{t} = \frac{1}{S} \sum_{s=1}^{S} m^{\beta}_{s+t}\, m^{\gamma}_{s} = \left( \frac{1}{S}\, \mathcal{F}^{-1}\left[ \mathcal{F}\left[m^{\beta}\right] \mathcal{F}^{*}\left[m^{\gamma}\right] \right] \right)_{t}    (5)

Here, F[·] denotes the Discrete Fourier Transform, F^{-1}[·] is its inverse transform, and F^{*}[·] is its complex conjugate. As before, the discrete 2-point statistics remain indexed by the voxel separation, t. In contrast to the O(N²) time complexity of implementing Eq. (3) directly, the Fast Fourier Transform (FFT) algorithm facilitates the equivalent calculation, via Eq. (5), in O(N ln N) time. Because of this reduction in computational complexity, usage of the FFT makes the calculation of spatial statistics affordable for large collections of datasets.

Dimensionality reduction techniques from digital signal processing and machine learning allow distillation of high-dimensional data to salient low-dimensional representations. Principal Component Analysis (PCA) [38, 34, 29, 33, 31] addresses this task by mean-centering the data and applying a rigid rotation to define a data-driven orthonormal basis organized along the directions of decreasing unexplained variance [51, 52]. Dimensionality reduction is achieved by utilizing only the weights of the high variance basis vectors. This technique is favored for several reasons. First, because the transformation is simply a rigid rotation, the Euclidean distance metric is preserved. The only error in the distance between data points is the information lost due to truncation of the basis.
Second, the information loss is easily quantifiable. This loss is captured by the explained variance along each lost axis. Finally, PCA can be efficiently calculated using the Singular Value Decomposition, making it feasible for very large datasets [51, 52].
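These properties can be checked numerically. The sketch below (our own toy data, standing in for vectorized correlation sets) performs PCA through the SVD, recovers the explained variances from the singular values, and confirms that the full rotation preserves pairwise Euclidean distances:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data matrix: 50 samples of 200 features with intrinsic rank <= 10.
X = rng.normal(size=(50, 10)) @ rng.normal(size=(10, 200))

Xc = X - X.mean(axis=0)                      # mean-centering
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                               # PC scores: Xc rotated by V
explained = s**2 / np.sum(s**2)              # fraction of variance per PC

# Rigid rotation: distances in the full PC basis match the original ones.
d_orig = np.linalg.norm(Xc[0] - Xc[1])
d_pc = np.linalg.norm(scores[0] - scores[1])
```

Truncating `scores` to its leading columns gives the low-dimensional representation; the discarded columns account for exactly the unexplained variance (here `explained[10:]` is numerically zero because the toy data have rank 10).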
Feature Extraction for Dislocation Networks
Leveraging the theory and tools introduced in the previous sections, we now propose a flexible framework for extracting salient, low dimensional statistics (i.e., compact representations of spatial correlations) for collections of discrete dislocation networks. First, in the context of the discrete microstructure function paradigm, we propose an efficient scheme for calculating 2-point statistics from discrete dislocation data. Next, using PCA, we extract salient (i.e., low dimensional) network statistics from the calculated 2-point statistics. We envision that this framework will nurture the study, comparison, and extraction of high fidelity knowledge from dislocation network data. We assume that the raw data for our computations is available in the format favored by DDD simulations, as a directed set of linear segments within a simulation domain. The different protocols employed in the proposed framework are described next.
First, the simulation's volumetric domain is discretized into a uniform voxel grid. The selection of the local state describing the contents of each voxel is a pivotal step in the proposed framework. Primarily, its selection dictates the accuracy with which the discrete dislocation network is represented as a voxel grid and, by extension, the ability of the generated statistics to differentiate dislocation networks. In this work, we combine the ideas described in Sec. 2.1 and Sec. 2.2 to arrive at a consistent framework for describing dislocation networks. Specifically, it is recognized that quantities such as l, l^k, l^{1/2}, and (l^k)^{1/2}, where k identifies a specific slip system, offer excellent choices for the local state variable. In other words, one can use l^k_s as m^h_s in defining the spatial correlations (see Eq. (3)) of interest in the dislocation networks. However, the framework offers additional flexibility. For example, one can use l^k = {l^k_b, l^k_t, l^k_n} (see Eq. (4)) as the local states. This would, of course, improve the descriptive accuracy of the dislocation network (by specifying individually the screw, edge, and climb components of each slip system) at an increased computational cost. For example, for an FCC crystal, the (l^k)^{1/2} description results in 12 local states per voxel (one per slip system), while l^k results in 36 local states.

In contrast to past studies, we have preferred the dislocation length (l) to the dislocation density (ρ). In voxelized representations, these are simply related to each other by the voxel volume, which is a constant in our representation. Using the square root of the dislocation length as a local state affords a more natural interpretation for the calculated correlations. This is because the correlations have units of length. Specifically, the t = 0 auto-correlation for this choice can be related to the dislocation density through a constant factor.
In this work, we will show that the spatial arrangement of many dislocation networks can be differentiated using just (l^k)^{1/2}.

Crystal symmetry poses certain challenges in assigning a reference frame to any given dislocation network. For example, there are 24 completely equivalent assignments of the reference frame to a cubic crystal. One way to address this is to replicate and represent each dislocation network in each of its equivalent reference frames. While this strategy does address the problem, it also increases the variance in the overall dataset. The second, more preferred, option is to standardize the assignment of the crystal reference frame such that it provides an automated labeling order for all of the available slip systems based on specified criteria. For example, in the case studies presented in this work, the crystal reference frame is selected such that the slip system with the highest bulk dislocation density is labelled as the (111)[¯110] slip system. Additional criteria may be needed in other case studies. For example, if a dislocation network exhibits the exact same bulk dislocation density on two of its slip systems, one would need an additional criterion to assign the crystal reference frame.
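One simple (hypothetical) realization of this standardization sorts the slip systems by decreasing bulk dislocation density, so the densest system always receives the reference label; the function below is our own sketch and leaves the tie-breaking criterion unspecified:

```python
import numpy as np

def standardize_slip_labels(line_length_per_system):
    """Return a relabeling (permutation) of slip systems ordered by
    decreasing total dislocation line length, so index 0 of the result is
    the reference system (labelled (111)[-110] in the text). Exact ties
    would require an additional, application-specific criterion."""
    lengths = np.asarray(line_length_per_system, dtype=float)
    # Stable sort keeps a deterministic (if arbitrary) order for exact ties.
    return np.argsort(-lengths, kind="stable")
```

For per-system line lengths [1.0, 5.0, 3.0], the returned permutation is [1, 2, 0], so system 1 becomes the reference.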
A careful selection of the voxel density is critical to ensuring a sufficiently accurate representation of the dislocation network. In general, this selection must balance computational feasibility with representational accuracy. In mesoscale representations [11, 10, 30], the voxel density is often dictated by the size of the smallest microscale constituent (i.e., phase regions) in the material structure. However, the one-dimensional (1D) nature of dislocations precludes a simple extension of these prior protocols. Xia and El-Azab [48] have recommended the interaction length of annihilation as a discretization length scale for dislocation based simulations. The annihilation distance varies from around 2 nm for edge dislocations to about 50 nm for screw dislocations [53, 48]. In this study, we used a domain size of (2000b)^3 (b is the magnitude of the Burgers vector) with a voxelization of × × . This corresponds to a voxel size of about b, falling within the range discussed above. The careful selection of voxel size is very important. Because of our interest in the use of PCA for the extraction of the salient features, it becomes necessary to adopt a single voxel size for all dislocation networks within the dataset. This standardization allows for a meaningful comparison of the calculated 2-point statistics of the different networks [54, 31].

3.1.4 Discrete Representations

A discretized representation of the dislocation network is obtained using the tools and concepts presented earlier. We will exemplify the steps involved using the vector local state in an FCC crystal. As already noted, this discretized representation involves the specification of 36 local states for each voxel in the dislocation network volume. This task can be addressed in a computationally efficient protocol by leveraging the ray-tracing algorithm described earlier [50].
For vector local states, the inner products of the unit direction of each linear dislocation segment in the network with the basis vectors ({b̂, t̂, n̂}) of all twelve slip systems are first computed. Then, the dislocation line length assigned to each traversed voxel is calculated using Siddon's algorithm [50] and multiplied by the already computed direction cosines to obtain the desired dislocation descriptors introduced in Eq. (4). Completing this process for all of the line segments in the DDD dataset then results in the desired voxelized description of the dislocation network with 36 local states for each voxel. The other scalar local states described earlier are similarly populated without the usage of the direction cosines.

Our initial attempts at implementing the protocols described above revealed that the voxelized representations are extremely noisy. Here, the term noise refers to the large changes to the discretized fields caused by small changes in the voxel size. This problem can be addressed by smoothing the discretized fields, where the contribution of each line segment is spread over the voxel it is passing through and its nearest neighbors. This practice has been implemented successfully in many image analysis protocols [55, 56] as well as in the quantification of atomistic datasets [35]. For dislocation structures, Cai's nonsingular Burgers vector kernel (Eq. (6)) [57, 49] provides a physics-based selection of the smoothing kernel, parameterized by the radial spread, a. The kernel is mathematically expressed as

    \tilde{w}(x) = \frac{15}{8\pi} \left[ \frac{1 - m}{a_{1}^{3}} \left( \frac{r^{2}}{a_{1}^{2}} + 1 \right)^{-7/2} + \frac{m}{a_{2}^{3}} \left( \frac{r^{2}}{a_{2}^{2}} + 1 \right)^{-7/2} \right]    (6)

with a_1 = 0.9038a, a_2 = 0.5451a, and m = 0.6575 [57]. In the case studies presented in this work, we have decided to set a = 50b. We found this value sufficient to remove significant noise and stabilize the statistics, while remaining sufficiently below the average length of individual dislocations to maintain their linear character. Application of the kernel in Eq.
(6) to the voxelized structure can be done efficiently via the FFT. For the end goal of calculating 2-point statistics, this smoothing is achieved at minimal additional cost by extension of Eq. (5) as

    {}^{M}f^{\beta\gamma}_{t} = \left( \frac{1}{S}\, \mathcal{F}^{-1}\left[ \mathcal{F}\left[\tilde{w}\right] \mathcal{F}\left[m^{\beta}\right] \mathcal{F}^{*}\left[m^{\gamma}\right] \mathcal{F}^{*}\left[\tilde{w}\right] \right] \right)_{t}    (7)

The protocols described above have transformed each dislocation network into H discrete spatial fields. Treating each field as a local state, the network's auto- and cross-correlations can be efficiently calculated using Eq. (7). For a system with H states, there exist H² correlations.

Although H² correlations can be calculated, Niezgoda et al. [58] have shown that this complete set exhibits many interdependencies. Eq. (8) defines the derived interrelationship between the Discrete Fourier Transforms (DFT) of the 2-point statistics.

    {}^{M}F^{\beta\gamma}_{t} = \frac{ \left( {}^{M}F^{\alpha\beta}_{t} \right)^{*}\, {}^{M}F^{\alpha\gamma}_{t} }{ {}^{M}F^{\alpha\alpha}_{t} }    (8)

From this relationship, it is clear that any correlation can be calculated from a single reference state's auto-correlation and that state's cross-correlations with all other states. As a result, only H independent correlations exist. The reference slip system selected in Sec. 3.1.2 should also be used as the reference local state (in the presented case studies, the highest dislocation density slip system is used). We note that, in practice, the nonlinearity of these interdependencies often mandates inclusion of more than the minimum correlations in successful efforts aimed at learning the underlying knowledge in the datasets. Specifically, inclusion of all the autocorrelations can significantly improve performance.

Even after removal of the interdependencies, the dimensionality of the complete set of 2-point statistics is intractably large. For a voxelization of × × and H = 36, the set of correlations contains over 120 million 2-point statistics; this results in an unwieldy set of features representing each dislocation network.
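The smoothed statistics of Eq. (7) amount to correlating kernel-convolved fields, which the FFT handles in a few lines. The sketch below is our own; for brevity it builds a normalized Gaussian as a stand-in for Cai's kernel (the radial profile differs, but the mechanics of Eq. (7) do not):

```python
import numpy as np

def smoothed_two_point_stats(m_beta, m_gamma, w):
    """Eq. (7): periodic 2-point statistics of fields smoothed by kernel w.

    w is a normalized kernel centered in its array; ifftshift moves the
    center to the origin so the convolution introduces no spatial shift."""
    S = m_beta.size
    W = np.fft.fftn(np.fft.ifftshift(w))
    F = W * np.fft.fftn(m_beta) * np.conj(np.fft.fftn(m_gamma)) * np.conj(W)
    return np.real(np.fft.ifftn(F)) / S

# Normalized Gaussian stand-in kernel on a periodic 32^3 grid.
n, spread = 32, 2.0
x = np.arange(n) - n // 2
gx, gy, gz = np.meshgrid(x, x, x, indexing="ij")
w = np.exp(-(gx**2 + gy**2 + gz**2) / (2.0 * spread**2))
w /= w.sum()

# Smoothed auto-correlation of a random nonnegative field.
m = np.random.default_rng(0).random((n, n, n))
f_smooth = smoothed_two_point_stats(m, m, w)
```

With a unit-mass delta kernel the routine reduces exactly to the unsmoothed Eq. (5), which provides a convenient sanity check.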
PCA allows for the dimensionality reduction needed before any further analysis (e.g., classifying dislocation networks, building PSP models). Consider a data set containing N dislocation networks, each having H local states and S voxels. For each dislocation network, we can define a feature vector of size 1 × (SH) containing the network's vectorized 2-point statistics (see Eq. (9)). Let the elements of this vector be denoted {}^{i}g^{βγ}_j, which indicates a scaled 2-point statistic of dislocation network i for voxel separation vector index j and local states β and γ. It is crucial to follow a consistent method for unrolling the 3D 2-point statistics throughout a dataset. This consistency allows the data vectors generated for different structures to be compared.

    x_{i} = \left[ {}^{i}g^{11}_{1}, \ldots, {}^{i}g^{11}_{S}, \ldots, {}^{i}g^{\beta\gamma}_{1}, \ldots, {}^{i}g^{\beta\gamma}_{S}, \ldots \right]; \qquad {}^{i}g^{\beta\gamma}_{j} = \frac{ {}^{i}f^{\beta\gamma}_{j} - \mu^{\beta\gamma}_{j} }{ \sigma^{\beta\gamma} }    (9)

    \mu^{\beta\gamma}_{j} = \frac{1}{N} \sum_{n=1}^{N} {}^{n}f^{\beta\gamma}_{j}    (10)

    \sigma^{\beta\gamma} = \left( \frac{1}{S} \sum_{s=1}^{S} \frac{1}{N} \sum_{n=1}^{N} \left( {}^{n}f^{\beta\gamma}_{s} - \mu^{\beta\gamma}_{s} \right)^{2} \right)^{1/2}    (11)

The vectorized feature list in Eq. (9) is mean-centered (see Eq. (10)) and scaled (see Eq. (11)) in preparation for PCA. Because PCA identifies and aligns the data basis with the orthogonal directions of maximum variance, the application of PCA is highly sensitive to differences in magnitudes between the concatenated features. Smaller magnitudes will generally result in lower variance. For example, the magnitudes of auto-correlations are generally higher than those of the cross-correlations. Therefore, the application of PCA directly on {}^{i}f^{βγ}_j is very likely to emphasize the features with higher magnitudes. This poses a challenge, because one would ideally desire to afford equal importance to each set of spatial correlations. The normalization shown in Eq.
(11) ensures that each set of spatial correlations corresponding to a selected pair of local states (β, γ) gets the same attention in the PCA.

Finally, to perform PCA and extract the desired low dimensional statistics, the N individual network vectors in a dataset are stacked to form a data matrix:

    X = \begin{bmatrix} {}^{1}g^{11}_{1} & \cdots & {}^{1}g^{11}_{S} & \cdots \\ \vdots & \ddots & \vdots & \\ {}^{N}g^{11}_{1} & \cdots & {}^{N}g^{11}_{S} & \cdots \end{bmatrix}    (12)

PCA is most commonly performed by the Singular Value Decomposition (SVD) of X [51]. The low dimensional statistics for each data vector (for each dislocation network) correspond to the weights (or PC scores) of the highest variance principal components.

We present two case studies designed to illustrate the merits of the proposed protocols for computing and comparing different dislocation networks based on their spatial correlations. In data-driven feature engineering approaches such as the one presented here, the results will depend on the dataset itself. It is therefore important to aggregate a diverse collection of dislocation networks exhibiting suitable levels of interclass and intraclass variabilities. Collecting such datasets from experiments or DDD simulations is quite challenging. Therefore, for the present study, it was decided to use digitally created artificial dislocation networks exhibiting distinct local and global features, resulting in distinct classes of dislocation networks. These artificially created dislocation networks provide an excellent test bed for exploring and understanding the benefits and limitations of the feature engineering protocols developed in this work.
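Before turning to the case studies, the normalization and stacking of Eqs. (9)-(12) can be sketched in a few lines (our own helper, with the correlation blocks stored along a single axis for simplicity):

```python
import numpy as np

def build_feature_matrix(f):
    """Assemble the PCA data matrix of Eq. (12).

    f[i, c, j]: 2-point statistic of network i, correlation block c (one
    (beta, gamma) pair), flattened separation index j. Each block is
    mean-centered per Eq. (10) and scaled by its block-wide standard
    deviation per Eq. (11), so every block receives equal weight in PCA."""
    mu = f.mean(axis=0, keepdims=True)                  # Eq. (10)
    sigma = np.sqrt(((f - mu) ** 2).mean(axis=(0, 2)))  # Eq. (11), one per block
    g = (f - mu) / sigma[None, :, None]                 # Eq. (9)
    return g.reshape(f.shape[0], -1)                    # Eq. (12): one row per network
```

PCA is then applied by taking the SVD of the returned matrix; each row's projection onto the leading right-singular vectors yields that network's PC scores.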
Fig. 1 exemplifies the archetypes used to generate the 11 classes of dislocation networks analyzed in this work. The different archetypes are exemplified in 2D sections for simple visualization, although the generated dislocation networks are 3D. In the 2D schematics shown in Fig. 1, the slip plane extends perpendicular to the image. The 11 classes of dislocation networks generated for this study are differentiated at two scales: local and global. Locally, networks are differentiated by the preferential shape and orientation of individual dislocations. As depicted in Fig. 1, Types A, C, and E are locally identical in that they contain straight, uniformly oriented, dislocation segments with a specified length. The dislocation segments in these archetypes can be of either uniformly screw or edge character (only uniformly edge networks are depicted in Fig. 1). In contrast, Type B is locally differentiated by its circular dislocation loops. Some of the generated networks are globally differentiated by preferential spatial arrangements of dislocations. For example, Types A and B were created with their dislocations randomly arranged throughout the domain. In contrast, the dislocations in Types C, D, and E were preferentially populated on uniformly spaced planes. The alternating sense of individual dislocations on the neighboring planes in Type D differentiates them from the ones in Type C (which contains dislocations of uniform sense). Type E displays an additional level of uniformity in the placement of dislocation segments on each plane. All the dislocation networks were produced in an FCC crystal lattice and periodicity was imposed in the generation.

Fig. 1. Schematics of the archetypes used in the generation of the different classes of dislocation networks identified in Table 1. From left to right: Straight dislocations randomly distributed throughout the volumetric domain (Type A). Dislocation loops randomly distributed throughout the volumetric domain (Type B).
Straight dislocations randomly populated on uniformly spaced slip planes (Type C). Straight dislocations randomly populated on uniformly spaced slip planes with alternating orientations on neighboring planes (Type D). Straight dislocations uniformly populated on uniformly spaced slip planes (Type E).

The archetypes in Fig. 1 have been used to generate the 11 different classes of dislocation networks listed in Table 1, along with the relevant parameters used in their generation. Each class in this table is labelled as X:Y:N. X refers to the type of dislocations (e.g., edge (E), screw (S), loop (L)). The modifier following the X label indicates that the dislocations were populated with length b. All others contained dislocations of length (or circumference) b. Y refers to either random placement (R) or the number of uniformly spaced planes selected for the placement of the dislocations. Letters d and e have been added to the middle label to indicate the placement strategies identified as Type D and Type E in Fig. 1, respectively. N refers to the number of different slip systems on which the dislocations were placed. This label has been forgone for classes with a single slip system. For the present study, we have limited our attention to dislocation networks on only one or two slip systems. This simple strategy for generating networks allows us to systematically explore the effect of populating additional slip systems on the computed spatial statistics. For classes with multiple slip systems, the second system (SS2) was populated with a constant relative position to the first (SS1). In the protocols used in this work, SS1 refers to the slip system with the highest dislocation density, and is made to always correspond to the (111)[¯110] slip system.
For each class, multiple instances were generated while maintaining the parameters identified in Table 1, but with changes to the seeding of the dislocations.

Class Label | Local | Global | Dislocation Length | Additional Notes
E:R | Edge | Random | b | Class A. Populated on SS1.
E15:R | Edge | Random | b | Class A. Populated on SS1.
S:R | Screw | Random | b | Class A. Populated on SS1.
L:R | Loop | Random | b | Class B. Populated on SS1.
E:11 | Edge | 11 Planes | b | Class C. Populated on SS1.
E:14 | Edge | 14 Planes | b | Class C. Populated on SS1.
E:11d | Edge | 11 Planes | b | Class D. Populated on SS1.
E:11e | Edge | 11 Planes | b | Class E. Populated on SS1.
E:R:2 | Edge | Random | b | 60% populated on SS1 (edge), 40% populated on SS2 (edge) (% of total length). Class A on both planes.
S:R:2 | Screw | Random | b | 60% populated on SS1 (screw), 40% populated on SS2 (screw) (% of total length). Class A on both planes.
ES:R:2 | Edge/Screw | Random | b | 60% populated on SS1 (edge), 40% populated on SS2 (screw) (% of total length). Class A on both planes.

Table 1. Details of the 11 classes of dislocation networks generated for this study (columns Local, Global, and Dislocation Length list the generation parameters).

4.2 2-point statistics
The auto- or cross-correlations computed for each dislocation network can be visualized as 3D maps, as they reflect the value of a statistic for a selected voxel separation vector, t (see Eq. (3)). Fig. 2 presents the correlation maps for three example dislocation networks. For clarity, we present only 2D cross-sections of these 3D maps. Fig. 2(a) displays the (110) cross-section from the auto-correlation map computed for SS1 dislocations for an example network from the S:R class. Fig. 2(b) displays the (¯110) cross-section from a class E:11 network's SS1 auto-correlation map. Figs. 2(c) and 2(d) display the (110) cross-section from the SS1 auto-correlations and the (¯110) cross-section from the SS1-SS2 cross-correlations for a class S:R:2 example network. We note that each presented section represents only a small subset of the full set of 3D 2-point statistics computed for each network. These have been specifically selected to illustrate the main features in the computed 2-point statistics. Voxelization was accomplished using (l_k)^{1/2} local states for the two populated slip systems. The domain size for all networks was (2000b)^3, and each network contained dislocations totaling b in length.

The most easily interpreted statistic in the auto-correlation map is the one corresponding to t = 0 (see the center of the auto-correlation maps in Fig. 2(a)-(c)). With the use of (l_k)^{1/2} as the local state, the t = 0 auto-correlation corresponds to the expected dislocation line length in a voxel for the k-th slip system. For Fig. 2(a) and Fig. 2(b), the t = 0 auto-correlation value was ∼ b.¹ In contrast, the t = 0 SS1 auto-correlation for the S:R:2 network was ∼ b, which is 60% of the value for the single slip system networks. Since the S:R:2 network exhibits a 60-40 distribution of the total line length between the two slip systems, the observations above indicate that the computed statistics are quite accurate. The relatively small magnitude of the average dislocation length compared to the voxel size of (∼13b)^3 highlights the volumetric sparsity of the dislocation networks used in this study. While the simple networks allow us to interpret their 2-point statistics maps more easily, their volumetric sparsity makes them more sensitive to computational noise.

¹ Although the networks' bulk density was maintained at a constant value, slight differences in the t = 0 auto-correlation value arise from differences in voxelization and the applied kernel smoothing protocols described in Sections 3.1.4 and 3.1.5.

Fig. 2. (a) Auto-correlation map on the (110) plane for SS1 dislocations in an example network from the S:R class. (b) Auto-correlation map on the (¯110) plane for SS1 dislocations in an example network from the E:11 class. (c) Auto-correlation map on the (110) plane for SS1 dislocations in an example network from the S:R:2 class. (d) Cross-correlation map on the (¯110) plane between SS1 and SS2 dislocations in an example network from the S:R:2 class.

The dominant shape in the center of the auto-correlation map (see Fig. 2(a)-(c)) carries information on the average shape and orientation of the local features in the dislocation network. The high correlation region in the center of these maps is likely to come from connected voxels in individual features in the network. For example, Fig. 2(a) depicts the auto-correlations from a class S:R network containing b dislocation segments directed solely along the slip direction. Consequently, for any voxel containing a dislocation in this network, we expect to find additional voxels containing dislocations by traversing along the slip direction, with the probability of success dropping as we move far away from the initially specified voxel.
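The FFT route for evaluating these periodic correlations can be sketched as follows, assuming a voxelized local-state field (e.g., the per-voxel line-length-based state described above). The function names and field layout are illustrative, not the authors' implementation.

```python
import numpy as np

def autocorrelation(m):
    """Periodic 2-point auto-correlation of a voxelized local-state field m.

    f(t) = (1/N) * sum_s m[s] * m[s + t], with periodic indexing,
    evaluated for all separation vectors t at once via the FFT
    (Wiener-Khinchin: the inverse FFT of |F|^2 is the circular
    autocorrelation).
    """
    N = m.size
    F = np.fft.fftn(m)
    f = np.fft.ifftn(F * np.conj(F)).real / N
    # Shift so the t = 0 statistic sits at the array center, as in Fig. 2.
    return np.fft.fftshift(f)

def crosscorrelation(m1, m2):
    """Periodic 2-point cross-correlation between two local-state fields,
    e.g., the SS1 and SS2 fields of a two slip system network."""
    N = m1.size
    F1 = np.fft.fftn(m1)
    F2 = np.fft.fftn(m2)
    return np.fft.fftshift(np.fft.ifftn(np.conj(F1) * F2).real / N)
```

Applied to a voxelized S:R network, the center of such a map would reproduce the elliptical along-segment signature discussed next.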
This expectation quantitatively manifests as the central elliptical feature oriented along the screw direction in Fig. 2(a). This feature's length, ∼ b, reflects the fact that the probability of finding a voxel with a dislocation essentially goes to zero when one traverses b along the slip direction in this network (in both the positive and negative directions along the slip direction). The feature's intensity decreases from a maximum at t = 0 to a minimum at either tip. The auto-correlation map in Fig. 2(a) also indicates that the probability of finding another voxel with a dislocation decreases sharply for traversals in any other direction. Additionally, the lighter horizontal bands in the auto-correlation map in Fig. 2(a) capture the stacking of the dislocations in the parallel planes. The random arrangement of these secondary bands reflects the fact that the planes for the placement of the dislocations in the S:R network were selected randomly.

Similarly, the auto-correlation map presented in Fig. 2(b) captures the features expected for E:11 networks. In these networks, the dislocation lines are placed along the [11¯2] crystal direction (because these are edge dislocations on the (111)[¯110] slip system). The central elliptical feature in this figure can be interpreted the same way as before. In contrast to Fig. 2(a), the secondary bands are much stronger and display uniform spacing. These features reflect the fact that, for class E:11 networks, the dislocations were populated on uniformly spaced planes. The distance between these bands (∼ b) matches the spacing of the populated planes in the generation of this network.

The auto-correlation and cross-correlation maps in Fig. 2(c) and 2(d) capture the 2-point statistics for an example dislocation network in the S:R:2 class. As expected, the SS1 auto-correlation map is visually quite similar to the auto-correlation map for the S:R network in Fig. 2(a).
This is because the same strategy was used in placing the dislocations on the SS1 slip system in both of these networks. The main difference, as already noted, is that the correlations are roughly scaled down to 60%, reflecting the lower overall density of SS1 dislocations in the S:R:2 network. The good agreement between the features in the auto-correlation maps in Figs. 2(a) and 2(c) confirms that these maps mainly reflect the generation parameters, and not the seeding of individual dislocations in creating a specific instantiation of the dislocation network. The main difference between the one and two slip system networks is actually seen in the cross-correlation maps. For a network with a single slip system, the cross-correlation map would show only zero values. For the two slip system network, the cross-correlation map presents information on the spatial correlations between the dislocations on the two slip systems. For example, Fig. 2(d) displays the expected SS2 neighborhood surrounding a voxel containing an SS1 dislocation for an example S:R:2 network. The map displays faint linear features extending in the [110] crystal direction. Such features indicate that when a voxel containing an SS2 dislocation is observed near an SS1-containing voxel, it is highly probable that more SS2-containing voxels can be found by traversing in the [110] direction. This reflects the fact that, for S:R:2 networks, the SS2 dislocations were (1¯11)[110] screw dislocations. Furthermore, the random arrangement of the SS2 dislocations with respect to the SS1 dislocations is quantitatively captured in the random placement of these linear features.

We note the fluctuations observed in the secondary features discussed above. These fluctuations are a consequence of under-sampling in the spatial averaging used to approximate the network correlations, Eq. (2). For example, in Figs. 2(a) and 2(c), the divergence from the nearly uniform long-range correlation we would expect from globally random networks illustrates that the spatial averaging is still capturing some of the specific seeding of the network instance. In general, this behavior is a consequence of the networks' sparsity and the 1-D nature of the individual dislocations. These characteristics make it difficult to spatially sample a sufficient number of features to accurately calculate the network class's correlations. In the subsequent case studies, these fluctuations account for the observed intraclass variance. In Sec. 5, we will present several methods for minimizing this noise.

This case study is aimed at demonstrating the ability of the proposed statistics to automatically distinguish (i.e., conduct unsupervised classification of) generated networks exhibiting differences in both their bulk dislocation density and their archetype (see Fig. 1). For this purpose, dislocation networks of classes E:R, S:R, and L:R were generated at three distinct total lengths, resulting in nine different classes of networks. For each class, 30 instantiations were made using the same generation strategy but changing the seeding of the dislocations in the networks. The dislocation density values were chosen to mimic the densities observed in realistic dislocation networks [6].

Fig. 3 presents the 2-point statistics of the generated networks in reduced PC subspaces. Since the networks were not labelled in any manner before performing the PCA, it is satisfying to observe that the proposed protocols are able to successfully separate the dislocation networks into classifiable groupings. It is also clearly seen that the intraclass variance (capturing the differences between multiple networks of a given class) is significantly smaller than the interclass variance (i.e., the differences between the networks produced using different generation parameters). Fig.
3(a) illustrates that the first PC score largely captures differences in the dislocation line length (corresponding to the t = 0 auto-correlation). It must also be noted that the first PC score will not exhibit an exact one-to-one exclusive mapping to the network's total dislocation length. Instead, this information is likely distributed throughout all the PC scores in very different ways. For example, Fig. 3(b) shows that the distance from the origin in the PC2-PC3 subspace is also likely to correlate well with the overall dislocation density in the network.

In addition to discriminating the networks by their dislocation densities, the protocols used in this work have successfully identified additional differences between the multiple classes of networks. It appears that PC2 reflects the differences in the dislocation archetypes (i.e., edge, screw, and loop) used for generating the networks. Specifically, it is satisfying to see that the PC2 scores for L:R are about midway between the PC2 scores of E:R and S:R, thereby recognizing that the loops are indeed made of roughly equal amounts of screw and edge components. It is remarkable that the protocols identified this feature automatically (i.e., in an unsupervised setting). Furthermore, the PC3 scores appear to specifically separate the loop archetype from the archetypes containing straight dislocations.

Fig. 3. Representation of the generated dislocation networks in the reduced-order PC space. (a) PC1-PC2 subspace, and (b) PC2-PC3 subspace. Each point in these plots represents a dislocation network, with the shape of the data point referring to the dislocation archetype and the gray-scale reflecting the total dislocation length in the network.

The dimensionality reduction offered by the protocols in the example shown here is quite remarkable: from about 3.4 million 2-point statistics to 3. In this specific example, the first three PC scores explained roughly 75% of the total variance in the data-set.
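The dimensionality reduction step can be sketched as follows: each network's 2-point statistics are flattened into one row of an ensemble matrix, which is centered and decomposed. The function name and the SVD route are illustrative choices, not the authors' implementation.

```python
import numpy as np

def pca_scores(stats, n_components=3):
    """Project an ensemble of flattened 2-point statistics onto their
    leading principal components.

    stats : (n_networks, n_statistics) array, one row per network.
    Returns the PC scores, the explained variance ratios, and the PC
    basis vectors (rows) in the statistics space.
    """
    mean = stats.mean(axis=0)
    X = stats - mean                              # center the ensemble
    # Thin SVD of the centered ensemble; rows of Vt are the PC basis vectors.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components], Vt[:n_components]
```

The millions of statistics per network enter only through the columns of `stats`; the scores returned per network are what Figs. 3 and 4 plot.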
The remaining variance implies that the higher PC scores still contain fairly significant information. However, the interpretation of the features captured by the higher PC scores becomes increasingly difficult, since each PC score still represents a weighted combination of about 1.1 million 2-point statistics for this case study. Also, given the volumetric sparsity of the dislocation networks studied here, some of the higher PC scores simply capture the computational noise, which is a part of the intraclass variance seen in Fig. 3. It is emphasized here that this clean separation of the signal and white noise highlights one of the important benefits of PCA. As has been documented extensively in the digital signal processing literature [52, 59], PCA acts as a denoising filter. For the study of dislocation networks, where sparsity induces significant noise, this property is especially valuable.

As the second case study, we will demonstrate the ability of the proposed protocols to discriminate dislocation networks based solely on their local and global generation parameters. In other words, we will apply the protocols to an ensemble of dislocation networks that exhibit the same bulk dislocation density but very different local and/or global dislocation placement characteristics (see Fig. 1 and Table 1). The total dislocation line length was maintained at b for all networks generated for this case study. Twenty networks were generated for each of the 11 classes listed in Table 1. It is emphasized that all 220 dislocation networks produced for this case study are essentially indistinguishable if dislocation density were used as the sole network quantification metric. Employing the computational protocols presented in this work, the SS1 auto-correlations, the SS2 auto-correlations, and the SS1-SS2 cross-correlations were computed and included in this analysis (∼ million 2-point statistics). Fig.
4 presents the 2-point statistics of the generated networks in the reduced PC space. As before, in a strictly unsupervised manner, the proposed protocol has successfully identified the inherent structural differences in the ensemble of generated dislocation networks, and has compactly captured these differences in just a few PC scores. Indeed, it is satisfying to observe that PCA has learned to separate each class into its own cluster, and that the interclass variance significantly exceeds the intraclass variance. The first 5 PC scores capture roughly 60% of the total variance in the dataset. As before, the higher PC scores capture some of the more complex differences responsible for intraclass variance and noise.

Because of the increased ensemble diversity included in this case study, the interpretation of the features captured by each PC basis is significantly more difficult. This is often the case with the use of PCA for dimensionality reduction in the extraction of material structure statistics [31, 30]. Even so, the first few PC scores still display interpretable behavior. Fig. 4(a) illustrates that the first PC score separates networks with single populated slip systems (on the left) from networks with two populated slip systems (on the right). This is because the values of the auto-correlation at t = 0 for these two sets of networks exhibit the highest variance among all of the computed spatial statistics for this case study.² This behavior is consistent with the observations in Case Study 1, and should be generally expected.

The second PC score appears to capture the differences in the networks' dislocation character. Observing Fig. 4(b) from left to right, networks containing screw dislocations are present on the left, whereas edge dislocation containing networks are on the right. Again, the protocol has identified that loop containing networks display both behaviors. Interestingly, the two slip system networks are differentiated by the dislocation character on their secondary slip system. Specifically, ES:R:2 networks, which have edge dislocations on SS1, are categorized on the left side.

² Although the total dislocation length was maintained constant, the dislocation density for each slip system varies with the inclusion of the two slip system networks.

As before, the higher order PC scores become increasingly difficult to interpret. For example, PCs 3 and 4 display limited intuitive structure. As an alternative, the dendrogram

[Figure caption fragment:] (¯110) cross-sections, while the remaining display the (110) cross-sections.

The PC1 basis vector (see Fig. 6) exhibits a large number of high correlation features similar to those discussed earlier. For example, the high intensity features in the centers of these basis maps are clearly capturing certain combinations of the screw and edge characters of dislocations on both the SS1 and SS2 slip systems. Similarly, certain aspects of the global arrangements of the dislocations are also captured in the PC1 basis. For example, Fig. 6(a) displays the 11 uniform planes that were populated in the majority of the classes with global structure, while Fig. 6(b) captures the structure of class E:11e networks, displaying uniform spacing within the slip plane. Because of the high values in the center of the PC1 basis maps, and because none of the previously identified features are specifically targeted, we can conclude that it is capturing a weighted combination of the dislocation densities on both slip systems. Furthermore, we observe that the second auto-correlation map is generally positive, while the first is generally negative. Combining these observations, we can see why the first PC score effectively separates networks by the relative dislocation populations on the first and second slip systems.

Fig. 6. Selected 2-D cross-sections from the auto-correlation components of the first basis vector. (a) (¯110) plane of SS1 auto-correlation, (b) (110) plane of SS1 auto-correlation, (c) (¯110) plane of SS2 auto-correlation, (d) (110) plane of SS2 auto-correlation.
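Inspecting a basis vector as maps of this kind requires splitting the flattened vector back into its constituent correlations and sectioning them. A minimal sketch, assuming (for illustration only, not necessarily the authors' layout) that the SS1 auto-, SS2 auto-, and SS1-SS2 cross-correlations were concatenated in that order:

```python
import numpy as np

def unpack_basis(vec, shape):
    """Split a flattened PC basis vector back into its three correlation
    maps, each a 3-D array of the given shape (assumed concatenation
    order: SS1 auto, SS2 auto, SS1-SS2 cross)."""
    n = int(np.prod(shape))
    ss1_auto = vec[:n].reshape(shape)
    ss2_auto = vec[n:2 * n].reshape(shape)
    cross = vec[2 * n:3 * n].reshape(shape)
    return ss1_auto, ss2_auto, cross

def central_section(corr_map, axis=0):
    """2-D cross-section through the t = 0 voxel of a centered 3-D map,
    analogous to the planar sections plotted in Figs. 2 and 6."""
    return np.take(corr_map, corr_map.shape[axis] // 2, axis=axis)
```

Sections through crystallographic planes such as (110) would additionally require rotating the voxel grid into the desired frame; the axis-aligned cut above conveys the idea.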
Although the proposed protocols demonstrated tremendous promise, one needs to pay careful attention to the consequences of the voxelization employed in the computation of the 2-point statistics. In general, correlations calculated on sparse voxelizations (i.e., containing a lot of zeros) will display significant noise, because the majority of the voxels will not contribute to the correlation [60, 39, 11]. However, because of the 1D nature of individual dislocations, such a voxelization is necessary to adequately separate dislocations and represent their behavior (e.g., shape, curvature). We note that, even with these challenges, the significant improvement in computational efficiency offered through the FFT justifies voxelization.

There exist different strategies to minimize the statistical noise described above. In the proposed protocol, it was partially addressed using kernel smoothing. However, that improvement is largely limited by the need to avoid excessively distorting the shape of individual dislocations. Building on prior efforts at the mesoscale [61, 62, 38], two other strategies could be utilized. First, the domain size can be increased to attain a more representative volume. Second, statistics calculated on several independent and identically distributed (i.i.d.) small networks can be averaged for an equivalent effect [61]. Both strategies have their limitations. In dislocation studies, increasing the domain size comes with significant, often insurmountable, computational cost [6]. Meanwhile, the identification of multiple i.i.d. networks can prove equally difficult. This is exacerbated because there are no reliable ways to systematically generate realistic dislocation networks.

Fig. 7 illustrates the benefits of the two solution strategies described above using class E:R networks. For a single E:R network, Fig. 7(a), the secondary bands in the auto-correlation maps erroneously suggest that the selection of the planes for the placement of the dislocations is not completely random. These features are significantly reduced when the volume is increased to (4649b)^3 at the same overall dislocation density, Fig. 7(b). The artifacts are also removed by averaging the auto-correlations of twenty i.i.d. networks of the original size, Fig. 7(c). Note that the t = 0 auto-correlation value for all three plots is the same, indicating that the dislocation density is the same.

Fig. 7. (110) cross-sections from E:R network SS1 auto-correlation maps. (a) (2000b)^3 domain, single network. (b) (4649b)^3 domain, single network. (c) Average from twenty (2000b)^3 domains.

In this paper, a computationally efficient, flexible framework was presented for calculating salient, low dimensional spatial statistics for dislocation networks. More specifically, it was shown that Siddon's algorithm and the FFT can be used to efficiently calculate 2-point statistics for discrete dislocation networks. To address the unwieldy dimensionality of the calculated 2-point statistics, it was demonstrated that PCA can be used to extract low-dimensional salient features from the 2-point statistics of an ensemble of networks. To illustrate their application within the data scientific study of dislocation networks, the PC scores were used to differentiate dislocation networks based on their bulk dislocation density as well as the local and global behavior of their constituent dislocations. Finally, strategies were discussed to mitigate the statistical noise in the computed 2-point statistics.
Acknowledgement
The authors would like to acknowledge funding from ONR N00014-18-1-2879.
The authors declare that they have no conflict of interest.
References

[1] D. McDowell, Int. J. Plast., 1280 (2010)
[2] J. Elliot, Int. Mater. Rev., 207 (2011)
[3] F. Roters, P. Eisenlohr, L. Hantcherli, D. Tjahjanto, T. Bieler, D. Raabe, Acta Mater., 1152 (2010)
[4] D. Drucker, J. Eng. Mater. Technol., 286 (1984)
[5] D. McDowell, Mater. Sci. Eng., R, 67 (2008)
[6] R. LeSar, L. Capolungo, in Handbook of Materials Modeling (Springer, Switzerland, 2020)
[7] A. El-Azab, G. Po, in Handbook of Materials Modeling (Springer, Switzerland, 2020)
[8] J. Greer, W. Nix, Phys. Rev. B, 245410 (2010)
[9] I. Steinbach, Modell. Simul. Mater. Sci. Eng., 073001 (2009)
[10] Y. Yabansu, P. Steinmetz, J. Hotzer, S. Kalidindi, B. Nestler, Acta Mater., 182 (2017)
[11] B. Adams, S. Kalidindi, D. Fullwood, Microstructure Sensitive Design for Performance Optimization (Butterworth-Heinemann, Waltham, MA, 2013)
[12] D. Fullwood, S. Niezgoda, B. Adams, S. Kalidindi, Prog. Mater. Sci., 477 (2010)
[13] D. Brough, D. Wheeler, S. Kalidindi, Integr. Mater. Manuf. Innov., 36 (2017)
[14] G. Liu, S. House, J. Kacher, M. Tanaka, K. Higashida, I. Robertson, Mater. Charact., 1 (2014)
[15] T. Ruggles, Y. Yoo, B. Dunlap, M. Crimp, J. Kacher, Ultramicroscopy, 112927 (2020)
[16] J. Barnard, J. Sharp, J. Tong, P. Midgley, Science, 319 (2006)
[17] S. Torquato, Random Heterogeneous Materials (Springer, New York, NY, 2002)
[18] J. Hirth, J. Lothe, Theory of Dislocations (Krieger, Malabar, FL, 1982)
[19] D. Hull, D. Bacon, Introduction to Dislocations (Butterworth-Heinemann, Burlington, MA, 2011)
[20] P. Franciosi, M. Berveiller, A. Zaoui, Acta Metall., 273 (1980)
[21] M. Zaiser, K. Bay, P. Hahner, Acta Mater., 2463 (1999)
[22] J. Deng, A. El-Azab, Modell. Simul. Mater. Sci. Eng., 075010 (2009)
[23] H. Wang, R. LeSar, J. Rickman, Philos. Mag. A, 1195 (1998)
[24] C. Landon, B. Adams, J. Kacher, J. Eng. Mater. Technol., 021004 (2008)
[25] J. Anderson, A. El-Azab (2020)
[26] I. Groma, P. Balogh, Acta Mater., 3647 (1999)
[27] I. Groma, F. Csikor, M. Zaiser, Acta Mater., 1271 (2003)
[28] D. Fullwood, S. Kalidindi, B. Adams, Chapter 13 in Electron Backscatter Diffraction in Materials Science (Springer, Boston, MA, 2009)
[29] T. Fast, O. Wodo, B. Ganapathysubramanian, S. Kalidindi, Acta Mater., 176 (2016)
[30] A. Choudhury, Y. Yabansu, S. Kalidindi, A. Dennstedt, Acta Mater., 131 (2016)
[31] S. Kalidindi, J. Gomberg, Z. Trautt, C. Becker, Nanotechnology (2015)
[32] M. Latypov, L. Toth, S. Kalidindi, Comput. Methods Appl. Mech. Engrg., 180 (2019)
[33] N. Paulson, M. Priddy, D. McDowell, S. Kalidindi, Int. J. Fatigue, 1 (2019)
[34] N. Paulson, M. Priddy, D. McDowell, S. Kalidindi, Acta Mater., 428 (2017)
[35] M. Barry, K. Wise, S. Kalidindi, S. Kumar, J. Phys. Chem. Lett., 9093 (2020)
[36] E. Kroner, Statistical Continuum Mechanics (Springer, New York, NY, 1972)
[37] A. El-Azab, Scr. Mater., 723 (2006)
[38] S. Niezgoda, Y. Yabansu, S. Kalidindi, Acta Mater., 6387 (2011)
[39] A. Papoulis, S. Pillai, Probability, Random Variables, and Stochastic Processes (McGraw Hill, Chennai, 2002)
[40] M. Sauzay, L. Kubin, Prog. Mater. Sci., 725 (2011)
[41] J. Deng, A. El-Azab, Philos. Mag., 3651 (2010)
[42] T. Hochrainer, M. Zaiser, P. Gumbsch, Philos. Mag., 1261 (2007)
[43] R. Sedlacek, C. Schwarz, J. Kratochvil, E. Werner, Philos. Mag., 1225 (2007)
[44] S. Limkumnerd, E. Van der Giessen, Phys. Rev. B, 184111 (2008)
[45] P. Valdenaire, Y. Le Bouar, B. Appolaire, A. Finel, Phys. Rev. B, 214111 (2016)
[46] A. Arsenlis, D. Parks, Acta Mater., 1597 (1999)
[47] E. Kroner, Int. J. Solids Struct., 1115 (2001)
[48] S. Xia, A. El-Azab, Modell. Simul. Mater. Sci. Eng., 055009 (2015)
[49] N. Bertin, Int. J. Plast., 268 (2019)
[50] G. Han, Z. Liang, J. You, IEEE Nucl. Sci. Symp. Conf. Rec., p. 1515 (2000)
[51] J. Shlens, A tutorial on principal component analysis. Accessed 28 November 2020
[52] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning (Springer, New York, NY, 2016)
[53] U. Essmann, H. Mughrabi, Philos. Mag. A, 731 (1979)
[54] A. Cecen, Calculation, utilization, and inference of spatial statistics in practical spatio-temporal data (Georgia Tech Library, Atlanta, GA, 2017)
[55] A. Makandar, B. Halalli, Int. J. Comput. Appl., p. 109 (2015)
[56] A. Iskakov, S. Kalidindi, Integr. Mater. Manuf. Innov., 70 (2020)
[57] W. Cai, A. Arsenlis, C. Weinberger, V. Bulatov, J. Mech. Phys. Solids, 561 (2006)
[58] S. Niezgoda, D. Fullwood, S. Kalidindi, Acta Mater., 5285 (2008)
[59] M. Chawla, Appl. Soft Comput., 2216 (2011)
[60] M. Vetterli, J. Kovacevic, V. Goyal, Foundations of Signal Processing (Cambridge University Press, Cambridge, UK, 2014)
[61] S. Niezgoda, D. Turner, D. Fullwood, S. Kalidindi, Acta Mater., 4432 (2010)
[62] T. Kanit, S. Forest, I. Galliet, V. Mounoury, D. Jeulin, Int. J. Solids Struct., 40 (2003)