A predictive failure framework for brittle porous materials via machine learning and geometric matching methods

Brittle porous materials are used in many applications, such as molten metal filter, battery, fuel cell, catalyst, membrane, and insulator. The porous structure of these materials causes variations in their fracture strength that is known as the mechanical reliability problem. Despite the importance of brittle porous materials, the origin of the strength variations is still unclear. The current study presents a machine learning approach to characterize the stochastic fracture of porous ceramics and glasses. A combined finite element modeling and fracture mechanics approach was used to generate a unique empirical data set consisting of normalized stress intensity factors (nSIFs, KI/σ∞ = Yπa\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \sqrt {\pi a} $$\end{document}) that define fracture strength of brittle systems under uniaxial tensile loading and biaxial tensile loading. These empirical data sets were used to generate prediction functions and validate their accuracy. Monte Carlo simulations with two machine learning algorithms, random forests (RF) and artificial neural networks (ANN), were used to simultaneously determine the optimum percentages for the training and test data set split and the prediction function validation. The constraint was taken to be the mean absolute percentage error (MAPE) during the process. In the implementation step, new porous media with uniformly distributed pores were created and the prediction functions were used to obtain nSIFs and characterize the media. As a novelty of this approach, which ensures the predictive characterization of the generated media, a geometric matching method by means of the Euclidean bipartite matching between the empirical and the generated media was presented and the nSIFs were compared by means of MAPE. As a result of the study, MAPE ranges are 3.4–17.93% (uniaxial load) and 2.83–19.42% (biaxial load) for RF, 3.79–17.43% and 3.39–21.43 for ANN at the validation step; 3.54–18.20% (uniaxial load) and 3.06–21.60% (biaxial load) for RF, 3.57–18.26% and 3.43–21.76% for ANN at the implementation step. The proposed approach can be thus used as a predictive characterization tool, especially for the analysis and Weibull statistics of porous media subjected to brittle failure.


Introduction
Porous materials are ubiquitous in nature. Porosity functionalizes biological materials by enabling mass transportation and storage. For example, porous structure of wood, crab exoskeleton, and diatoms allows nutrition transportation; pores in bone allow storage of cells and reduce weight [1,2]. Industrial porous materials are also used in many applications, including battery, solid oxide fuel cell, filter, insulation, and bio-scaffolds [1,[3][4][5][6]. However, porosity reduces mechanical performance and reliability, especially in brittle porous materials (BPMs), e.g., porous ceramics and glasses. Prediction of brittle failure in porous systems is challenging due to complex stress interactions between cracks, pores, and grains [7][8][9][10][11][12][13]. Various analytical and numerical approaches have been used to quantify the fracture behavior of BPMs, but the complex stress interactions between pores and crack size distributions limit these investigations to single/few pore(s) and small simulation domains with limited number of pores [7,9,12,14,15]. Spatial distribution of pores can be obtained via X-ray computed tomography [16], but accurate description of initial defect size and position distributions is unobtainable. Moreover, reliability analyses-stochastic investigation of strength values-require sample size (N) [ 30 [17]. This large N restricts (a) experimental investigations due to sample preparation and testing costs and (b) deterministic numerical approaches due to computational cost. In this context, machine learning (ML) approaches arise as an extension of numerical investigations [18][19][20][21]. However, predictive capabilities of ML for stochastic brittle fracture problems are unknown.
Brittle fracture is the main problem of ceramics and glasses, which is amplified for porous systems. Therefore, in our previous work, we combined fracture mechanics (FM) and finite element method (FEM) to better understand stochastic fracture in porous alumina and glasses [8]. Isotropic systems such as glass eliminate elastic mismatch stresses between grains, which simplifies the FEM implementation by avoiding elastically anisotropic grains. This approach was developed to observe the effects of the most significant stress concentrators, which were the pores and cracks. The thermal expansion anisotropy and elastic stiffness anisotropy effects are not as severe as the pore-crack combination, but can be implemented in the FEM scheme in the future studies. The combined FM-FEM approach allowed us to predict the changes in fracture strength (r f ) with increasing porosity in porous glasses and porous alumina. The results showed that the r f slightly decreased with increasing number of pores for 1-2 vol% porosity. On the other hand, r f decreased significantly up to 10 vol% porosity due to strong stress interactions between the pores. Beyond 10 vol% porosity and less than * 30 vol%, the r f did not change as strongly as the initial decrease for up to 10 vol%, because the stress interactions were saturated, i.e., pore-to-pore stress interactions did not increase significantly because the structure already includes strongly interacting pore clusters for porosity [ 10 vol%. We proposed a new equation to capture r f dependence on porosity based on our numerical observation: Here, r k is related to the strength of pore-free material; c and d are empirical constants; and w represents the fraction of porosity. The second problem of ceramics and glasses is the variation in their r f . This second problem is known as the mechanical reliability problem that limits the use of brittle materials in critical applications. The variation in r f has been quantified using extreme value statistics, specifically Weibull statistics. The two-parameter Weibull distribution provides a link between the probability of failure F and applied stress and volume of the specimen, which is as follows: where the V 0 and r 0 are the normalizing constants. The ''m'' is Weibull modulus, which describes the degree of variation in r f . The V can be assumed to equal to V 0 for samples of the same volume, making the r 0 and m as the only parameters to be fitted to any data. Common values of m for ceramics range from 5 to 20, whereas metals can have m = 90-100. The lower the m, the higher the variation in r f . Our FM-FEM approach was performed on unique structures containing pores, which was the first approach to include both the effects of random porous structure and crack sizes on the variation in r f . We predicted r f of at least 40 structures for fixed vol% of porosity, but including random spatial distribution of pores. The use of 40 structures allows estimation of Weibull distribution parameters with a low error. The predicted Weibull modulus depended on the vol% of porosity and initial crack size distribution. We observed a decrease in m value with increasing porosity even for a change in porosity from 1 to 2 vol%. This result shows that the decrease in reliability depends on pore-to-pore stress interactions. The experimental m value for BPMs is below 10 for structures with porosity [ 10 vol%. We did not observe statistically significant change in m for porosity [ 10 vol%, which is consistent with the experimental observations. We also run simulations to investigate the effect of crack size distributions in porous glasses, as BPMs contain a distribution of crack sizes and pore sizes. These simulations contained pores of fixed size, but randomly distributed in the structure. Four types of crack size distributions were used: Gaussian, Pareto, and two different experimental distributions. We revealed that the effect of crack size distribution on the stochastic fracture decreases for larger pores, because the crack size-to-pore diameter ratio decreases and the crack tip stresses become similar. As a result, pore-to-pore stress interactions dominate fracture behavior for crack size-to-pore diameter ratio smaller than 0.1.
In addition to our FM-FEM approach, porous media were also simulated using a lattice spring model and Brazilian disk tests were virtually performed on porous medial by Zhiwei et al. [12]. This lattice model of BPM fracture also showed decreasing Weibull modulus with increasing porosity up to 10 vol% and showed insensitivity of Weibull modulus to porosity between 20 and 50 vol% [12]. However, the reported Weibull moduli were higher than * 60 for 12 sets of porous medial with different porosity and pore size, except one set with the largest pore size (pore sizes were relative in this study without any absolute value). These Weibull moduli are higher than reported values of less than 10 for porosity [ 10 vol%, indicating the complexity of the problem [8]. Moreover, stochastic FEM models were used to predict fracture behavior of porous ceramic matrix composites, where the focus was on the largest pores with circumferential cracks [22]. In this study, pores were not explicitly simulated, but stochastic material properties were distributed in a rectangular 3-point bend specimen. The distribution of fracture strength throughout the specimen generated a Weibull modulus of 12-16 that was representative of the experimental system, but the effect of pores size was not discussed. Other Monte Carlo fracture mechanics simulation approaches describing fracture behavior of brittle porous systems were also reported without pore-to-pore stress interactions [23].
Although the FM-FEM combined approach was successful in predicting the relationships between fracture behavior and porosity in sets of 40 explicit microstructures, computational cost of FEM was prohibitive for an investigation of higher number of larger microstructures with larger number of pores. Reliable and safe use of BPMs requires better understanding of their stochastic fracture, hence large data sets. In this context, informatics can provide computationally cheaper approaches for the analysis of stochastic fracture, using limited number of FEM simulations as the training data set. For example, Gu et al. [18] used FEM to observe crack tip stress fields and used ML approaches to predict local microstructure near crack tip to reduce near-tip stresses and toughen the composite material. Moore et al. [24] applied decision trees (DT), random forests (RF), and artificial neural network (ANN) algorithms for the prediction of dynamic fracture in terms of the fracture path, where they predicted the fracture path with * 85% accuracy. Mangal and Holm also performed a study, where they used the RF model to predict stress hot spots in grains of metallic materials [25]. These stress hot spots are rare events in materials under stress, as they occur in regions representing less than 10 vol% of the metal. On the other hand, from a fracture mechanics point of view, fracture in brittle materials depends on the maximum normalized stress intensity factor nSIF (Y ffiffiffiffiffi pa p = K I / r ? ), assuming fracture toughness is independent of crack size. We used the maximum nSIFs in our previous study to build ML models [21]. The RF model resulted in prediction accuracy of * 85% for simulation results both under uniaxial and biaxial tensile loads. However, this work was based on the maximum nSIF in a microstructure, not in a single pore. Therefore, in this study, we used the nSIF data, which included the maximum nSIF values for each pore in a microstructure and pore position, a total of 14,880 pores from 680 unique microstructures. These FM-FEM-generated data were used to build ML predictors using RF and ANN. We used the predicted nSIFs to obtain distribution of r f and related Weibull statistics for porous structures. As a novelty of the present study, we conducted the predictive failure characterization and error quantification for the generated microstructures in comparison with the previously simulated microstructures, depicted in Fig. 1. The error quantification was based on the mean absolute percentage error (MAPE), for which the empirical results were taken from the previously conducted FM-FEM studies and the predicted results were obtained from the generated prediction functions implemented on the generated microstructures. Here, the relationship between the empirical and predicted results was established on the geometric likelihood, i.e., the Euclidean bipartite matching of the pore centroid proximities of the previously investigated and generated microstructures. This enabled us to compare the microstructures with a known metric. This metric can be extended to other types of microstructural features, such as pore shape/orientation, crack size/orientation, and other geometrically representable features.

Methodology
FEM simulations on porous two-dimensional microstructures were previously performed using OOF2 program [26]. Details of FEM simulations were reported in [7,8]. In these simulations, porous material was assumed to be a brittle isotropic material with elastic modulus E = 100 GPa and Poisson's ratio v = 0.3. Uniaxial and biaxial tensile loads were investigated to understand the effects of uniformly distributed pores on the stochastic fracture of the BPMs. Uniaxial tensile stress was applied by keeping the bottom boundary fixed and displacing the top boundary by 0.2 lm (0.01% displacement); the left and right boundaries were mechanically free. Equibiaxial stress was applied by keeping the bottom and left boundaries fixed and displacing the top and right boundaries by 0.2 lm. The linear systems of equations were solved using a conjugate gradient method with an incomplete lower-upper (LU) factorization as the preconditioner. The pore size was kept constant in each microstructure of size 2000 lm 9 2000 lm. The pore centroids were randomly generated within this microstructure. The minimum distance between non-overlapping pores was 15 lm, where the pores were generated 200 lm away from the edges to reduce the stress interactions with free surfaces. Each simulation had * 50,000 quadratic triangular elements. Adaptive meshing scheme in OOF2 was used to increase the mesh density near pores, which corresponded to a segment size of * 6 lm along the perimeter of pores. The simulated stress distributions were used to calculate geometric factor Y for each pore centroid positions (x i , y i ) [8]. The number and radii of pores were taken as n pores = {5,10,20,30,40,50} and R = {40,60,80} lm, respectively. Sets of 40 unique microstructures were simulated with total number of 680 unique microstructures.
Cracks were assumed to be present perpendicular to the pores. A normal distribution (ND) of crack sizes was assumed with a mean = 8.8 lm and standard deviation = 2.8 lm. This ND was bounded at zero and 15 lm that avoided negative crack sizes and pore-connecting cracks. The grain size distribution is taken to be equal to the crack size distribution. The use of grain size as initial crack size was also suggested by Evans and Langdon [27]. These cracks were randomly generated along the perimeter of the pore. The crack length was used as the distance to the next crack.
Once the crack size and orientation is known, the stress distributions perpendicular to the crack face were used to calculate the geometric factor according to the method given by Shah [28]. Linear elastic fracture mechanics establishes the basis for the relationship, for which r f distribution is equal to the K IC /Y ffiffiffiffiffi pa p distribution. As a result, stochastic fracture behavior depends on the distribution of Y ffiffiffiffiffi pa p , which we called as the normalized stress intensity factor (nSIF). In the present study, maximum nSIF for each pore was used as a part of the training/test data, a total of 14,880 pores for 680 microstructures as it can be deduced from Table 1. We performed 1360 FEM simulations under uniaxial loading and biaxial loading using 680 unique microstructures.

Prediction and validation
We used two machine learning algorithms-RF and ANN-to generate prediction functions. RF, or more specifically Breiman-Cutler ensembles of decision trees, was initially introduced by Breiman based on decision trees for classification and regression analysis [29]. A decision combination model P is formed from a sequence of models f, or known as decision trees, dependent on input data x i where i ¼ 1; . . .; n f g and can be broadly represented as Initially, RF produces bootstrapped data and assigns a decision tree f for each data subset. These decision trees (i.e., single classifiers) lay the ground for the final prediction by means of forming the forest (i.e., group of classifiers) through bootstrap aggregation (bagging) samples, which means fitting trees to the subsampled data and averaging all the trees (here, used for the regression purpose). During the process, each tree is independently formed through individual data sampling, which makes RF an efficient approach for numerical data handling and capturing linear and nonlinear interaction between the input and output data [30]. In the present study, 100 trees were used based on the default value of RF method in Mathematica ''Predict'' function [31].
ANN is a systematic approach consisting of interconnected neuron-like structures in a multilayered system with an input layer, an output layer and hidden layers facilitating the association of input and output [19]. Its architecture with connected supervised feed-forward neural network containing minimum one hidden layer is called multilayer perceptron (MLP). Additionally, the gradient of the error function can be calculated by back-propagation algorithm (BPA) by propagating toward the hidden layer from the outermost layer [30]. The mathematical expression of how MLP maps the input value of ith independent value Xi and prediction value output P can be presented as [32,33]: Here, n refers to the number of input parameters, which is equal to the number of samples and m is the number of hidden neurons, whereas g and h are the activation functions of output and hidden neurons, respectively. In the present study, activation functions are chosen to be hyperbolic tangent function (tanh), while number of hidden neurons is 10 with two hidden layers following the default values implemented by Mathworks [34]. The weights of the connection values, i.e., w ij between the ith input neuron and jth hidden neuron and v j between jth hidden neuron and output neuron, are inserted to increase the precision of P and b j , which is the bias of the jth hidden neuron. In addition, an offsetting factor h for a better fit is also available in Equation [20].
In order to train these two models, the empirical data were chosen to be the previously conducted nSIFs from the FM-FEM simulations on the porous ceramics subjected to uniaxial and biaxial loading cases and used as the input data (x i ) [8,35]. In order to simultaneously determine the optimum percentages for the training and test data split and the prediction function validation, Monte Carlo simulations were used with RF and ANN that were readily available in Mathematica [31]. The mean absolute percentage error MAPE for each individual specimen was taken to be the decisive parameter in these simulations, for which is expressed as Here, n pores refers to the number of pores in the porous medium, E is the empirical results, and P is the predicted output for each specimen. Following the Monte Carlo simulations of several repetitionse.g., 10 repetitions in the present study-the minimum MAPE value determined and the corresponding training set percentage were used for the split. An illustration of this process is shown in Fig. 2.

Implementation
In the implementation step, the validated prediction functions were used to characterize the generated media with randomly distributed uniform size pores. The pore centroids were randomly generated for both validation and implementation steps. ML models were not trained to predict the pore positions. In order to ensure the characterization, pore centroid proximities indicating the geometric likelihood and the corresponding nSIFs between the empirical data and the predictive results of the generated media were analyzed by means of the Euclidean bipartite matching as illustrated in Fig. 3. Thus, n pores Â n pores distance matrix was generated based on the Euclidean distances d between the pore centroid coordinates of the empirical data, denoted with p, and the generated pore coordinates, denoted with q, as Thereafter, the minimal cost flow problem was set for determination of the optimal permutation based on the total Euclidean distance T of the centroids as [36] T where P refers to the permutations for one-to-one correspondence. In order to solve the problem given in Eq. (7), Kuhn-Munkres algorithm was conducted to determine the non-overlapping path resulting in the minimum total Euclidean distance as seen in Fig. 3 [37].

Error quantification
We generated the prediction functions based on empirical nSIF data sets, where the specimens were subjected to uniaxial loading and biaxial loading. Monte Carlo simulations with the decisive parameter as MAPE and 10 repetitions resulted in 70-85% of the data sets to be used as the training data listed in detail in Appendix Tables 2 and 3. The prediction functions, thereafter, were used to predict nSIFs of the generated specimens and compared with the empirical data by using the Euclidean bipartite matching method as depicted for different cases in Fig. 4. As depicted in Fig. 5

Failure probability predictions
The Weibull plots showing the distribution of r f calculated from the FM-FEM simulations and predicted from the RF method are shown in Fig. 6c-d for uniaxial loading and in Fig. 7c, d for biaxial loading.   The results from ANN predictions are shown in Figs. 6 and 7e-f. The RF and ANN models captured the correct decrease in r f with increasing porosity for all the 42 sets of data under uniaxial loading and biaxial loading. Both of the RF and ANN models had limited prediction of the lower and upper tail deviations in r f distributions. These tail deviations can be seen as rare fracture instances that are even more challenging to be predicted by the ML models, as the rare events require larger sets of data for accurate prediction. The Weibull moduli for the validation data were the closest to the empirical Weibull moduli compared to the moduli estimated from the implementation data. The m values for the RF validation model were as low as 20, which are rarely observed in experiments. Chao and Shetty reported a Weibull modulus of 22 under biaxial loading and 24 under 4-point bending for alumina containing * 6 vol% porosity [38]. All the RF and ANN predictions of r f generated Weibull moduli were larger than the FM-FEM simulated r f data. More accurate predictions were obtained for the structures containing the lowest number of pores = 5. The changes in fracture strength variations with increasing porosity are    (Figs. 6, 7). These high Weibull moduli are not representative of the real systems with m lower than 20. Yet, the RF model qualitatively predicted the decreasing m values with increasing pore size under uniaxial loading and biaxial loading (Figs. 6, 7). ANN predicted r f data also showed similar trend in decreasing Weibull modulus with increasing pore size, but not for all the sets. Similar decrease in Weibull modulus with increasing pore size was also numerically predicted by lattice spring models [12].
The prediction accuracy of the RF and ANN model was higher for smaller pores at 40 lm radius. The origin of this observation was attributed to the lower probability of finding strongly interacting pores in structures with the same amount of smaller size pores, that is, lower porosity. The maximum porosity of the 40-lm-radius pores was * 10 vol%, whereas the largest pore system with 80-lm-radius pores had a porosity of * 31 vol% (Fig. 5). This high porosity introduces stress amplification and stress shielding based on the spatial distribution of pores [8,35]. As a result, the distribution of fracture strength is wider in systems with high porosity and high number of pores. Hence, the higher-porosity systems demand even larger data sets to deliver more accurate predictions.

Conclusions
An investigation of the stochastic fracture behavior of porous ceramics and glasses is challenging due to complex interactions between pores, pore and crack, and far field stresses. This complexity and requirement of large number of virtual tests prohibit the use of high-fidelity simulations to predict the reliability of brittle porous materials. In this context, we used the results of FM-FEM simulations on 680 unique porous structures to develop RF and ANN models. The RF model correctly captured the general decreasing trend of r f values and decreasing Weibull modulus with increasing porosity. The ANN model also predicted the decreasing r f with increasing porosity, but it was not accurate enough to predict the variations in r f quantified by the Weibull modulus. The RF model had relatively better prediction of the Weibull modulus and its change with porosity. The prediction accuracies of both RF and ANN were higher for the lowest porosity systems.
As an advancement in the predictive materials characterization, a geometric matching method by means of the Euclidean bipartite matching between the empirical and generated media was also presented to be able to quantify the error (here, MAPE was selected to be the error measure). The nSIFs were compared by means of MAPE which are in ranges of 3.4-17.93% (uniaxial load) and 2.83-19.42% (biaxial load) for RF, 3.79-17.43% and 3.39-21.43 for ANN at the validation step; 3.54-18.20% (uniaxial load) and 3.06-21.60% (biaxial load) for RF, 3.57-18.26% and 3.43-21.76% for ANN at the implementation step. The present study thus indicates that further empirical investigations with large number of microstructures ([ 10,000) are needed to validate the predictive capabilities of ML for the fracture mechanics problems.