I earned my PhD in Mathematics in June 2019, from the University of Tours, in France.
It results from a collaboration between:
– the Institut Denis Poisson (IDP) – University of Tours – Tours,
– the Laboratoire de Probabilités, Statistique et Modélisation (LPSM) – Sorbonne Université – Paris,
– the multinational electronics and semiconductor manufacturer STMicroelectronics.
– Dr Philippe Leduc – STMicroelectronics
Probability of failure, rare event simulation, Monte-Carlo methods – Splitting, Gaussian process regression – Kriging, convex order, risk measures, bounds on quantiles, Bayesian inference, design of experiments, SUR strategies, Lipschitz functions.
Context and motivations
Consider an industrial product and its manufacturing process. The process consists of a succession of complex operations that are difficult to keep constant over time. As a result, the numerical values of some design parameters of the product may vary from one manufactured piece to another. Moreover, some configurations lead to defective pieces, i.e. pieces that do not meet the specifications.
To test the final performance of the product before launching its manufacturing process, most industrial companies use numerical simulation. The principle is as follows: a design scenario is given as input to a computer code that is complex enough to represent the different stages of the manufacturing process. Here, the output is a scalar from which it is determined whether the product meets the specifications.
The main constraint of this approach is that each simulation is costly in terms of time and computer resources (a single simulation can take several hours or even days). It is therefore inconceivable to test all design scenarios and to measure the performance of the product by, for example, calculating the ratio of defective products to the total number of design scenarios tested. That is why a probabilistic analysis of the failure risk of the product must be put into practice. This is done by introducing a probability distribution on the design scenarios, which describes how likely each scenario is to occur. This probability distribution is assumed to be known in practice.
Let X denote the random vector of design parameters, distributed according to this law, and let g denote the function representing the computer code. Each marginal of X corresponds to a variable design parameter, and the variable of interest is the random variable Y = g(X). To measure the performance of the product, we are interested in the probability that Y exceeds a threshold T, i.e. the probability of failure of the product. Note that this probability can be very small. However, since the computer code is too complex for its analytical expression to be available, the function g is unknown in practice (black-box function). In particular, the law of Y is unknown.
In this context, the naive Monte-Carlo simulation method cannot be put into practice. Recall that the principle of this method is to draw a random sample according to the law of X and to estimate the probability of failure by the proportion of observations that lead to a failure. In the picture below, for example, 100 points represent 100 configurations of the product parameters, drawn according to their density (orange points). The value of g at each of these points is systematically below the threshold T (blue points). The resulting estimate of the probability of failure is therefore zero, so the failure risk is underestimated. In this example, 100 numerical simulations are not enough to provide a reliable estimate of the probability of failure by the naive Monte-Carlo method.
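The limitation described above can be reproduced with a minimal sketch. The toy problem here is an assumption chosen for illustration only (X standard normal, g the identity, T = 4); it is not the industrial case, where g is a costly black-box code.

```python
import math
import numpy as np

def naive_monte_carlo(g, sample_x, threshold, n, rng):
    """Estimate P(g(X) > threshold) by the proportion of failures among n draws."""
    x = sample_x(rng, n)           # n design scenarios drawn from the law of X
    failures = g(x) > threshold    # one code evaluation per scenario
    return failures.mean()

# Hypothetical toy problem (assumption): X ~ N(0, 1), g(x) = x, T = 4.
g = lambda x: x
sample_x = lambda rng, n: rng.standard_normal(n)
T = 4.0
n_simulations = 100

rng = np.random.default_rng(0)
p_hat = naive_monte_carlo(g, sample_x, T, n_simulations, rng)

# Exact value for this toy problem: P(X > 4) for a standard normal variable.
p_true = 0.5 * math.erfc(T / math.sqrt(2.0))

# Probability that all 100 draws fall below T, i.e. that the estimate is zero.
prob_no_failure = (1.0 - p_true) ** n_simulations

print(f"estimate: {p_hat}, true value: {p_true:.3e}")
print(f"probability of a zero estimate: {prob_no_failure:.4f}")
```

With a failure probability of this order, a sample of 100 points contains no failure at all in more than 99% of cases, which is why a budget of 100 simulations cannot give a usable naive estimate.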
Under the constraint of a very limited number of calls to the computer code, i.e. of experimental data, our goals are the following:
→ Building a robust estimator of the probability of failure.
→ Assessing the reliability of the result obtained, i.e. providing a measure of the uncertainty in the estimated probability of failure.
→ Implementing a method for determining the configurations of the product parameters that must be studied in priority by numerical simulation. This makes it possible to optimize the costs of numerical simulation and to obtain the most informative experimental data possible. This type of problem belongs to the design of computer experiments.
Method 1. Our observations are the results of numerical simulation (blue points). Basic Bayesian principles of the so-called Kriging method (Gaussian process regression) are then applied to correctly design an estimator of the target probability of failure.
From a numerical point of view, the practical use of this estimator is unfortunately restricted. An alternative estimator is therefore considered, which is equivalent in terms of bias. Our main contribution concerns the existence of a convex order inequality between these two estimators. We show that this inequality makes it possible to compare their efficiency and to quantify the uncertainty of the estimates they provide (95% confidence intervals). An iterative procedure for constructing an optimal design of computer experiments, based on the principle of Stepwise Uncertainty Reduction (SUR) strategies, also results from this inequality (see below).
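To make the Kriging idea concrete, here is a minimal sketch of one common Gaussian-process-based estimate: the posterior probability that the process exceeds T, averaged over samples of X. It is not necessarily either of the two estimators compared in the thesis. The kernel, its hyperparameters, the toy function g and the uniform law of X are all assumptions made for illustration.

```python
import math
import numpy as np

PRIOR_VAR = 1.0      # assumed prior variance of the Gaussian process
LENGTH_SCALE = 0.3   # assumed correlation length

def kernel(a, b):
    """Squared-exponential covariance between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return PRIOR_VAR * np.exp(-0.5 * (diff / LENGTH_SCALE) ** 2)

def kriging_posterior(x_obs, y_obs, x_new, jitter=1e-10):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_obs = kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_cross = kernel(x_obs, x_new)               # shape (n_obs, n_new)
    weights = np.linalg.solve(k_obs, k_cross)    # K^{-1} k(x_obs, x_new)
    mean = weights.T @ y_obs
    var = PRIOR_VAR - np.sum(k_cross * weights, axis=0)
    # Clip to keep the standard deviation strictly positive.
    return mean, np.sqrt(np.clip(var, 1e-18, None))

def normal_cdf(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def failure_probability(x_obs, y_obs, threshold, x_samples):
    """Average over samples of X of the posterior probability of exceeding T."""
    mean, sd = kriging_posterior(x_obs, y_obs, x_samples)
    return np.mean(1.0 - normal_cdf((threshold - mean) / sd))

# Hypothetical toy problem (assumption): g(x) = sin(6x), X uniform on [0, 1].
g = lambda x: np.sin(6.0 * x)
T = 0.5
x_obs = np.linspace(0.0, 1.0, 6)   # six "costly" simulations
y_obs = g(x_obs)

rng = np.random.default_rng(0)
x_samples = rng.uniform(0.0, 1.0, 2000)   # cheap samples: g is not called here
p_hat = failure_probability(x_obs, y_obs, T, x_samples)
print(f"Kriging-based estimate of the probability of failure: {p_hat:.3f}")
```

The point of the construction is that the expensive function is called only at the six observation points; the average over the 2000 samples of X uses the Gaussian process posterior alone.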
Method 2. The second method is an iterative procedure that is particularly suited to the case where the probability of failure is very small, i.e. the dreaded failure event is rare. The expensive computer code is represented by a function that is assumed to be Lipschitz continuous. At each iteration, we perform a dyadic division of the input space in order to determine where the computer code must be evaluated. The Lipschitz hypothesis is then used to construct approximations of the probability of failure from below and from above. We show that these approximations converge to the true value as the number of iterations grows, and that the approximation error decreases with the number of calls to the computer code. In practice, the approximations are estimated using the Monte-Carlo method known as the splitting method, which is particularly suited to the estimation of very small probabilities.
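The way a Lipschitz bound turns a few evaluations into guaranteed lower and upper approximations can be sketched in dimension 1. This is a simplified illustration, not the procedure of the thesis: it assumes that X is uniform on [0, 1], so that the probability of a cell is simply its length and no splitting method is needed, and that the Lipschitz constant is known. The toy function and threshold are hypothetical.

```python
def lipschitz_bounds(g, lipschitz, threshold, n_iter):
    """Lower and upper bounds on P(g(X) > threshold) for X uniform on [0, 1]."""
    undecided = [(0.0, 1.0)]   # cells whose status is still unknown
    sure_failure = 0.0         # probability mass of cells proven to fail
    n_calls = 0                # number of evaluations of g
    for _ in range(n_iter):
        next_undecided = []
        for a, b in undecided:
            mid, half = 0.5 * (a + b), 0.5 * (b - a)
            value = g(mid)
            n_calls += 1
            # On the cell, g stays within lipschitz * half of g(mid).
            if value - lipschitz * half > threshold:
                sure_failure += b - a          # the whole cell fails
            elif value + lipschitz * half <= threshold:
                pass                           # the whole cell is safe
            else:
                # Status unknown: dyadic division of the cell.
                next_undecided += [(a, mid), (mid, b)]
        undecided = next_undecided
    uncertain = sum(b - a for a, b in undecided)
    return sure_failure, sure_failure + uncertain, n_calls

# Hypothetical toy problem (assumption): g(x) = x^2 on [0, 1], which is
# Lipschitz with constant 2, and T = 0.81, so the true probability is 0.1.
lower, upper, n_calls = lipschitz_bounds(lambda x: x * x, 2.0, 0.81, 12)
print(f"{lower:.5f} <= p <= {upper:.5f} after {n_calls} calls to g")
```

Only the cells that straddle the threshold are refined, so the evaluations concentrate near the boundary of the failure region and the gap between the two bounds shrinks at each iteration.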
Example in dimension 1
Example in dimension 2
The proposed methods are relatively simple to implement and the results they provide can be easily interpreted. We tested them on various examples, as well as on a real case from STMicroelectronics.