I earned my PhD in Mathematics in June 2019 from the **University of Tours**, France.

My thesis results from a **collaboration** between:

– the Institut Denis Poisson (IDP) – University of Tours – Tours,

– the Laboratoire de Probabilités, Statistique et Modélisation (LPSM) – Sorbonne Université – Paris,

– the multinational electronics and semiconductor manufacturer STMicroelectronics.

**Thesis directors**

– Prof. Arnaud Guyader – LPSM

– Prof. Florent Malrieu – IDP

**Industrial supervisor**

– Dr Philippe Leduc – STMicroelectronics

**Keywords**

Probability of failure, rare event simulation, Monte-Carlo methods – splitting, Gaussian process regression – Kriging, convex order, risk measures, bounds on quantiles, Bayesian inference, design of experiments, SUR strategies, Lipschitz functions.

Context and motivations

Let’s consider an industrial product and its **manufacturing process**. The latter consists of a succession of **complex operations** that are difficult to keep constant over time. As a result, the numerical values of some **design parameters** of the product may vary from one manufactured piece to another. Moreover, some configurations lead to **defective pieces**, i.e. **pieces that do not meet the specifications**.

To **test the final performance of the product** – before the launch of its manufacturing process – most industrial companies use numerical simulation. The principle is the following: given a **design scenario** as input, a numerical simulation consists of applying a **computer code** to it. The latter is **complex** enough to represent the different stages of the manufacturing process. The output is a **scalar** from which it is determined **whether the product meets the specifications**.

Problem statement

The main constraint of this approach is that each simulation is costly in terms of **time** and **computer resources** (several hours or even days are needed for a single simulation). It is therefore inconceivable to test all design scenarios and to measure the performance of the product by, for example, computing the ratio of defective products to the total number of design scenarios tested. That is why a probabilistic analysis **of the risk of failure of the product** must be carried out instead. This is done by introducing a **probability distribution**, which gives the probability of occurrence of each design scenario. This distribution is assumed to be known in practice.

Each marginal of the random vector **X** corresponds to a variable design parameter, and the function *g* to the computer code, so the variable of interest is the random variable *Y* = *g*(**X**). To measure the performance of the product, we are then interested in the **probability of** *Y* **exceeding the threshold** *T*, i.e. the **probability of failure** of the product. Note that it may be a **very small probability**. However, since the computer code is too complex for its analytical expression to be accessible, the function *g* is **unknown** in practice (black-box function). This means in particular that **the law of the variable Y is unknown**.
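In symbols, with the notation just introduced (the dimension *d* simply counts the variable design parameters):

```latex
p \;=\; \mathbb{P}\left( Y > T \right), \qquad Y = g(X), \qquad X = (X_1, \ldots, X_d).
```

Estimating *p* with very few evaluations of *g* is the core difficulty of the thesis.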

Standard approach

In this context, the naive Monte-Carlo **simulation method** cannot be put into practice. Indeed, recall that the principle of this method is to **simulate a random sample** according to the law of **X** and to estimate the probability of failure by the proportion of observations that lead to a failure. In the picture below, for example, there are 100 points representing 100 configurations of the product parameters, drawn according to their density (orange points). The value of *g* at each of these points is systematically **below the threshold** *T* (blue points). The resulting **estimation of the probability of failure** is therefore **zero**, so that the **failure risk is underestimated**. For this example, 100 numerical simulations are not enough to provide a reliable estimation of the probability of failure by the naive Monte-Carlo method.
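This failure mode is easy to reproduce. The sketch below is purely illustrative: the analytical `g`, the Gaussian input distribution and the threshold are cheap stand-ins for the real black-box code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cheap analytical stand-in for the expensive black-box code g
# (the real g would require hours of simulation per call).
def g(x):
    return x[:, 0] ** 2 + x[:, 1] ** 2

T = 16.0  # failure threshold: a piece is defective when g(X) > T

def naive_monte_carlo(n):
    """Estimate P(g(X) > T) by the proportion of failing samples."""
    x = rng.standard_normal((n, 2))  # X ~ N(0, I_2): the design distribution
    return np.mean(g(x) > T)

# With a budget of 100 calls, no failure is usually observed and the
# estimate is 0: the failure risk is underestimated.
print(naive_monte_carlo(100))
# Around a million calls are needed for a usable estimate here
# (the true value is exp(-8), about 3.4e-4).
print(naive_monte_carlo(1_000_000))
```

Since a single run of the real code takes hours, a budget of a million calls is out of reach: the number of calls, not the simplicity of the estimator, is the binding constraint.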

Objectives

**Under the constraint of a very limited number of calls to the computer code**, i.e. few experimental data, our **goals** are the following:

**→** Building a robust estimator of the probability of failure.

**→** Having access to the **relevance of the result obtained**, i.e. providing a measure of the uncertainty in the estimation of the probability of failure.

**→** Implementing a method for determining **the configurations of the product parameters that must be studied in priority by numerical simulation**. This makes it possible to optimize the costs related to numerical simulation and to obtain experimental data that are as informative as possible. This type of problem refers to the design of computer experiments.

Proposed approaches

**Method 1.** Our observations are the results of numerical simulations (blue points). Basic **Bayesian principles** of the so-called Kriging method (Gaussian process regression) are then applied to properly design an **estimator of the target probability of failure**.

From a numerical point of view, the practical use of this estimator is unfortunately limited. An **alternative estimator** is then considered, which is equivalent in terms of bias. Our main contribution concerns the existence of a convex order inequality between these two estimators. We show that this inequality allows us **to compare their efficiency** and **to quantify the uncertainty in the estimation results they provide** (95% confidence intervals). An iterative **procedure** for the construction of an optimal design of computer experiments, based on the principle of **Stepwise Uncertainty Reduction strategies**, also results from this inequality (see below).
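A minimal sketch of a plug-in Kriging estimator of this kind (not the exact estimators studied in the thesis; the 1-d code `g`, the design and the threshold are illustrative assumptions): a Gaussian process is fitted to a few runs of the code, and the probability of failure is estimated by averaging, over the input distribution, the posterior probability that the process exceeds the threshold.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Cheap 1-d stand-in for the expensive black-box code (illustration only).
def g(x):
    return np.sin(3 * x) + x

T = 1.5  # failure threshold

# Small design of experiments: only 8 calls to the code.
x_train = np.linspace(-2.0, 2.0, 8).reshape(-1, 1)
y_train = g(x_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(x_train, y_train)

# Plug-in estimator: average, over the input distribution X ~ N(0, 1),
# of the posterior probability that the Gaussian process exceeds T.
x = rng.standard_normal((100_000, 1))
mean, std = gp.predict(x, return_std=True)
p_hat = float(np.mean(norm.sf((T - mean) / np.maximum(std, 1e-12))))
print(p_hat)
```

The same posterior also tells us *where* the model is most uncertain about exceeding `T`, which is the intuition behind the Stepwise Uncertainty Reduction designs mentioned above: the next run of the code is placed where it reduces that uncertainty the most.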

**Method 2.** The second method is an **iterative procedure**, particularly adapted to the case where the probability of failure is very small, i.e. the dreaded failure event is rare. The expensive computer code is represented by **a function that is assumed to be Lipschitz continuous**. At each iteration, we proceed to a dyadic subdivision of the input space, in order to determine where the computer code must be evaluated. The Lipschitz hypothesis is then used to construct **approximations of the probability of failure from below and from above** (lower and upper bounds). We show that **these approximations converge towards the true value with the number of iterations** and that the **approximation error** decreases with the number of calls to the computer code. In practice, **the approximations are estimated using the Monte-Carlo method known as the splitting method**, which is particularly well suited to the estimation of very small probabilities.
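The bounding mechanism can be sketched in dimension 1 (illustrative assumptions: a cheap Lipschitz `g` with known constant, uniform inputs on [0, 1], and the bounds computed exactly rather than estimated by splitting):

```python
def g(x):
    return 4 * x * (1 - x)  # toy Lipschitz code on [0, 1], |g'| <= 4

L = 4.0   # Lipschitz constant of g (assumed known)
T = 0.9   # failure threshold; X is uniform on [0, 1]

def lipschitz_bounds(n_iter):
    """Lower/upper bounds on p = P(g(X) > T) by dyadic refinement."""
    cells = [(0.0, 1.0)]  # cells where failure is still undecided
    p_low = 0.0           # total length of cells where failure is certain
    for _ in range(n_iter):
        undecided = []
        for a, b in cells:
            # Dyadic division: split the cell, evaluate g at each midpoint.
            for lo, hi in ((a, (a + b) / 2), ((a + b) / 2, b)):
                m, h = (lo + hi) / 2, (hi - lo) / 2
                y = g(m)                 # one call to the code
                if y - L * h > T:        # the whole cell certainly fails
                    p_low += hi - lo
                elif y + L * h > T:      # may fail: refine at next iteration
                    undecided.append((lo, hi))
                # otherwise the whole cell is certainly safe: discard it
        cells = undecided
    return p_low, p_low + sum(b - a for a, b in cells)

low, high = lipschitz_bounds(12)
print(low, high)  # the bounds enclose p = sqrt(1.6)/4 by construction
```

Only the undecided cells are refined, so the calls to the code concentrate around the boundary of the failure region, and the gap between the two bounds shrinks as the budget grows. In higher dimension, the measure of the certain-failure and undecided regions is no longer computed exactly but estimated by the splitting method.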

**Example in dimension 1**

**Example in dimension 2**

The proposed methods are relatively **simple to implement** and the results they provide can be easily interpreted. We tested them on various examples, as well as on a **real case** from STMicroelectronics.