In recent decades there has been renewed interest in Gaussian processes (GPs) in statistics and machine learning. New challenges have arisen, especially in uncertainty quantification and optimization for complex systems. The case of continuous inputs has been intensively studied, and can be addressed with existing classes of GPs, such as isotropic (radial) kernels defined with the Euclidean distance. However, numerous applications involve more general, non-Euclidean input spaces, which call for other classes of GPs.
Fortunately, despite the diversity of situations, there are a few common techniques to define valid GPs, such as using a mapping to a Euclidean space. This mini-symposium aims at illustrating the variety of problems encountered along with their specific solutions, as well as the generic techniques. The first part will focus on the case of discrete inputs in Gaussian process meta-modeling. By discrete input, we mean an input which has a finite number of levels, either ordered or not (it may also be called here a “qualitative”, “categorical” or “factor” input). The second part will present four other cases, where the input can be a permutation, time-varying, a probability distribution or a graph.
14:00
An overview of Gaussian process metamodels with discrete inputs
Olivier Roustant | INSA Toulouse | France
Author:
Olivier Roustant | INSA Toulouse | France
As an introduction to the mini-symposium, this talk will present an overview of GP metamodels dealing with discrete inputs. We will present some basics for GPs with discrete inputs, defined by positive semidefinite matrices, as well as recent developments dealing with a large number of levels or latent continuous variables. We will discuss the cases where a discrete input is nominal or ordinal. We will also show connections between kernels on discrete inputs, including mappings to continuous inputs. The talk will be illustrated by examples on toy problems and applications from the computer experiments literature.
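As an illustration of a GP kernel on a discrete input defined by a positive semidefinite matrix, here is a minimal sketch in Python with NumPy. It is not taken from the talk: the function names, the compound-symmetry parameterization and the squared-exponential kernel used for the latent mapping are assumptions made for the example.

```python
import numpy as np


def compound_symmetry(m, variance=1.0, covariance=0.5):
    """m x m matrix with a common variance on the diagonal and a common
    covariance elsewhere (hypothetical helper, for illustration only)."""
    T = np.full((m, m), covariance, dtype=float)
    np.fill_diagonal(T, variance)
    return T


def latent_mapping_kernel(latent_coords, lengthscale=1.0):
    """Map each level to a point in a Euclidean space and apply a
    squared-exponential kernel there; the result is PSD by construction."""
    Z = np.asarray(latent_coords, dtype=float)  # shape (m, q)
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def is_psd(T, tol=1e-10):
    """Numerical check that a symmetric matrix is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(T) >= -tol))


def discrete_kernel(T, u, v):
    """Kernel matrix k(u_i, v_j) = T[u_i, v_j] for integer level indices."""
    return T[np.ix_(np.asarray(u), np.asarray(v))]
```

For m levels, the compound-symmetry matrix has eigenvalues variance - covariance and variance + (m - 1) * covariance, so it defines a valid kernel only when both are nonnegative. With m = 4 and unit variance, a covariance of 0.5 is valid and a covariance of -0.5 is not.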
14:30
Latent variable Bayesian optimization for qualitative and quantitative inputs
Jhouben Cuesta Ramirez | CEA LETI & Mines Saint-Etienne | France
Authors:
Jhouben Cuesta Ramirez | CEA LETI & Mines Saint-Etienne | France
Olivier Roustant | INSA Toulouse | France
Alain Glière | CEA LETI | France
Rodolphe Le Riche | Mines Saint-Etienne | France
Cedric Durantin | CEA DAM | France
Guillaume Perrin | CEA DAM | France
The goal of Bayesian optimization (BO) is to find a global optimum x* of a costly function F. BO proceeds sequentially by training a surrogate model that emulates F, and by proposing candidates for x* based on an infill criterion. Much of the literature addresses continuous inputs. However, many real-life applications involve qualitative inputs as well. This makes BO harder, since both surrogate construction and infill criterion maximization must then be done in a non-Euclidean space with both qualitative and quantitative inputs. To address this difficulty, we propose to use a latent variable mapping to a Euclidean space of continuous variables. The intuition for introducing latent variables is that qualitative variables are often explained by unobserved quantitative ones. The resulting algorithm has three main steps: 1. mapping estimation, 2. BO in the (continuous) image space, 3. pre-image point recovery (choosing the closest admissible point to the one found in step 2). We perform numerical tests on toy functions and applications, using random forests and Gaussian processes as surrogates, and expected improvement as the infill criterion. Results show that, in our mixed setting, the proposed algorithm exhibits a behavior similar to that of standard BO for continuous inputs: it outperforms a brute-force maximization of the costly function when the budget devoted to function evaluations is small.
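Step 3 and the infill criterion can be sketched as follows in Python. This is a sketch under stated assumptions, not the authors' implementation: the latent coordinates of the levels are taken as already estimated in step 1, "closest" is read as Euclidean distance in the latent space, F is assumed to be maximized, and all function names are hypothetical.

```python
import math

import numpy as np


def recover_preimage(z_candidate, latent_coords):
    """Step 3 (assumed form): index of the level whose latent image is
    closest, in Euclidean distance, to the continuous candidate."""
    Z = np.asarray(latent_coords, dtype=float)   # shape (m, q)
    z = np.asarray(z_candidate, dtype=float)     # shape (q,)
    return int(np.argmin(np.linalg.norm(Z - z, axis=1)))


def split_and_recover(point, n_quant, latent_coords):
    """Split a point of the image space into its quantitative part and
    the recovered level of the qualitative input."""
    point = np.asarray(point, dtype=float)
    return point[:n_quant], recover_preimage(point[n_quant:], latent_coords)


def expected_improvement(mu, sigma, f_best):
    """Expected improvement for maximization, given the surrogate's
    predictive mean and standard deviation at one candidate point."""
    if sigma <= 0.0:
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - f_best) * cdf + sigma * pdf
```

In this sketch, a candidate found by maximizing the infill criterion in the image space is passed to `split_and_recover`, and the costly function is then evaluated at the returned quantitative coordinates and level.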
15:00
Sensitivity analysis with both continuous and categorical inputs using FANOVA graphs
Sonja Kuhnt | Dortmund University of Applied Sciences and Arts | Germany
Authors:
Sonja Kuhnt | Dortmund University of Applied Sciences and Arts | Germany
Dominik Kirchhoff | Dortmund University of Applied Sciences and Arts | Germany
Sensitivity analysis aims at identifying the input variables of a given model or function that have the highest impact on an output of interest. Global sensitivity analysis has broad applications in screening, interpretation and reliability analysis. First-order Sobol indices and closed Sobol indices quantify the individual influence of single variables and of groups of variables, respectively. We consider the so-called total interaction index (TII), which measures the influence of a pair of variables together with all its interactions. It provides a deeper insight into the interaction structure of the unknown function, displayed in a so-called FANOVA graph. We analyse the use of TII and FANOVA graphs in the context of mixed continuous and categorical input variables and illustrate their behaviour with the aid of simple examples. A main field of application is the Gaussian process emulation of computer experiments. We consider Gaussian process models where the kernel function factorizes into a product of a function of the continuous inputs and a function of the categorical inputs. We show results of an application from the field of logistics.
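The product structure of such a kernel can be sketched in Python as below. The squared-exponential kernel on the continuous inputs, the matrix T holding the covariances between levels of a single categorical input, and the function names are assumptions made for this illustration; the abstract does not specify them.

```python
import numpy as np


def squared_exponential(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel between the rows of X1 (n1, d) and X2 (n2, d)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def mixed_product_kernel(X1, u1, X2, u2, T, lengthscale=1.0):
    """k((x, u), (x', u')) = k_cont(x, x') * T[u, u'], where u and u' are
    integer level indices of one categorical input."""
    T = np.asarray(T, dtype=float)
    K_cont = squared_exponential(X1, X2, lengthscale)
    K_cat = T[np.ix_(np.asarray(u1), np.asarray(u2))]
    return K_cont * K_cat  # elementwise product of two valid kernels
```

Since the elementwise product of two positive semidefinite kernel matrices is positive semidefinite, the product is a valid kernel whenever T is.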
15:30
Deepening analysis of uncertain categorical inputs using Gaussian processes - application to marine flooding
Jeremy Rohmer | BRGM | France
Authors:
Jeremy Rohmer | BRGM | France
Deborah Idier | BRGM | France
Sophie Lecacheux | BRGM | France
Rodrigo Pedreros | BRGM | France
Marine flooding assessment relies on the use of high-resolution hydrodynamic numerical models to understand the relationships between inputs (e.g. offshore sea conditions) and the output variables (e.g. flood spatial extent), and to characterize the uncertainty. The input-output mapping is, however, hindered by two aspects: 1. the high computational cost of the numerical model limits the number of model runs (usually a few hundred); 2. some of the input variables are often categorical in nature, because different modelling assumptions are equally appropriate. In this communication, we propose to explore how recent advances in Gaussian processes with mixed continuous/categorical inputs can bring valuable insights into the influence of such categorical inputs. Depending on how they are modelled (nominal, ordinal) and integrated in the kernel (level-related heteroscedasticity, compound symmetry, group structure and interactions, etc.), different viewpoints can be brought. Three application cases of increasing complexity are investigated: i. nominal input to analyse the scenarios of rupture geometry for tsunamis; ii. ordinal variable to reveal the latent influence of the angle of approach for cyclone tracks; iii. group kernels to reveal the interactions between artificial and natural coastal defences’ failure scenarios for storm tides. Finally, the problem of selecting the most appropriate kernel structure using exploratory analysis and cross-validation procedures is discussed.
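Two of the kernel structures named above, level-related heteroscedasticity and group structure, can be sketched as covariance matrices between levels in Python. The parameterizations below are simplified assumptions chosen for illustration (one standard deviation per level with a common correlation; one within-group and one between-group covariance) and are not necessarily those used in the study. Validity is checked numerically rather than guaranteed by construction.

```python
import numpy as np


def heteroscedastic_matrix(level_sd, correlation):
    """T = D R D, with D = diag(level_sd) and R a correlation matrix with
    a common off-diagonal value: each level has its own variance."""
    sd = np.asarray(level_sd, dtype=float)
    m = sd.size
    R = np.full((m, m), correlation, dtype=float)
    np.fill_diagonal(R, 1.0)
    return sd[:, None] * R * sd[None, :]


def group_matrix(groups, variance, within, between):
    """Covariance `within` for distinct levels of the same group,
    `between` for levels of different groups, `variance` on the diagonal."""
    g = np.asarray(groups)
    same_group = g[:, None] == g[None, :]
    T = np.where(same_group, within, between).astype(float)
    np.fill_diagonal(T, variance)
    return T


def is_psd(T, tol=1e-10):
    """Numerical check that a symmetric matrix is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(T) >= -tol))
```

For example, `group_matrix([0, 0, 1, 1], 1.0, 0.6, 0.2)` describes four levels split into two groups, with stronger covariance inside each group than across groups; `is_psd` should be called on any such matrix before it is used as a kernel.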