Reproducing kernel Hilbert spaces are ubiquitous in applied mathematics and statistics due to the tractability provided by the reproducing property. They commonly underpin the theoretical analysis of stochastic processes (including Gaussian processes) and have historically been used to construct numerical schemes for interpolation, integration, and the solution of differential equations. Within the machine learning literature, the last decade has additionally seen fruitful research into the use of kernels in statistical tests, statistical estimators and sampling methods.
In this two-part mini-symposium, we propose to explore these more recent works and highlight their relevance to uncertainty quantification. The first session will focus on the use of kernel-based probability metrics and statistical divergences to construct statistical estimators and hypothesis tests for high-dimensional models or models with intractable likelihoods. The second session will focus on applications of kernels to problems in Monte Carlo methods and approximation of probability measures.
16:30
DPPs: The Kernel Machine of Point Processes
Rémi Bardenet | CNRS & CRIStAL, Université de Lille | France
Determinantal point processes (DPPs) are distributions over configurations of points that encode repulsiveness in a kernel function. Important statistical quantities associated with DPPs have geometric and algebraic interpretations, which makes them a fun object to study. Moreover, since their formalization by Macchi in 1975 as models for fermions in particle physics, specific instances of DPPs have half-mysteriously appeared in fields such as probability, number theory and statistical physics. More recently, their modelling power has been investigated by statisticians and machine learners.
After a quick introduction to determinantal point processes, I will discuss some recent statistical applications of DPPs. First, we used DPPs to sample nodes for numerical integration, resulting in Monte Carlo integration that converges quickly with respect to the number of integrand evaluations. Second, we used DPPs for variable selection in linear regression, with performance not much worse than PCA while preserving the interpretability of the resulting features. If time allows, I will conclude with the characterization of the distribution of the zeros of spectrograms of white noise in signal processing.
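As background for readers new to the topic, the following is a minimal numpy sketch of the standard spectral sampler for finite DPPs (Hough et al., 2006), where the DPP is specified by a symmetric marginal kernel matrix K with eigenvalues in [0, 1]; the continuous constructions used for Monte Carlo integration in the talk are more elaborate, and the function name `sample_dpp` is ours, not the speaker's.

```python
import numpy as np

def sample_dpp(K, rng=None):
    """Sample a finite DPP with marginal kernel K (symmetric, eigenvalues
    in [0, 1]) via the spectral algorithm of Hough et al. (2006)."""
    rng = np.random.default_rng(rng)
    eigvals, eigvecs = np.linalg.eigh(K)
    # Phase 1: keep each eigenvector independently with probability lambda_i.
    V = eigvecs[:, rng.random(len(eigvals)) < eigvals]
    sample = []
    for _ in range(V.shape[1]):
        # P(pick item i) is the squared norm of row i of V, up to normalization.
        probs = np.sum(V**2, axis=1)
        probs /= probs.sum()
        i = rng.choice(K.shape[0], p=probs)
        sample.append(i)
        if V.shape[1] == 1:
            break
        # Condition on i: zero out row i, then re-orthonormalize the columns.
        j = np.argmax(np.abs(V[i, :]))
        Vj = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V -= np.outer(Vj, V[i, :] / Vj[i])
        V, _ = np.linalg.qr(V)
    return np.array(sample, dtype=int)
```

Note that when K is a projection kernel (eigenvalues all 0 or 1), the sampler returns exactly rank(K) points, which is the setting typically used for numerical integration.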
17:00
Developments in Stein-Based Control Variates
Leah South | Lancaster University | United Kingdom
Stein’s method has recently been used to generate control variates which can improve Monte Carlo estimators of expectations when the derivatives of the log target are available. The two most popular Stein-based variance reduction techniques are zero-variance control variates (ZV-CV, a parametric approach) and control functionals (CF, a non-parametric alternative). This talk will describe two recent developments in this area. The first method applies regularisation methods in ZV-CV to give reduced-variance estimators in high-dimensional Monte Carlo integration. A novel kernel-based method motivated by CF and by Sard's method for numerical integration will also be introduced. The use of Sard's approach ensures that our control functionals are exact on all polynomials up to a fixed degree in the Bernstein-von-Mises limit, so that the reduced variance estimator approximates the behaviour of a polynomially-exact (e.g. Gaussian) cubature method. The benefits of the proposed variance reduction techniques will be illustrated using several Bayesian inference examples.
This is joint work with Chris Oates, Toni Karvonen, Antonietta Mira, Chris Nemeth, Mark Girolami and Chris Drovandi.
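To make the parametric end of this toolbox concrete, here is a sketch of the basic degree-1 ZV-CV estimator, assuming samples from the target and their scores are available; the regularised and Sard-based kernel estimators described in the talk are more involved, and `zvcv_degree1` is our illustrative name.

```python
import numpy as np

def zvcv_degree1(f_vals, scores):
    """Degree-1 zero-variance control variate estimate of E_pi[f].
    f_vals: (N,) evaluations of f at samples from pi.
    scores: (N, d) evaluations of grad log pi at the same samples.
    Under mild conditions E_pi[grad log pi] = 0, so subtracting
    theta . score leaves the mean unchanged for any theta; ordinary
    least squares picks the theta that minimises the variance."""
    N = len(f_vals)
    X = np.hstack([np.ones((N, 1)), scores])   # intercept + score features
    coef, *_ = np.linalg.lstsq(X, f_vals, rcond=None)
    return np.mean(f_vals - scores @ coef[1:])

# Toy check: pi = N(0, 1) has score -x, and f(x) = x + x**2 has mean 1.
x = np.random.default_rng(0).standard_normal(5000)
print(zvcv_degree1(x + x**2, -x[:, None]))
```

In this toy example the fitted control variate cancels the linear part of f, so the estimator's variance is that of x**2 alone rather than of x + x**2.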
17:30
On the Geometry of Stein Variational Gradient Descent
Nikolas Nüsken | Universität Potsdam | Germany
Uncertainty quantification and data assimilation both require sampling from, or approximating, high-dimensional probability distributions. The focus of this talk is the recently introduced Stein variational gradient descent methodology, a class of algorithms that rely on iterated steepest-descent steps with respect to a reproducing kernel Hilbert space norm. This construction leads to interacting particle systems, the mean-field limit of which is a gradient flow on the space of probability distributions equipped with a certain geometric structure. We leverage this viewpoint to shed some light on the convergence properties of the algorithm, in particular addressing the problem of choosing a suitable positive definite kernel function. This is joint work with A. Duncan (Imperial College London) and L. Szpruch (University of Edinburgh).
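For orientation, below is a minimal numpy sketch of the basic SVGD update of Liu and Wang (2016), which the talk builds on, using an RBF kernel with the median-heuristic bandwidth; the kernel choices analysed in the talk go beyond this default.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step_size=0.05):
    """One SVGD update (Liu and Wang, 2016) with an RBF kernel whose
    bandwidth is set by the median heuristic."""
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)
    h = np.median(sq_dists) / np.log(n + 1) + 1e-12   # median heuristic
    K = np.exp(-sq_dists / h)
    # phi(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    drift = K @ grad_log_p(particles)                 # attraction toward the target
    repulsion = (2.0 / h) * (K.sum(1, keepdims=True) * particles - K @ particles)
    return particles + step_size * (drift + repulsion) / n

# Example: transport 100 particles toward a standard 2-D Gaussian.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(100, 2))
for _ in range(500):
    x = svgd_step(x, lambda p: -p)   # score of N(0, I) is -x
```

The drift term pulls particles toward high-density regions, while the kernel-gradient term pushes them apart, which is the repulsion that prevents collapse to the mode.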
18:00
- CANCELED - Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions
Carl-Johann Simon-Gabriel | ETH Zurich | Switzerland
Kernel mean embeddings have become a popular tool in machine learning. They map probability measures to functions in a reproducing kernel Hilbert space. The distance between two embedded measures defines a semi-metric on probability measures known as the maximum mean discrepancy (MMD). Its properties depend on the underlying kernel and have been linked to three fundamental concepts from the kernel literature: universal, characteristic and strictly positive definite kernels.
The goal of this talk is threefold. First, I will show that, modulo a slight extension of their usual definitions, the concepts of universal, characteristic and strictly positive definite kernels are equivalent. Second, I will characterize the set of kernels whose associated MMD metrizes the weak convergence of probability measures. Third, I will show that kernel mean embeddings can be extended from probability measures to generalized measures called Schwartz distributions, and discuss a few properties of these distribution embeddings.
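For reference, here is a minimal sketch of the standard unbiased estimator of the squared MMD (Gretton et al., 2012) with a Gaussian kernel, which is characteristic, so the population MMD vanishes if and only if the two distributions coincide; the bandwidth and function name `mmd2_unbiased` are our illustrative choices.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimator of MMD^2 between samples X ~ P and Y ~ Q
    (Gretton et al., 2012) with the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * bandwidth**2))."""
    def gram(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * bandwidth**2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    m, n = len(X), len(Y)
    # Diagonal terms k(x_i, x_i) are excluded so the estimator is unbiased.
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())
```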