Uncertainty quantification is of paramount importance in many geospatial applications. For example, data from remote-sensing platforms are often incomplete, noisy, and indirect. Spatial statistical methods, often based on Gaussian-process models, can address many of these issues, but their direct application is computationally infeasible for large datasets. This session will discuss recent developments in spatial statistics for scalable uncertainty quantification.
14:00
Gaussian-process approximations for big data
Matthias Katzfuss | Texas A&M University | United States
Gaussian processes (GPs) are popular, flexible, and interpretable probabilistic models for functions in areas such as machine learning, regression, and geospatial analysis. GPs naturally quantify uncertainty, but direct application of GPs is computationally infeasible for large datasets. We consider a framework for fast GP inference based on the so-called Vecchia approximation. Our framework contains many popular existing GP approximations as special cases. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose novel Vecchia approaches for noisy, non-Gaussian, and massive data. We provide theoretical results, conduct numerical comparisons, and apply the methods to satellite data.
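As a rough illustration of the basic Vecchia idea mentioned in this abstract (not the speaker's general framework or software), the sketch below replaces the joint Gaussian density by a product of univariate conditional densities, each conditioning on at most m previously ordered nearest neighbours. The exponential covariance, its parameter values, the left-to-right ordering, and the function names are all assumptions made for this example. For this particular one-dimensional exponential-covariance case the process is Markov, so m = 1 already reproduces the exact log-likelihood; in general the result is only an approximation.

```python
import numpy as np

def exp_cov(a, b, variance=1.0, length_scale=0.3):
    """Exponential covariance between two arrays of 1-D locations (assumed kernel)."""
    return variance * np.exp(-np.abs(a[:, None] - b[None, :]) / length_scale)

def exact_loglik(y, locs):
    """Exact zero-mean GP log-likelihood via a dense Cholesky factor, O(n^3)."""
    chol = np.linalg.cholesky(exp_cov(locs, locs))
    z = np.linalg.solve(chol, y)
    return (-0.5 * z @ z - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def vecchia_loglik(y, locs, m):
    """Vecchia log-likelihood: sum of conditional Gaussian log-densities,
    each conditioning on at most m nearest previously ordered points."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        # Indices of the (at most) m nearest points among those ordered before i.
        nn = prev[np.argsort(np.abs(locs[prev] - locs[i]))[:m]]
        k_ii = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0]
        if nn.size == 0:
            mean, var = 0.0, k_ii
        else:
            k_nn = exp_cov(locs[nn], locs[nn])
            k_in = exp_cov(locs[i:i + 1], locs[nn])[0]
            w = np.linalg.solve(k_nn, k_in)   # kriging weights, m x m solve
            mean = w @ y[nn]
            var = k_ii - w @ k_in
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

# Simulate data from the exact model and compare the two log-likelihoods.
rng = np.random.default_rng(0)
locs = np.linspace(0.0, 1.0, 60)
y = np.linalg.cholesky(exp_cov(locs, locs)) @ rng.standard_normal(locs.size)
print(exact_loglik(y, locs), vecchia_loglik(y, locs, m=1))
```

Each term needs only an m x m solve, so the cost grows linearly in the number of observations for fixed m, which is what makes this kind of approximation attractive for large datasets.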
14:30
Uncertainty quantification of environmental processes from large spatial data
Andrew Zammit Mangion | University of Wollongong | Australia
Uncertainty quantification (UQ) of environmental processes from remote sensing data is important for data assimilation and physical-model calibration. Yet, UQ in this context is extremely hard due to the multi-scale, nonstationary properties typically exhibited by such processes, and the irregular nature of missing data in remote sensing products. We propose facilitating UQ in this context by using a flexible, highly-parameterised, nonstationary, multiscale spatial model constructed through a superposition of spatial processes with decreasing spatial scale and increasing degree of nonstationarity. Computation is facilitated through the use of Gaussian Markov random fields and parallel Markov chain Monte Carlo based on graph colouring. The resulting model allows for both distributed computing and distributed data. Importantly, it provides opportunities for valid model and data scalability and yet is still able to borrow strength across large spatial scales. We compare our approach to state-of-the-art spatial modelling and prediction methods and we illustrate a two-scale version on a dataset of sea-surface temperature containing on the order of one million observations.
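The role of graph colouring in parallel sampling can be sketched on a toy example. The code below is an illustrative assumption, not the model from the talk: it builds a zero-mean Gaussian Markov random field on a small lattice (precision matrix equal to the graph Laplacian plus a hypothetical `kappa` on the diagonal), colours the graph greedily, and runs a Gibbs sampler that updates all nodes of one colour at once. Nodes sharing a colour are not neighbours, so they are conditionally independent given the rest and their updates could be distributed across workers; here they are simply vectorised.

```python
import numpy as np

def lattice_precision(nrow, ncol, kappa=0.5):
    """Precision matrix of a first-order lattice GMRF: Laplacian + kappa * I."""
    n = nrow * ncol
    idx = np.arange(n).reshape(nrow, ncol)
    pairs = list(zip(idx[:, :-1].ravel(), idx[:, 1:].ravel()))   # horizontal edges
    pairs += list(zip(idx[:-1, :].ravel(), idx[1:, :].ravel()))  # vertical edges
    Q = kappa * np.eye(n)
    for i, j in pairs:
        Q[i, j] = Q[j, i] = -1.0
        Q[i, i] += 1.0
        Q[j, j] += 1.0
    return Q

def greedy_colouring(Q):
    """Assign each node the smallest colour not used by its graph neighbours."""
    n = Q.shape[0]
    colours = -np.ones(n, dtype=int)          # -1 marks "not yet coloured"
    for i in range(n):
        nb = np.flatnonzero(Q[i])
        nb = nb[nb != i]
        used = set(colours[nb].tolist())
        c = 0
        while c in used:
            c += 1
        colours[i] = c
    return colours

def coloured_gibbs_sweep(x, Q, colours, rng):
    """One Gibbs sweep; all nodes of a colour are drawn simultaneously."""
    for c in np.unique(colours):
        A = np.flatnonzero(colours == c)
        prec = Q[A, A]                        # diagonal entries Q_ii for i in A
        off = Q[A] @ x - prec * x[A]          # sum over neighbours of Q_ij * x_j
        x[A] = -off / prec + rng.standard_normal(A.size) / np.sqrt(prec)
    return x

Q = lattice_precision(6, 8)
colours = greedy_colouring(Q)
rng = np.random.default_rng(1)
x = np.zeros(Q.shape[0])
for _ in range(100):
    x = coloured_gibbs_sweep(x, Q, colours, rng)
print(colours.reshape(6, 8))
```

On this four-neighbour lattice the greedy pass yields a two-colour chequerboard, so each sweep consists of only two block updates regardless of the lattice size.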
15:00
Mixed graphical-basis models for large nonstationary and multivariate spatial data problems
William Kleiber | University of Colorado | United States
There is an emerging consensus in the spatial statistical literature that basis expansion models are flexible and useful for modeling large, nonstationary spatial datasets. Low-rank models, approximate spectral decompositions, multiresolution representations, stochastic partial differential equations, and empirical orthogonal functions all fall within this basic framework. In this talk we explore a graphical model representation for the stochastic coefficients that relies on specifying a sparse precision matrix. Sparsity is encouraged in an L1-penalized likelihood framework, and estimation exploits a majorization-minimization approach. The result is a flexible nonstationary spatial model that is adaptable to very large datasets. The idea is readily extended to multivariate problems. Illustrations on statistical climatology datasets will be shown.
15:30
Scalable latent spatio-temporal non-Gaussian process models via Gaussian stochastic PDEs
Finn Lindgren | University of Edinburgh | United Kingdom
Latent Gaussian process models are a highly versatile class of models for spatio-temporal phenomena. Practical spatio-temporal parameter estimation, as well as infilling or prediction with appropriate measures of uncertainty in data-sparse situations, can be computationally challenging. A partial solution is to use methods from numerical PDE solvers, which provide both a problem formulation with sparse matrices and associated iterative solution methods. This can be combined with non-linear combinations of latent processes, as well as non-Gaussian observations, to obtain deterministic approximations to Bayesian posterior distributions at a fraction of the cost of Markov chain Monte Carlo methods.
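To illustrate how a PDE discretisation leads to sparse matrices, the sketch below discretises the one-dimensional stochastic PDE (kappa^2 - d^2/ds^2) x(s) = W(s), with W Gaussian white noise, on a regular grid. It uses simple finite differences with zero boundary values, which is an assumption made for brevity and not necessarily the discretisation used in the talk; the function name, grid size, and `kappa` value are likewise hypothetical. The resulting precision matrix is banded, and away from the boundary the marginal variance should be close to the continuous-domain value 1 / (4 kappa^3) for this operator.

```python
import numpy as np

def spde_precision_1d(n, h, kappa):
    """Precision matrix of the finite-difference solution of
    (kappa^2 - d^2/ds^2) x = W on n grid points with spacing h."""
    lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # -h^2 * second difference
    L = kappa**2 * np.eye(n) + lap / h**2                      # discretised operator
    # Cell-averaged white noise has variance 1/h, so L x ~ N(0, I / h)
    # and the precision of x is h * L^T L.
    return h * L.T @ L

n, h, kappa = 400, 0.01, 10.0
Q = spde_precision_1d(n, h, kappa)

# Largest |i - j| with a non-zero entry: the precision is pentadiagonal.
bandwidth = max(abs(i - j) for i, j in zip(*np.nonzero(Q)))

# Dense inverse used only to inspect the implied covariance in this small example.
cov = np.linalg.inv(Q)
print(bandwidth, cov[n // 2, n // 2], 1.0 / (4.0 * kappa**3))
```

The matrix is stored densely here only to keep the example short; in practice the point is that the non-zero entries can be stored and factorised with sparse-matrix routines, while the implied covariance matrix is dense.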