A key challenge associated with simulations and predictions of complex systems is to evaluate the quality of the resulting datasets and the ability of the underlying model to produce physically relevant simulations. In statistics, one way to quantitatively evaluate and rank models is statistical scoring, which is typically based on scalar metrics and takes as input verification data and the output of the model to be evaluated. When evaluating model simulations or predictions, one aims to detect bias, trends, outliers, or correlation misspecification. Methods to evaluate the quality of unidimensional outputs are well established; however, issues remain related to score approximation and uncertainty. Additionally, the evaluation of multidimensional outputs or ensembles of outputs has been addressed in the literature only relatively recently and remains challenging. We will discuss these challenges associated with evaluating unidimensional and multidimensional simulations or predictions.
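A standard example of such a scalar scoring metric is the continuous ranked probability score (CRPS), which can be estimated directly from an ensemble of model outputs via its kernel representation. The sketch below is illustrative only; the Gaussian toy ensembles are an assumption, not data from the talk:

```python
import numpy as np

def crps_ensemble(ens, y):
    """CRPS estimate for a scalar verification y from an ensemble sample.

    Uses the kernel representation CRPS = E|X - y| - 0.5 * E|X - X'|.
    """
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - y))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

rng = np.random.default_rng(0)
obs = 1.0
good = rng.normal(1.0, 1.0, size=500)    # ensemble centred on the truth
biased = rng.normal(3.0, 1.0, size=500)  # ensemble with a +2 bias
print(crps_ensemble(good, obs) < crps_ensemble(biased, obs))  # True: bias is penalized
```

Lower scores are better, so ranking models by mean CRPS over many verification cases is one way to carry out the quantitative comparison mentioned above.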
16:30
Using scoring rules to solve stochastic inverse problems
Emil Constantinescu | Argonne National Laboratory | United States
Authors:
Emil Constantinescu | Argonne National Laboratory | United States
Julie Bessac | Argonne National Laboratory | United States
In this talk we discuss the use of scoring rules to solve inverse problems with stochastic terms, such as processes driven by stochastic partial or ordinary differential equations (SPDEs or SODEs). The solution of the SPDE/SODE, that is, of the forward problem, is a distribution, represented numerically by an ensemble of simulations. We develop a strategy for solving such inverse problems by expressing the objective as finding the forward distribution that best explains the distribution of the observations in a chosen metric. We propose to use metrics typically employed in forecast verification, such as scoring rules.
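The strategy can be illustrated with a toy scalar SODE. This is a minimal sketch, assuming an Ornstein-Uhlenbeck-type drift with an unknown level parameter `theta`; the helper names and the grid search are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(theta, n_ens=400, n_steps=50, dt=0.02):
    """Ensemble of terminal states of the toy SODE dX = theta*(1 - X) dt + 0.3 dW.

    The drift parameter `theta` is the unknown to be inferred.
    """
    x = np.zeros(n_ens)
    for _ in range(n_steps):
        x += theta * (1.0 - x) * dt + 0.3 * np.sqrt(dt) * rng.normal(size=n_ens)
    return x

def crps_ensemble(ens, y):
    """Ensemble CRPS, a proper scoring rule from forecast verification."""
    return np.mean(np.abs(ens - y)) - 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))

# Synthetic observations drawn from the forward model at the "true" value 2.0.
obs = forward(2.0, n_ens=200)

# Inverse problem: pick the parameter whose forward ensemble scores best
# (lowest mean CRPS) against the distribution of the observations.
scores = {}
for th in [0.5, 1.0, 2.0, 4.0]:
    ens = forward(th)
    scores[th] = np.mean([crps_ensemble(ens, y) for y in obs])
best = min(scores, key=scores.get)
```

Because the CRPS is proper, its expectation is minimized by the forward distribution closest to the observation distribution, which is what makes it a sensible objective for this kind of stochastic inverse problem.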
17:00
Generating proper scoring rules for high-dimensional objects using summary statistics
Thordis Thorarinsdottir | Norwegian Computing Center | Norway
Author:
Thordis Thorarinsdottir | Norwegian Computing Center | Norway
Many natural systems such as wildfires, disease occurrences, plant and cellular systems, and animal colonies are observed as point patterns in time, space, or space and time. Modelling and predicting point processes is complicated by the varying high dimensionality of the data – if a point is added or removed from an observed point pattern, the dimensionality of the data set changes accordingly. Combined with intricate interactions between the data points, this results in many models having densities with intractable normalizing constants. As a consequence, standard approaches to inference and prediction evaluation are often not feasible. To deal with some of these challenges, we propose a class of proper scoring rules for evaluating spatial point process predictions based on summary statistics. These scoring rules rely on Monte Carlo approximations of an expectation and can therefore easily be evaluated for any point process model that can be simulated. We show examples of scoring rules that evaluate specific aspects of the process, such as its spatial distribution and tendency towards clustering, and demonstrate the framework in a case study.
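The Monte Carlo construction described above can be sketched for the simplest summary statistic, the point count. The homogeneous Poisson model and the CRPS-type kernel form below are illustrative assumptions, not the specific scoring rules of the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_poisson(intensity):
    """One homogeneous Poisson point pattern on the unit square."""
    n = rng.poisson(intensity)
    return rng.uniform(size=(n, 2))

def summary_score(simulate, observed, stat, n_sim=400):
    """Monte Carlo estimate of a CRPS-type score on a summary statistic T.

    S(P, y) = E|T(X) - T(y)| - 0.5 * E|T(X) - T(X')|, with X, X' ~ P.
    Only the ability to simulate from the model P is required.
    """
    t_sims = np.array([stat(simulate()) for _ in range(n_sim)])
    t_obs = stat(observed)
    return (np.mean(np.abs(t_sims - t_obs))
            - 0.5 * np.mean(np.abs(t_sims[:, None] - t_sims[None, :])))

count = lambda pts: len(pts)        # the summary statistic: number of points
obs = simulate_poisson(50)          # "observed" pattern, true intensity 50

s_good = summary_score(lambda: simulate_poisson(50), obs, count)
s_bad = summary_score(lambda: simulate_poisson(100), obs, count)
```

Such a score is proper with respect to the distribution of the summary statistic: it rewards models whose distribution of T matches that of the data, but cannot distinguish processes sharing the same distribution of T, which is why the talk considers statistics targeting specific aspects such as clustering.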
17:30
Using wavelets to verify the scale structure of precipitation forecasts
Petra Friederichs | University of Bonn | Germany
Authors:
Sebastian Buschow | University of Bonn | Germany
Petra Friederichs | University of Bonn | Germany
In numerical model and forecast evaluation, one important question is whether the predicted rainfall variability is distributed correctly across a range of spatial scales. We apply wavelet-based structure scores to numerical weather predictions and radar-derived observations. After addressing important practical concerns such as uncertain boundary conditions and missing data, the behaviour of the scores under realistic conditions is tested via selected case studies as well as statistical analysis across a large data set. Of the two tested wavelet scores, the approach based on the so-called map of central scales emerges as a particularly convenient and useful tool: summarizing the local spectrum at each pixel by its centre of mass yields a compact and informative visualization of the entire wavelet analysis. The histogram of these central scales leads to a structure score that is straightforward to interpret and insensitive to free parameters such as the wavelet choice and boundary conditions. Its judgement largely agrees with that of the alternative approach based on the spatial mean wavelet spectrum and is broadly consistent with other established structural scores.
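A highly simplified version of the centre-of-mass idea can be sketched with a plain, non-redundant Haar decomposition. The scores in the talk use per-pixel local spectra from a redundant wavelet transform, so the global, field-level version below is only a rough illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def haar_scale_energies(field, levels=4):
    """Energy of a 2D field in each dyadic Haar scale band (finest first).

    At each level, 2x2 block means give the next approximation; the energy
    removed by the averaging is attributed to that scale.
    """
    f = np.asarray(field, dtype=float)
    energies = []
    for _ in range(levels):
        h, w = f.shape
        coarse = f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        detail = f - np.kron(coarse, np.ones((2, 2)))
        energies.append(np.sum(detail ** 2))
        f = coarse
    return np.array(energies)

def central_scale(field, levels=4):
    """Centre of mass of the scale index, weighted by scale-band energy."""
    e = haar_scale_energies(field, levels)
    scales = np.arange(1, levels + 1)  # 1 = finest, `levels` = coarsest
    return np.sum(scales * e) / np.sum(e)

fine = rng.normal(size=(32, 32))                 # small-scale noise field
x = np.arange(32)
coarse = np.sin(2 * np.pi * x / 32)[None, :] * np.ones((32, 1))  # one large-scale wave
```

A field dominated by large spatial structures yields a larger central scale than a noisy small-scale field, so comparing histograms of such central scales between forecast and observation probes exactly the scale-structure question posed above.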
18:00
Evaluating reliability of forecasting systems under serial correlation
Jochen Broecker | University of Reading | United Kingdom
Author:
Jochen Broecker | University of Reading | United Kingdom
A general problem in the statistical evaluation of forecast performance is the intertemporal correlation of the verification-forecast pairs. As an example, consider the rank histogram, a popular tool to assess the reliability of ensemble forecasting systems (that is, whether the ensembles can in fact be regarded as sampled from the relevant conditional distributions). If the system is reliable, the ranks are uniformly distributed, but they are not independent, so standard goodness-of-fit tests cannot be applied. On the other hand, assuming the forecasting system is reliable, the forecasts should, by definition, provide information about the correlation between the verifications and themselves. This information is typically sufficient to formulate tests for reliability and to determine the asymptotic distribution of the test statistic under minimal extraneous assumptions. Stratified rank histograms are an example that will be presented in detail.
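The rank computation itself is straightforward; the difficulty the talk addresses is the dependence of these ranks over time. A minimal sketch with synthetic Gaussian data, ties ignored, and serial independence assumed here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def ranks(ensembles, verifications):
    """Rank of each verification within its ensemble (1 .. m+1).

    For a reliable system the ranks are uniform on {1, ..., m+1};
    ties are ignored, which is harmless for continuous data.
    """
    below = np.sum(ensembles < verifications[:, None], axis=1)
    return below + 1

m = 9                  # ensemble size, giving 10 possible ranks
n = 5000
verif = rng.normal(size=n)
reliable = rng.normal(size=(n, m))         # members from the same distribution
overconf = 0.3 * rng.normal(size=(n, m))   # too narrow: U-shaped histogram

hist_rel = np.bincount(ranks(reliable, verif), minlength=m + 2)[1:]
hist_over = np.bincount(ranks(overconf, verif), minlength=m + 2)[1:]
```

The reliable system produces a roughly flat histogram, while the overconfident one piles mass onto the extreme ranks; the statistical question in the talk is how to test such deviations from uniformity when consecutive ranks are serially correlated rather than independent as in this sketch.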