[ Moved from MW HS 0337 ]
08:30
- CANCELED - A goodness of fit for exponential models over S^(p-1)
Nishan Mudalige | University of Guelph | Canada
Authors:
Nishan Mudalige | University of Guelph | Canada
Peter Kim | University of Guelph | Canada
The analysis of directional data has been of interest in the statistical community for a substantial period of time. Its origins can be traced back to Gauss, who developed the theory of errors to analyze directional measurements in astronomy. Due to this interest in directional statistics, several probability distributions have been developed for directional data over the (p-1)-dimensional sphere S^(p-1). Beran developed a class of models for higher-dimensional surfaces called the exponential model. The generalized spherical harmonic functions described by Fan and Dai can be incorporated into an exponential model to define a probability distribution over S^(p-1). A well-known distribution over S^2 is the 5-parameter Fisher-Bingham distribution introduced by Kent, and this distribution can be generalized to higher dimensions to form the generalized Kent distribution over S^(p-1).
In this talk, we explain the relationship between the parameters of the generalized Kent distribution and Beran’s exponential model with a generalized spherical harmonic basis and we introduce a test we developed for model fitting. We explain the complexity and challenges involved with describing the association between the two distributions and we introduce a measure for quantifying the error associated with the goodness of fit. We verify the integrity of our results with simulations using Beran’s regression estimator and a regularization technique we developed.
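As a rough illustration of an exponential model with a spherical harmonic basis on S^2 (the coefficients, the degree cutoff, and the numerical normalization below are assumptions made for the sketch, not the authors' parameterization or their goodness-of-fit test), one can evaluate an unnormalized density exp(sum_lm theta_lm Y_lm(x)) and normalize it by quadrature:

```python
import numpy as np
from scipy.special import sph_harm

# Hypothetical coefficients theta[(l, m)] for a low-degree truncation of the basis.
theta = {(1, 0): 1.5, (2, 0): -0.8, (2, 1): 0.3}

def log_density_unnorm(phi, colat):
    """Unnormalized log-density sum_lm theta_lm * Re(Y_lm) over S^2.
    phi: azimuth in [0, 2*pi), colat: colatitude in [0, pi]."""
    total = np.zeros(np.broadcast(phi, colat).shape)
    for (l, m), coef in theta.items():
        total = total + coef * np.real(sph_harm(m, l, phi, colat))
    return total

# Normalize numerically on a grid (surface element sin(colat) d colat d phi).
phi = np.linspace(0, 2 * np.pi, 200, endpoint=False)
colat = np.linspace(0, np.pi, 101)[1:-1]          # interior points, avoiding the poles
pg, cg = np.meshgrid(phi, colat, indexing="ij")
vals = np.exp(log_density_unnorm(pg, cg)) * np.sin(cg)
Z = vals.sum() * (phi[1] - phi[0]) * (colat[1] - colat[0])
print("normalizing constant:", Z)
```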
08:50
Correlated Bernoulli Processes Using De Bruijn Graphs
Louise Kimpton | University of Exeter | United Kingdom
Authors:
Louise Kimpton | University of Exeter | United Kingdom
Peter Challenor | University of Exeter | United Kingdom
Henry Wynn | London School of Economics | United Kingdom
Danny Williamson | University of Exeter | United Kingdom
Some numerical models have two distinct regions in output space where classification is required. For example, a computer model may fail to complete for specific input regions, and we would like to predict these regions so that we can avoid running the model, or incorrectly running an emulator, there. A widely used method for classification is logistic regression, which produces a distribution for the predictive class membership of being in either region. When sampling from this to make predictions, current practice is to draw from an independent Bernoulli distribution; drawing marginally means that any correlation between data points is lost, which can result in large numbers of misclassifications. If simulating chains or fields of 0’s and 1’s, it is hard to control the ‘stickiness’ of like symbols. In this paper, we present a novel approach to generating a correlated Bernoulli process to create chains of 0’s and 1’s in which like symbols cluster together. We use the structure of de Bruijn graphs: directed graphs in which, given a set of symbols V and a ‘word’ length m, the nodes consist of all possible sequences over V of length m. De Bruijn graphs are a generalisation of Markov chains, where the ‘word’ length controls the number of previous states that each individual state depends on. This increases correlation over a wider area. Properties of de Bruijn graphs, including expected run length, and methods for inference will be presented, along with how this approach can be extended to higher dimensions.
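To make the ‘stickiness’ idea concrete, here is a minimal Python sketch of a higher-order binary chain in the spirit of a walk on a de Bruijn graph with word length m; the majority-vote transition rule and the stickiness parameter are illustrative assumptions, not the authors' construction or inference procedure:

```python
import numpy as np

def correlated_bernoulli_chain(n, m=3, stickiness=0.8, rng=None):
    """Simulate a 0/1 chain in which the next symbol depends on the last m symbols
    (the current node of a binary de Bruijn graph).  With probability `stickiness`
    the chain repeats the majority symbol of the current word, so like symbols
    cluster together; otherwise it flips.  Illustrative only."""
    rng = np.random.default_rng(rng)
    chain = list(rng.integers(0, 2, size=m))          # initial word of length m
    for _ in range(n - m):
        word = chain[-m:]
        majority = int(sum(word) * 2 >= m)            # majority symbol in the word
        next_symbol = majority if rng.random() < stickiness else 1 - majority
        chain.append(next_symbol)
    return np.array(chain)

x = correlated_bernoulli_chain(50, m=3, stickiness=0.9, rng=0)
print("".join(map(str, x)))   # long runs of like symbols when stickiness is high
```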
09:10
Estimation of physical tolerance bounds for functional data
Nevin Martin | Sandia National Laboratories | United States
Authors:
Nevin Martin | Sandia National Laboratories | United States
Thomas Buchheit | Sandia National Laboratories | United States
J. Derek Tucker | Sandia National Laboratories | United States
Shahed Reza | Sandia National Laboratories | United States
Tolerance bounds are commonly used in engineering applications to bound a percentage of a population with a given level of confidence. There are well-established statistical methodologies for calculating tolerance bounds for scalar and multivariate data. However, data from both computer simulations and physical experiments are often functional in nature, where the response varies continuously across an independent variable. Functional data contain two types of variability that must be accounted for – amplitude (vertical) and phase (horizontal) – though traditional methods often overlook the latter. This work applies a recently developed functional data approach to first generate tolerance bounds on the amplitude and phase variabilities separately. A novel method is developed for transforming these bounds into the original data space, producing physical bounds that quantify both types of variability. This method is applied to electrical diode data used in compact model development for electronic circuit design applications. These data, collected as current as a function of voltage, contain variability across both the current and voltage axes. We demonstrate that the functional data approach accounts for both the amplitude and phase variabilities and produces superior tolerance bounds compared to a traditional method.
SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525.
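For context, the sketch below computes a traditional pointwise (amplitude-only) normal tolerance band on a toy set of current-versus-voltage curves, using Howe's approximation for the two-sided tolerance factor. It is the kind of baseline such functional methods are compared against, not the amplitude-phase approach of the talk, and the toy curves are invented for illustration:

```python
import numpy as np
from scipy.stats import norm, chi2

def normal_tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Two-sided (coverage, confidence) normal tolerance factor, Howe's approximation."""
    z = norm.ppf((1 + coverage) / 2)
    return np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2.ppf(1 - confidence, n - 1))

# Toy functional data: n noisy current-vs-voltage curves on a common grid.
rng = np.random.default_rng(0)
voltage = np.linspace(0, 1, 101)
curves = np.exp(5 * voltage)[None, :] * (1 + 0.05 * rng.standard_normal((30, 1)))

k = normal_tolerance_factor(n=curves.shape[0])
mean, sd = curves.mean(axis=0), curves.std(axis=0, ddof=1)
lower, upper = mean - k * sd, mean + k * sd   # pointwise tolerance band on current
```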
09:30
Practical implementation of aggregated tests of independence based on dependence measures
Anouar Meynaoui | INSA de Toulouse | France
Authors:
Amandine Marrel | CEA/DEN - Institut de Mathématiques de Toulouse, UMR5219 | France
Anouar Meynaoui | INSA de Toulouse | France
Beatrice Laurent | INSA de Toulouse – Institut de Mathématiques de Toulouse, UMR5219 | France
Mélisande Albert | INSA de Toulouse – Institut de Mathématiques de Toulouse, UMR5219 | France
In the framework of propagation of uncertainties in numerical simulation, global sensitivity analysis (GSA) aims at studying the impact of the input uncertainties on the output of the model. For this, dependence measures based on reproducing kernel Hilbert spaces (namely the Hilbert-Schmidt Independence Criterion, denoted HSIC) are very efficient statistical tools. HSIC measures can be used to screen the inputs via independence tests, performed individually between each input and the output.
To improve the quality and robustness of screening based on HSIC measures, we develop a new test procedure that aggregates several single tests based on HSIC measures with different kernels. This makes it possible to take into account several dependence scales simultaneously and capture a broader spectrum of dependence. To efficiently choose the collection of kernels in the aggregated test, we propose in practice to set aside a part of the initial sample to identify the most “relevant” choices, while the rest of the sample is kept to perform the aggregated tests. This procedure can be repeated for different random partitions of the initial sample. Finally, the inputs with a significant average rate of selection at the end of the tests are considered as influential.
The whole methodology is illustrated on a test case simulating a severe accident scenario on a nuclear reactor.
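A minimal sketch of a kernel-based independence screen in this spirit: the biased empirical HSIC estimator with Gaussian kernels, a permutation p-value per bandwidth, and a simple Bonferroni combination over bandwidths as a stand-in for the aggregated test described above (the toy model, the bandwidth grid, and the aggregation rule are assumptions, not the authors' procedure):

```python
import numpy as np

def gaussian_gram(x, bandwidth):
    """Gram matrix of a Gaussian kernel with the given bandwidth (x is 1-D)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth**2))

def hsic(x, y, bw_x, bw_y):
    """Biased empirical HSIC estimator."""
    n = len(x)
    K, L = gaussian_gram(x, bw_x), gaussian_gram(y, bw_y)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def hsic_perm_pvalue(x, y, bw_x, bw_y, n_perm=200, rng=None):
    """Permutation p-value of the single-kernel HSIC independence test."""
    rng = np.random.default_rng(rng)
    obs = hsic(x, y, bw_x, bw_y)
    null = [hsic(x, rng.permutation(y), bw_x, bw_y) for _ in range(n_perm)]
    return (1 + sum(s >= obs for s in null)) / (n_perm + 1)

# Toy example: the output depends on input 0 only; test each input over several
# bandwidths and combine the single tests with a Bonferroni correction.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
bandwidths = [0.1, 0.5, 1.0]
for j in range(X.shape[1]):
    pvals = [hsic_perm_pvalue(X[:, j], y, bw, np.std(y), rng=1) for bw in bandwidths]
    print(f"input {j}: reject independence = {min(pvals) * len(bandwidths) < 0.05}")
```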
09:50
Studying Partial Dependence: An Ensemble Approach
Roman Hahn | Bocconi University | Italy
Authors:
Roman Hahn | Bocconi University | Italy
Emanuele Borgonovo | Bocconi University | Italy
Elmar Plischke | TU Clausthal | Germany
Laura Bondi | Bocconi University | Italy
Partial dependence functions are among the most popular tools for increasing the interpretability of machine learning findings. A partial dependence calculation relies on an emulator (Gaussian process regression, regression tree, support vector machine, etc.) that fits the data. In this work, we explore how using different metamodels may provide conflicting information to the analyst. The practical implication of our work is that care must be taken by the analyst to avoid misleading inference. We propose the joint use of partial dependence functions together with other graphical representation tools (such as cusunoro curves) that provide the analyst/decision maker with complementary insights, making conclusions more robust.
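The following sketch illustrates the point with two off-the-shelf emulators on a toy interaction model: the same partial dependence computation can yield different curves depending on the metamodel. The data-generating function and the choice of emulators are assumptions made for illustration; cusunoro curves are not shown:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

def partial_dependence_curve(model, X, feature, grid):
    """Average model prediction with `feature` fixed at each grid value."""
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        curve.append(model.predict(Xv).mean())
    return np.array(curve)

# Toy data with an interaction, where different emulators can disagree.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.05 * rng.standard_normal(300)

grid = np.linspace(-1, 1, 21)
for model in (RandomForestRegressor(random_state=0), GaussianProcessRegressor()):
    model.fit(X, y)
    pd0 = partial_dependence_curve(model, X, feature=0, grid=grid)
    print(type(model).__name__, np.round(pd0[[0, 10, 20]], 3))
```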
10:10
A New Covariance Estimator for Sufficient Dimension Reduction in High-dimensional Data with Undersized Sample Problems
Kabir Olorede | Kwara State University | Nigeria
Author:
Kabir Olorede | Kwara State University | Nigeria
Applying standard sufficient dimension reduction methods to reduce the dimension of the predictor space without losing regression information requires inverting the covariance matrix of the predictors. This poses a number of challenges, especially when analyzing high-dimensional datasets in which the number of predictors p is much larger than the number of samples n (n ≪ p). We propose a new covariance estimator, the Maximum Entropy Covariance (MEC) estimator, which uses the maximum entropy (ME) principle to address the loss of covariance information when similar covariance matrices are linearly combined. Benefitting naturally from slicing or discretizing the range of the response variable y into H non-overlapping categories h_1, …, h_H, MEC first combines the covariance matrices arising from the samples in each slice h and then selects the one that maximizes entropy under the principle of maximum uncertainty. The MEC estimator is then formed as a convex mixture of this entropy-maximizing sample covariance estimate and the pooled sample covariance estimate across the H slices, without requiring time-consuming covariance optimization procedures. MEC deals directly with the singularity and instability of sample group covariance estimates in both supervised regression and classification problems. The effectiveness of the MEC estimator is studied with existing sufficient dimension reduction methods, as demonstrated on both classification and regression problems using real-life data examples.
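The sketch below follows a rough reading of the construction in the abstract: slice y, take the per-slice covariance with the largest Gaussian entropy (i.e. largest log-determinant), and mix it convexly with the pooled covariance. The number of slices, the small ridge added before taking the log-determinant, and the mixture weight alpha are hypothetical choices, not the estimator's actual tuning:

```python
import numpy as np

def mec_like_covariance(X, y, n_slices=5, alpha=0.5):
    """Sketch of a maximum-entropy-flavoured covariance estimate (illustrative)."""
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
    labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_slices - 1)
    slice_covs = [np.cov(X[labels == h], rowvar=False) for h in range(n_slices)]
    pooled = np.cov(X, rowvar=False)
    # Gaussian entropy is monotone in log det; use slogdet for numerical stability.
    p = X.shape[1]
    entropies = [np.linalg.slogdet(S + 1e-8 * np.eye(p))[1] for S in slice_covs]
    best = slice_covs[int(np.argmax(entropies))]
    return alpha * best + (1 - alpha) * pooled

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))          # toy example with few samples per slice
y = X[:, 0] + 0.1 * rng.standard_normal(60)
Sigma_hat = mec_like_covariance(X, y)
print(Sigma_hat.shape)
```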
10:30
Non-parametric importance sampling for parameter estimation through MCMC
Yoonsang Lee | Dartmouth College | United States
Author:
Yoonsang Lee | Dartmouth College | United States
Sampling from a probability distribution, particularly one that is high-dimensional and non-Gaussian, is an essential computational task for Bayesian inference, numerical integration, etc. Importance sampling, among others, is a variance reduction method with many applications, including the calculation of small-probability events. We propose a non-parametric importance sampling method that requires only a point-wise evaluation of the distribution. Non-parametric generation of a proposal distribution offers flexibility and reduced variance compared with parametric importance sampling when the target distribution is highly non-Gaussian and complex. The proposed method uses many independent Markov chains from MCMC, not necessarily discarding the burn-in period. This allows the use of short chains without dropping burn-in periods and thus decreases the computation time needed to generate the non-parametric proposal distribution. We provide several numerical tests, including 9-dimensional parameter estimation in computational fluid dynamics, to show the robustness and effectiveness of the proposed method.
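A minimal sketch along the lines of the abstract: pool many short random-walk Metropolis chains (keeping burn-in), fit a kernel density estimate to the pooled draws as a non-parametric proposal, then perform self-normalized importance sampling. The bimodal toy target, the chain lengths, and scipy's gaussian_kde are assumptions standing in for the authors' construction and regularization:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Unnormalized target: a bimodal 2-D density (stand-in for a complex posterior).
def log_target(x):
    return np.logaddexp(-0.5 * np.sum((x - 2) ** 2), -0.5 * np.sum((x + 2) ** 2))

def short_rw_chain(n_steps, step=1.0, rng=None):
    """Short random-walk Metropolis chain; the burn-in period is kept."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(2)
    out = []
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal(2)
        if np.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        out.append(x)
    return np.array(out)

# Pool many short, independent chains and fit a KDE proposal to the pooled draws.
pooled = np.vstack([short_rw_chain(200, rng=s) for s in range(20)])
proposal = gaussian_kde(pooled.T)

# Importance sampling with the KDE proposal: self-normalized weights.
draws = proposal.resample(5000, seed=0)                      # shape (2, 5000)
log_w = np.array([log_target(d) for d in draws.T]) - np.log(proposal.pdf(draws))
w = np.exp(log_w - log_w.max())
w /= w.sum()
print("estimated mean:", draws @ w)                          # approx. E[x] under target
```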