Data-driven discovery is a defining trend of modern science. A plethora of models have been developed to analyze or assimilate data arising from problems ranging from materials science and chemistry to national defense and health. The proposed mini-symposium focuses on uncertainty in data: its speakers will discuss techniques for uncertainty quantification, parameter estimation, and handling noise in complex data, so that robust, reproducible, and convergent results can be obtained. Audience and speakers alike will benefit from a dynamic set of prominent and promising researchers with heterogeneous backgrounds spanning almost the entire spectrum of the mathematical sciences, from topology and geometry to statistics and machine learning.
14:00
An Ultra-Fast Procedure for Learning Discontinuous Functions
Clement Etienam | University of Manchester | United Kingdom
Authors:
Clement Etienam | University of Manchester | United Kingdom
Kody Law | University of Manchester | United Kingdom
This work presents a method for solving supervised learning problems in which the output is highly irregular, nonlinear, and discontinuous. The problem is solved in three steps: (i) cluster the input-output data pairs, producing a label for each point; (ii) train a classifier on the data, with the cluster label as the output; and finally (iii) perform a separate regression for each class, where the training data is the subset of the original input-output pairs assigned that label by the classifier. To our knowledge, these three fundamental building blocks of machine learning have not previously been combined in this simple and powerful fashion. The approach can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on several toy problems, including one example arising from the simulation of plasma fusion in a tokamak. We conclude that our method is capable of recovering the underlying unknown function in a quick and elegant manner.
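The cluster-classify-regress pipeline described in the abstract can be sketched in a few lines. The following toy example is illustrative only and not the authors' implementation: simple 1-D 2-means clustering, nearest-neighbour classification, and per-class least-squares lines stand in for whatever clustering, classifier, and regressor one would actually use. It recovers a function with a jump at x = 0.5:

```python
# Hypothetical minimal sketch of the three-step pipeline: (i) cluster the
# (x, y) pairs, (ii) classify inputs into clusters, (iii) regress per class.

def target(x):
    return 2.0 * x if x < 0.5 else 2.0 * x + 5.0  # discontinuity at x = 0.5

xs = [i / 100.0 for i in range(100)]
ys = [target(x) for x in xs]

# (i) Cluster the pairs -- here a simple 1-D 2-means on the outputs y.
c0, c1 = min(ys), max(ys)
for _ in range(20):
    labels = [0 if abs(y - c0) <= abs(y - c1) else 1 for y in ys]
    g0 = [y for y, l in zip(ys, labels) if l == 0]
    g1 = [y for y, l in zip(ys, labels) if l == 1]
    c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)

# (ii) Classify: 1-nearest-neighbour on x predicts the cluster label.
def classify(x):
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return labels[i]

# (iii) One least-squares line per class.
def fit_line(pts):
    n = len(pts)
    sx = sum(p[0] for p in pts); sy = sum(p[1] for p in pts)
    sxx = sum(p[0] ** 2 for p in pts); sxy = sum(p[0] * p[1] for p in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

lines = [fit_line([(x, y) for x, y, l in zip(xs, ys, labels) if l == k])
         for k in (0, 1)]

def predict(x):
    a, b = lines[classify(x)]
    return a * x + b
```

Because the classifier routes each query to the correct piece, the per-class regressors never average across the discontinuity, which is exactly what a single global regressor would do wrong.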
14:30
A Bayesian Nonparametric Model for Longitudinal Data in Sport
Alessandro Lanteri | University of Turin | Italy
Author:
Alessandro Lanteri | University of Turin | Italy
In sport analytics, there is often interest in predicting an elite athlete's performance at a future sporting event given the competitive results tracked throughout the athlete's career and other (time-varying) covariates. Such predictions can be useful both for scouting purposes and for building red-flag indicators of unexpected increases in athlete performance for targeted anti-doping testing. We propose a predictive model for the longitudinal trajectory of an athlete's performance, characterizing the curve with a sparse basis expansion that allows individual time-dependent covariates to impact the shape of the estimated trajectories. Moreover, we introduce random intercepts, distributed according to a nonparametric hierarchical process, in order to induce clustering while borrowing statistical information across curves. In particular, we assume a hierarchical normalized generalized gamma process, which grants great flexibility in clustering and accuracy in prediction. We apply our model to a longitudinal study of shot put athletes whose competitive results are tracked throughout their careers.
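The clustering-via-random-intercepts idea can be illustrated with the simplest nonparametric prior of this kind. The sketch below is not the authors' model: it draws cluster-level intercepts from a Chinese restaurant process (the predictive scheme of a Dirichlet process), whereas the abstract uses the more flexible hierarchical normalized generalized gamma process; all names are illustrative.

```python
# Illustrative sketch: nonparametric clustering of athletes' random
# intercepts via a Chinese restaurant process (NOT the hierarchical
# normalized generalized gamma process used in the actual work).
import random

def crp_intercepts(n_athletes, alpha=1.0, base=lambda: random.gauss(0, 1)):
    """Assign each athlete to a cluster and give it that cluster's intercept."""
    sizes = []       # current cluster sizes
    values = []      # cluster-level intercepts drawn from the base measure
    assignment = []
    for i in range(n_athletes):
        # join existing cluster k with probability size_k / (i + alpha),
        # open a new cluster with probability alpha / (i + alpha)
        r = random.uniform(0, i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if r < acc:
                sizes[k] += 1
                assignment.append(k)
                break
        else:
            sizes.append(1)
            values.append(base())
            assignment.append(len(sizes) - 1)
    return assignment, [values[k] for k in assignment]

random.seed(0)
clusters, intercepts = crp_intercepts(20)
```

Athletes in the same cluster share an intercept, so information is borrowed across their performance curves, while the number of clusters is learned from the data rather than fixed in advance.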
15:00
Wavelet invariants for statistically robust multi-reference alignment
Anna Little | Michigan State University | United States
Author:
Anna Little | Michigan State University | United States
This talk discusses a nonlinear, wavelet-based signal representation that is translation invariant and robust to both additive noise and random dilations. Motivated by the multi-reference alignment problem and generalizations thereof, the statistical properties of this representation are analyzed given a large number of independent corruptions of a target signal. The nonlinear wavelet-based representation uniquely defines the power spectrum but allows for an unbiasing procedure that cannot be applied directly to the power spectrum. After unbiasing the representation to remove the effects of the additive noise and random dilations, we recover an approximation of the power spectrum by solving a convex optimization problem, and thus obtain the target signal up to an unknown phase. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.
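The representation discussed here builds on the translation invariance of the power spectrum (which explains why the signal is only recovered up to an unknown phase). The minimal pure-Python check below illustrates just that classical property, not the authors' wavelet construction:

```python
# Translation invariance of the power spectrum: circularly shifting a
# signal changes the phase of each DFT coefficient but not its magnitude.
import cmath

def power_spectrum(signal):
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0]
shifted = x[2:] + x[:2]              # circular shift by 2 samples
p, q = power_spectrum(x), power_spectrum(shifted)   # p == q up to rounding
```

The phase lost in taking magnitudes is exactly the translation information, which is why phase retrieval from an (unbiased) power spectrum suffices in the multi-reference alignment setting.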
15:30
Diffusion Geometric Approaches to Active Learning
James Murphy | Tufts University | United States
Author:
James Murphy | Tufts University | United States
Active learning is the semi-supervised machine learning regime in which an algorithm has access to large quantities of unlabeled data and may query individual points for labels. If queried in an intelligent manner, the labeled data produced by an active learning algorithm can be much more potent than a random sample, and the resulting classification algorithm can perform as well as traditional supervised learning but with far fewer labels. We propose to use the underlying diffusion geometry of the data to determine which points to query for labels. The approach is fast and enjoys theoretical guarantees on the number of queries needed for a high-accuracy labeling of the entire dataset. In particular, the low-dimensional geometric properties of the data quantify the expected impact of a given label on the overall labeling scheme. The application of this approach to real high-dimensional images yields state-of-the-art classification results.
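One way to see how diffusion geometry can drive query selection is the toy sketch below. It is illustrative only and not the authors' algorithm: it builds a Gaussian-kernel random walk on a tiny 1-D dataset, computes diffusion distances from a few steps of the walk, and greedily queries the point farthest (in diffusion distance) from everything already labeled, so the second query lands in the unexplored cluster.

```python
# Illustrative sketch: greedy active-learning queries chosen by diffusion
# distance on a toy dataset with two well-separated clusters.
import math

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
n = len(points)

# Markov transition matrix of a Gaussian-kernel random walk on the data.
K = [[math.exp(-(points[i] - points[j]) ** 2) for j in range(n)]
     for i in range(n)]
P = [[K[i][j] / sum(K[i]) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pt = P
for _ in range(2):                   # run the walk: diffusion time t = 3
    Pt = matmul(Pt, P)

def diff_dist(i, j):
    """Diffusion distance: Euclidean distance between rows of P^t."""
    return math.sqrt(sum((Pt[i][k] - Pt[j][k]) ** 2 for k in range(n)))

# Greedy farthest-point querying starting from a labeled point in cluster 1.
queries = [0]
while len(queries) < 2:
    queries.append(max(range(n),
                       key=lambda i: min(diff_dist(i, q) for q in queries)))
```

Because diffusion distance is small within a well-connected cluster and large across clusters, the greedy rule spends its label budget one query per cluster instead of wasting labels on near-duplicates.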