Deep learning techniques are becoming the center of attention across many scientific disciplines. Many predictive tasks are currently being tackled using over-parameterized, black-box discriminative models such as deep neural networks, in which interpretability and robustness are often sacrificed in favor of representational flexibility and computational scalability. Such models have yielded remarkable results in data-rich domains, yet their effectiveness in data-scarce and risk-sensitive tasks remains questionable, primarily due to open challenges in statistical inference and uncertainty quantification. This mini-symposium invites contributions on uncertainty quantification methods for deep learning and their application in the physical and engineering sciences. Topics include (but are not limited to) Bayesian neural networks, deep generative models, posterior inference techniques, and applications to forward/inverse problems, active learning, Bayesian optimization, and reinforcement learning.
08:30
Subspace Inference for Bayesian Deep Learning
Andrew Gordon Wilson | NYU Courant | United States
Authors:
Andrew Gordon Wilson | NYU Courant | United States
Wesley J. Maddox | Cornell University | United States
Pavel Izmailov | NYU Courant | United States
Polina Kirichenko | NYU Courant | United States
Timur Garipov | MIT | United States
Dmitry Vetrov | Samsung AI Center Moscow | Russian Federation
Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well-calibrated uncertainty. However, scaling Bayesian inference techniques to deep neural networks is challenging due to the high dimensionality of the parameter space. In this talk, we discuss how to construct low-dimensional subspaces of parameter space, such as the first principal components of the stochastic gradient descent (SGD) trajectory, which contain diverse sets of high-performing models. In these subspaces, we are able to apply elliptical slice sampling and variational inference, which struggle in the full parameter space. We show that Bayesian model averaging over the induced posterior in these subspaces produces accurate predictions and well-calibrated predictive uncertainty for both regression and image classification.
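A minimal sketch of the subspace construction described in the abstract: collect weight snapshots along the tail of an SGD run, build a subspace from the leading principal components, and run MCMC over the subspace coordinates. The toy model, data, and the random-walk Metropolis sampler (standing in for the elliptical slice sampling used in the talk) are illustrative assumptions, not the authors' code.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.linspace(-3, 3, 100).unsqueeze(1)
y = torch.sin(X) + 0.1 * torch.randn_like(X)

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

def flat_params():
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def set_flat_params(theta):
    i = 0
    for p in model.parameters():
        n = p.numel()
        p.data.copy_(theta[i:i + n].view_as(p))
        i += n

# 1) Collect weight snapshots along the tail of the SGD trajectory.
snapshots = []
for step in range(2000):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step > 1000 and step % 10 == 0:
        snapshots.append(flat_params().numpy())
W = np.stack(snapshots)

# 2) Low-dimensional subspace: SGD mean plus top principal components.
w_mean = W.mean(axis=0)
_, _, Vt = np.linalg.svd(W - w_mean, full_matrices=False)
P = Vt[:5]                              # basis of a 5-dimensional subspace

# 3) Posterior inference over the subspace coordinates z (random-walk
#    Metropolis as a simple stand-in for elliptical slice sampling).
def log_post(z):
    set_flat_params(torch.from_numpy(w_mean + z @ P).float())
    with torch.no_grad():
        ll = -((model(X) - y) ** 2).sum() / (2 * 0.1 ** 2)
    return ll.item() - 0.5 * (z @ z)    # standard Gaussian prior on z

z, lp = np.zeros(5), log_post(np.zeros(5))
samples = []
for _ in range(500):
    z_new = z + 0.1 * np.random.randn(5)
    lp_new = log_post(z_new)
    if np.log(np.random.rand()) < lp_new - lp:
        z, lp = z_new, lp_new
    samples.append(z.copy())
# Averaging predictions over `samples` gives the Bayesian model average.
```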
09:00
Applying Bayesian Principles to Deep Learning: Scaling, Uncertainty Calibration, and Continual Learning
Siddharth Swaroop | University of Cambridge | United Kingdom
Authors:
Siddharth Swaroop | University of Cambridge | United Kingdom
Richard Turner | University of Cambridge | United Kingdom
Deep-learning methods have brought success to many fields, such as computer vision, natural language processing, and speech processing. However, their lack of reliable uncertainty estimates hinders their application elsewhere: in high-risk domains (e.g. medical applications), in data-scarce applications, and in applications such as continual learning, where old knowledge is forgotten. One principled way to obtain good uncertainty estimates is to apply Bayesian principles. However, intractabilities mean that approximations to Bayes' rule are required. I shall focus on how we can use the variational inference approximation to train Bayesian neural networks (BNNs). I will describe recent work that uses natural-gradient variational inference to scale BNNs to larger data settings than previously achievable, crucially showing improved uncertainty estimates. One application I shall focus on is continual learning, where new data is seen over time in an online fashion, old data may not be revisited, and data distributions may change with time. Naive, uncertainty-unaware methods tend to catastrophically forget old knowledge when training on new data, whereas maintaining good uncertainty estimates via Bayesian principles is a natural way to approach continual learning and achieves excellent results on benchmarks. These advances in training and applying BNNs unlock exciting uncertainty-dependent applications for deep-learning techniques.
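A minimal sketch of variational inference for a BNN layer: a mean-field Gaussian posterior over the weights, trained with the reparameterization trick by minimizing the negative ELBO. The architecture, prior scale, and plain-Adam training loop are illustrative assumptions; the talk's natural-gradient method is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over weights."""
    def __init__(self, d_in, d_out, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.rho = nn.Parameter(torch.full((d_out, d_in), -3.0))
        self.prior_std = prior_std

    def forward(self, x):
        std = F.softplus(self.rho)
        w = self.mu + std * torch.randn_like(std)  # reparameterization trick
        return x @ w.t()

    def kl(self):
        # Closed-form KL(q || p) between factorized Gaussians.
        var, pvar = F.softplus(self.rho) ** 2, self.prior_std ** 2
        return 0.5 * (var / pvar + self.mu ** 2 / pvar
                      - 1 - torch.log(var / pvar)).sum()

torch.manual_seed(0)
X = torch.randn(200, 2)
y = (X[:, :1] - X[:, 1:]) + 0.1 * torch.randn(200, 1)

layer = BayesLinear(2, 1)
opt = torch.optim.Adam(layer.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    nll = ((layer(X) - y) ** 2).sum() / (2 * 0.1 ** 2)  # Gaussian likelihood
    loss = nll + layer.kl()                             # negative ELBO
    loss.backward()
    opt.step()

# Predictive uncertainty from Monte Carlo samples of the weights.
# For continual learning (as in variational continual learning), the
# learned posterior would serve as the prior when the next task arrives.
with torch.no_grad():
    preds = torch.stack([layer(X) for _ in range(50)])
print(preds.mean(0)[:3].squeeze(), preds.std(0)[:3].squeeze())
```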
09:30
Analyzing the Vulnerability of Deep Classifiers with Deep Generative Models
Yang Song | Stanford University | United States
Authors:
Yang Song | Stanford University | United States
Stefano Ermon | Stanford University | United States
Nate Kushman | Microsoft Research | United States
Classifiers based on deep learning techniques have surpassed human performance on many tasks. However, they can easily give wrong predictions with high confidence scores on adversarially manipulated inputs, creating a security vulnerability. To analyze this, we propose to leverage generative models to study the distribution of inputs on which the classifier gives incorrect predictions with incorrect uncertainty scores. We first focus on perturbation-based adversarial examples, a special kind of adversarial input crafted by perturbing normal data points with small noise that is almost imperceptible to humans. By training a density model on clean data, we show that although the perturbations are small, they lead to a significant decrease in likelihood as measured by the model, and can therefore be detected effectively. Based on this observation, we create PixelDefend, a method that purifies adversarial inputs by perturbing them to increase their likelihood. This method uniformly improves the robustness of networks against existing attack methods without retraining the classifiers. Next, leveraging generative models, we disclose a new threat to deep classifiers called unrestricted adversarial examples, where attackers are not restricted to small perturbations and are thus harder to defend against. Using conditional generative models, we show that it is easy to produce large numbers of realistic-looking inputs that fool the classifier into giving predictions different from those of humans.
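A minimal sketch of the detection-and-purification idea, using a factorized Gaussian density as a crude stand-in for the PixelCNN used in PixelDefend, and continuous gradient ascent on the log-likelihood instead of PixelDefend's per-pixel search. The data, perturbation size, and threshold are illustrative assumptions.

```python
import math
import torch

torch.manual_seed(0)
clean = torch.randn(1000, 10) * 0.5 + 1.0        # "clean" training data

# 1) Fit a density model on clean data (here a factorized Gaussian).
mu, std = clean.mean(0), clean.std(0)

def log_lik(x):
    z = (x - mu) / std
    return -0.5 * (z ** 2 + torch.log(2 * math.pi * std ** 2)).sum(-1)

# 2) Detection: adversarial perturbations drop the likelihood sharply.
x_clean = clean[0]
# Sign perturbation standing in for a gradient-based attack; its size is
# exaggerated for this low-dimensional toy.
x_adv = x_clean + 1.0 * torch.sign(torch.randn(10))
threshold = log_lik(clean).quantile(0.01)        # flag the lowest 1%
print("clean flagged:", bool(log_lik(x_clean) < threshold))
print("adv   flagged:", bool(log_lik(x_adv) < threshold))

# 3) Purification: nudge the input back toward high-likelihood regions.
x = x_adv.clone().requires_grad_(True)
for _ in range(50):
    ll = log_lik(x)
    grad, = torch.autograd.grad(ll, x)
    x = (x + 0.05 * grad).detach().requires_grad_(True)
print("likelihood before/after:", log_lik(x_adv).item(), log_lik(x).item())
# The purified x, not x_adv, would then be passed to the classifier.
```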
10:00
Model Reduction for Input-Output Maps
Nikola Kovachki | California Institute of Technology | United States
Authors:
Nikola Kovachki | California Institute of Technology | United States
Andrew M. Stuart | California Institute of Technology | United States
We develop a general framework for data-driven approximation of input-output maps between infinite-dimensional spaces, utilizing the recent success of deep learning. For a class of such maps and a suitably chosen probability measure on the inputs, we prove generalization bounds and convergence of our approximation. Numerically, we demonstrate the effectiveness of our method on parametric PDE problems, showing robustness to the size of the discretization.
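A minimal sketch of learning an input-output map from data: here the solution operator of the 1D Poisson problem -u'' = f with u(0) = u(1) = 0, approximated on a fixed grid by a small network trained on input-solution pairs. The grid size, random input distribution, architecture, and training setup are illustrative assumptions, not the framework developed in the talk.

```python
import numpy as np
import torch
import torch.nn as nn

n = 64                                   # interior grid points
h = 1.0 / (n + 1)
# Finite-difference Laplacian, used only to generate ground-truth solutions.
A = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h ** 2

rng = np.random.default_rng(0)
x = np.linspace(h, 1 - h, n)
# Random smooth inputs f drawn from a few low-frequency sine modes.
coeffs = rng.normal(size=(512, 4))
F = sum(coeffs[:, k:k + 1] * np.sin((k + 1) * np.pi * x) for k in range(4))
U = np.linalg.solve(A, F.T).T            # ground-truth solutions u

F_t = torch.tensor(F, dtype=torch.float32)
U_t = torch.tensor(U, dtype=torch.float32)

# Learn the discretized map f |-> u from 400 training pairs.
net = nn.Sequential(nn.Linear(n, 128), nn.ReLU(), nn.Linear(128, n))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(2000):
    opt.zero_grad()
    loss = ((net(F_t[:400]) - U_t[:400]) ** 2).mean()
    loss.backward()
    opt.step()

# Evaluate on held-out input functions.
with torch.no_grad():
    test_err = ((net(F_t[400:]) - U_t[400:]) ** 2).mean().sqrt()
print("held-out RMSE:", test_err.item())
```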