Computational Inference: Computational methods for statistics and probabilistic machine learning.
Spring 2026. UT Austin, Department of Statistics and Data Sciences. TThu 9–10:30 am at GDC.
Course information
This course is a Ph.D.-level core class on statistical computing methods. Computation plays a central role in modern statistics and machine learning. The goals of this class are to provide (1) knowledge of practical computational inference techniques, so that students interested in applied statistics can use these tools to fit realistic models efficiently, and (2) exposure to the frontier of modern statistical computing, so that students interested in developing methods or theory can understand how and why existing methods work, and potentially enter research in this area.
This course will cover essential topics to develop a broad working knowledge of modern computational statistics. The selection of topics is based on our view of what is central to this evolving discipline and what will be both intellectually stimulating and practically relevant.
- Basic optimization (Newton-Raphson, quasi-Newton, the EM algorithm, stochastic gradient descent, Bayesian optimization).
- Monte Carlo methods (rejection sampling, importance sampling, quasi-Monte Carlo).
- MCMC methods (Gibbs sampling, Metropolis-Hastings, Sequential Monte Carlo, Hamiltonian Monte Carlo, MALA, NUTS).
- Approximate inference (Laplace, variational inference, expectation propagation).
- Evaluation and diagnostics of computing (convergence tests, validation, simulation-based check, post-processing).
- Miscellaneous tricks (reparametrization, tempering, variance reduction in optimization and sampling, data augmentation, control variates).
- Score-based methods (Stein divergence, path sampling, score matching).
- Likelihood-free computation and related topics (ABC, neural posterior estimation, normalizing flows, diffusion models).
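As a small, concrete illustration of the Monte Carlo topics listed above, the following sketch estimates an expectation E_p[f(X)] by self-normalized importance sampling. It also reports the Kish effective sample size, (sum w)^2 / sum w^2, which is a common first diagnostic for weight degeneracy. The target (a standard normal known only up to a constant), the wider normal proposal, and all function names are illustrative assumptions and are not taken from the course materials.

```python
import math
import random


def log_target_unnormalized(x: float) -> float:
    """Log density of N(0, 1) up to an additive constant (illustrative target)."""
    return -0.5 * x * x


def log_proposal(x: float, scale: float) -> float:
    """Log density of the N(0, scale^2) proposal, including its normalizing constant."""
    return -0.5 * (x / scale) ** 2 - math.log(scale * math.sqrt(2.0 * math.pi))


def snis_estimate(f, n: int = 10_000, scale: float = 2.0, seed: int = 0):
    """Self-normalized importance sampling estimate of E_p[f(X)].

    Returns (estimate, Kish effective sample size).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, scale) for _ in range(n)]
    # Log importance weights; the unknown normalizing constant of p cancels
    # after self-normalization.
    log_w = [log_target_unnormalized(x) - log_proposal(x, scale) for x in xs]
    # Subtract the max log weight before exponentiating, for numerical stability.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * f(xi) for wi, xi in zip(w, xs)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)
    return estimate, ess


if __name__ == "__main__":
    est, ess = snis_estimate(lambda x: x * x)  # true value: E[X^2] = 1 under N(0, 1)
    print(f"estimate of E[X^2]: {est:.3f}, ESS: {ess:.0f} of 10000")
```

A proposal that is wider than the target keeps the weights bounded; swapping in a proposal narrower than the target would typically make the effective sample size collapse, which is one of the failure modes the importance sampling readings discuss.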
Prerequisites
If you are a student outside of the Statistics Ph.D. program, instructor permission is required to take this class. This course is designed to be the advanced course for first-year statistics Ph.D. students. Many students will already have (a) hands-on experience in applied modeling, (b) graduate-level knowledge of mathematical statistics and probability, (c) basic coding skills (R and/or Python), and (d) working knowledge of Bayesian inference.
I will not teach much programming, but will focus on overarching ideas and techniques. For the course project, if you mainly want to fit models and draw samples in practice, I recommend using a high-level probabilistic programming language such as Stan, JAX, PyMC, Turing, or Pangolin; if you want to implement a new algorithm, consider JAX/PyTorch, or R + BridgeStan if you are an R user.
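For students who plan to implement their own algorithm, here is a minimal sketch of the kind of code involved: a random-walk Metropolis sampler that only needs a user-supplied log density. It uses the Python standard library so that it runs anywhere. In practice the same loop would usually be written in JAX or PyTorch, or the log density would come from a compiled Stan model via BridgeStan. The function names, the isotropic Gaussian proposal, the step size, and the 2-D normal target are illustrative assumptions, not course-provided code.

```python
import math
import random
from typing import Callable, List, Tuple


def random_walk_metropolis(
    log_density: Callable[[List[float]], float],
    init: List[float],
    n_iter: int = 5000,
    step: float = 0.5,
    seed: int = 0,
) -> Tuple[List[List[float]], float]:
    """Random-walk Metropolis with an isotropic Gaussian proposal.

    Returns the list of draws (one per iteration) and the acceptance rate.
    """
    rng = random.Random(seed)
    x = list(init)
    lp = log_density(x)
    draws, n_accept = [], 0
    for _ in range(n_iter):
        # Symmetric proposal, so the Hastings correction q(x|x') / q(x'|x) cancels.
        proposal = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        lp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)) on the log scale;
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            n_accept += 1
        draws.append(list(x))
    return draws, n_accept / n_iter


if __name__ == "__main__":
    # Illustrative target: a 2-D standard normal, log density up to a constant.
    def log_std_normal(x):
        return -0.5 * sum(v * v for v in x)

    draws, rate = random_walk_metropolis(
        log_std_normal, init=[3.0, -3.0], n_iter=20_000, step=1.0
    )
    kept = draws[2000:]  # discard warm-up draws started far from the mode
    mean0 = sum(d[0] for d in kept) / len(kept)
    print(f"acceptance rate {rate:.2f}, estimated mean of x[0] {mean0:.2f}")
```

Because the sampler only needs `log_density`, swapping in a different model is a one-line change. Adding gradients (for example via automatic differentiation) is what moves this loop toward MALA and Hamiltonian Monte Carlo.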
Slides
- Introduction
- Recap of Bayes
- Automatic Differentiation
- Stochastic Optimization
- Variational Inference
- Importance sampling
- MCMC
- Diagnostics
- Hamiltonian Monte Carlo
- Nested Laplace Approximation
- Expectation Propagation
- Computing Normalizing Constant
- Score-Based Methods
- Approximate Bayesian Computation
- Normalizing Flows
- Simulation-Based Inference
- Diffusion Models
Schedule
We hold two classes per week, labeled sequentially as 1a, 1b, 2a, and so on. I strongly encourage you to read the recommended papers before each class to maximize your understanding and engagement. Papers marked with ▹ cover advanced topics and are optional.
- Class 1a 1/13. Introduction. Review of probabilistic modeling. Monte Carlo integration.
- Reading: none.
- Class 1b 1/15. Two starting points: Automatic differentiation and Monte Carlo integration.
- Reading:
- A review of automatic differentiation and its efficient implementation (Margossian, 2018)
- Rumble in the ensemble, blog post (Betancourt, 2021)
- ▹ Monte Carlo gradient estimation in machine learning (Mohamed et al., 2019)
- Class 2a. Stochastic optimization.
- Reading:
- ▹ Optimization methods for large-scale machine learning (Bottou et al., 2016)
- ▹ Variance-reduced methods for machine learning (Gower et al., 2020)
- Class 2b. Application of optimization: Variational inference.
- Reading:
- Variational inference: A review for statisticians (Blei et al., 2017)
- Automatic differentiation variational inference (Kucukelbir et al., 2017)
- ▹ Pathfinder: Parallel quasi-Newton variational inference (Zhang et al., 2022)
- Class 3a 2/3. Introduction to sampling. Importance sampling.
- Reading:
- The sample size required in importance sampling (Chatterjee and Diaconis, 2017)
- Pareto smoothed importance sampling (Vehtari et al., 2015)
- Class 3b 2/5. Introduction to MCMC. Metropolis-Hastings. Gibbs.
- Reading:
- Probabilistic inference using Markov chain Monte Carlo methods (Neal, 1993)
- Practical Markov chain Monte Carlo (Geyer, 1992)
- ▹ Efficient Metropolis jumping rules (Gelman et al., 1996)
- ▹ Slice sampling (Neal, 2003)
- Class 5a 2/10. Hamiltonian Monte Carlo.
- Reading:
- MCMC using Hamiltonian dynamics (Neal, 2012)
- A conceptual introduction to Hamiltonian Monte Carlo (Betancourt, 2017)
- Class 5b 2/12. Hamiltonian Monte Carlo. Programming languages.
- Reading:
- The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo (Hoffman and Gelman, 2014)
- ▹ Past, present, and future of software for Bayesian inference (Štrumbelj et al., 2024)
- An introduction to Stan, blog post (Betancourt, 2021)
- Class 6a 2/17. Practical modeling and sampling in Stan.
- Class 6b 2/19. Adaptive importance sampling and sequential Monte Carlo.
- Reading:
- A tutorial on particle filtering and smoothing: Fifteen years later (Doucet et al., 2012)
- ▹ Safe and effective importance sampling (Owen and Zhou, 2000)
- Annealed importance sampling (Neal, 2001)
- ▹ Adaptive multiple importance sampling (Cornuet et al., 2012)
- ▹ Population Monte Carlo (Cappé et al., 2004)
- ▹ A tutorial on adaptive MCMC (Andrieu and Thoms, 2008)
- Class 7a 2/24. From adaptive importance sampling to bridge sampling to path sampling.
- Reading:
- Simulating normalizing constants: From importance sampling to bridge sampling to path sampling (Gelman and Meng, 1998)
- A tutorial on bridge sampling (Gronau et al., 2017)
- ▹ Estimating ratios of normalizing constants using linked importance sampling (Neal, 2005)
- Class 7b 2/26. Beyond VI: Laplace and expectation propagation.
- Reading:
- Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data (Gelman et al., 2014)
- ▹ Bayesian computing with INLA: A review (Rue et al., 2017)
- Class 8a 3/3. Diagnostics and check I: Convergence monitoring.
- Reading:
- Inference from simulations and monitoring convergence (Gelman and Shirley, 2011)
- Convergence assessment techniques for Markov chain Monte Carlo (Brooks and Roberts, 1998)
- ▹ Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC (Vehtari et al., 2021)
- Class 8b 3/5. Diagnostics and check II: Approximate inference.
- Reading:
- Validation of software for Bayesian models using posterior quantiles (Cook et al., 2006)
- Yes, but did it work? Evaluating variational inference (Yao et al., 2018)
- ▹ Discriminative calibration: Check Bayesian computation from simulations and flexible classifier (Yao and Domke, 2024)
- ▹ Covariances, Robustness, and Variational Bayes (Giordano et al., 2018)
- Class 9a 3/10. Diagnostics and check III: How to compute a divergence? (Wasserstein, MMD, KL, Stein)
- Reading:
- A kernelized Stein discrepancy for goodness-of-fit tests (Liu et al., 2016)
- Strictly proper scoring rules, prediction, and estimation (Gneiting and Raftery, 2007)
- ▹ Generalized sliced Wasserstein distances (Kolouri et al., 2019)
- ▹ A kernel two-sample test (Gretton et al., 2012)
- ▹ Estimating divergence functionals and the likelihood ratio by convex risk minimization (Nguyen et al., 2010)
- Class 9b 3/12. Approximate Bayesian computation.
- Reading:
- Approximating Bayes in the 21st century (Martin et al., 2024)
- Class 10a 3/17. Spring break. No class.
- Class 10b 3/19. Spring break. No class.
- Class 11a 3/24. Review + 2-minute presentation of course project ideas.
- Class 11b 3/26. Midterm.
- Class 12a 3/31. Inference in structured models.
- Reading:
- Amortized variational inference in simple hierarchical models (Agrawal and Domke, 2021)
- Hierarchical variational models (Ranganath et al., 2016)
- ▹ Importance weighting and variational inference (Domke and Sheldon, 2018)
- ▹ Hamiltonian Monte Carlo for hierarchical models (Betancourt and Girolami, 2015)
- Class 12b 4/2. An incomplete exploration of some variants of vanilla MCMC.
- Reading:
- Annealing Markov chain Monte Carlo with applications to ancestral inference (Geyer and Thompson, 1995)
- ▹ Delayed rejection in reversible jump Metropolis–Hastings (Green and Mira, 2001)
- ▹ Langevin dynamics with constraints and computation of free energy differences (Lelièvre et al., 2010)
- ▹ Riemann Manifold Langevin and Hamiltonian Monte Carlo (Girolami and Calderhead, 2011)
- ▹ The Barker proposal: Combining robustness and efficiency in gradient-based MCMC (Livingstone et al., 2019)
- ▹ Piecewise-Deterministic Markov chain Monte Carlo (Vanetti et al., 2018)
- ▹ An adaptive-MCMC scheme for setting trajectory lengths in Hamiltonian Monte Carlo (Hoffman et al., 2021)
- Class 13a 4/7. Variance reduction tricks (quasi-Monte Carlo, control variates, Rao-Blackwellization).
- Reading:
- Control functionals for Monte Carlo integration (Oates et al., 2017)
- Rao-Blackwellization of sampling schemes (Casella and Robert, 1996)
- ▹ Zero variance Markov chain Monte Carlo for Bayesian estimators (Mira et al., 2013)
- ▹ Partition functions from Rao-Blackwellized tempered sampling (Carlson et al., 2016)
- ▹ Using large ensembles of control variates for variational inference (Geffner and Domke, 2018)
- ▹ Using supervised learning to improve Monte Carlo integral estimation (Tracey et al., 2011)
- ▹ Simulation-efficient shortest probability intervals (Liu et al., 2015)
- Class 13b 4/9. From preconditioning to reparameterization to normalizing flows.
- Reading:
- Normalizing flows: An introduction and review of current methods (Kobyzev et al., 2019)
- Density estimation using deep generative neural networks (Liu et al., 2021)
- ▹ Automatic reparameterization of probabilistic programs (Gorinova et al., 2018)
- ▹ Advances in black-box VI: Normalizing flows, importance weighting, and optimization (Agrawal et al., 2020)
- ▹ Quantifying the effectiveness of linear preconditioning in Markov chain Monte Carlo (Hird and Livingstone)
- Class 14a 4/14. Simulation-based inference, neural posterior estimation.
- Reading:
- The frontier of simulation-based inference (Cranmer et al., 2020)
- ▹ Simulation-based stacking (Yao et al., 2023)
- ▹ Amortized Bayesian Workflows With Neural Networks (Schmitt et al., 2024)
- Class 14b 4/16. Revisiting score-based methods.
- Reading:
- Estimation of non-normalized statistical models by score matching (Hyvärinen, 2005)
- Measuring sample quality with Stein’s method (Gorham and Mackey, 2015)
- ▹ Stein variational gradient descent: A general-purpose Bayesian inference algorithm (Liu and Wang, 2016)
- ▹ Stein points (Chen et al., 2018)
- ▹ Output assessment for Monte Carlo simulations via the score statistic (Fan et al., 2006)
- Class 15a 4/21. A very brief introduction to diffusion models.
- Reading:
- ▹ Diffusion models: A comprehensive survey of methods and applications (Yang et al., 2022)
- ▹ Stochastic interpolants: A unifying framework for flows and diffusions (Albergo et al., 2023)
- Class 15b 4/23. Review.
- Reading:
- Grand challenges in Bayesian computation (Bhattacharya et al., 2024)
- ▹ Emerging directions in Bayesian computation (Winter et al., 2024)
- Class 16a 4/28. Last class. Course project presentation.
Homeworks
What’s next
Although this course covers a broad range of topics, several areas are not included: (a) advanced optimization methods, (b) numerical techniques, (c) the mathematical theory of MCMC, (d) discrete-space sampling (e.g., Ising models and spin glasses), (e) transdimensional sampling (e.g., reversible jump, pseudo-marginal methods), and (f) an in-depth exploration of modern generative modeling. Many of these topics are likely to be valuable if you are interested in pursuing research in this field.
We also do not have time to cover most language-specific and model-specific considerations. These nuances typically arise in practice when working on applied modeling projects.