All Posts
Splines in PyMC3
- 06 May 2022
- Category: beginner
Often, the model we want to fit is not a perfect line between some \(x\) and \(y\). Instead, the parameters of the model are expected to vary over \(x\). There are multiple ways to handle this situation, one of which is to fit a spline. The spline is effectively multiple individual lines, each fit to a different section of \(x\), that are tied together at their boundaries, often called knots.
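As a rough illustration of the idea (a sketch, not the notebook's own code), a linear spline can be written as one base line plus a change of slope at each knot. Because each slope change only applies to the right of its knot, neighbouring segments are tied together at the knots. The knot positions and coefficients below are made-up values, and `linear_spline` is a hypothetical helper name.

```python
def linear_spline(x, intercept, slope, knots, slope_changes):
    """Evaluate a piecewise-linear spline at x.

    Each knot contributes a change in slope that is only active to the
    right of that knot, so adjacent segments share the same value at the knot.
    """
    y = intercept + slope * x
    for knot, change in zip(knots, slope_changes):
        y += change * max(0.0, x - knot)
    return y


# Hypothetical knots and coefficients, chosen only for illustration.
KNOTS = [1.0, 2.0]
SLOPE_CHANGES = [-3.0, 4.0]

values = [
    linear_spline(x, 0.0, 2.0, KNOTS, SLOPE_CHANGES)
    for x in (0.0, 1.0, 1.5, 2.0, 3.0)
]
print(values)  # [0.0, 2.0, 1.5, 1.0, 4.0]
```

With these values the three segments have slopes 2, -1 and 3, and the curve stays continuous across both knots.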
NBA Foul Analysis with Item Response Theory
- 17 April 2022
- Category: intermediate, tutorial
This tutorial shows an application of Bayesian Item Response Theory [Fox, 2010] to NBA basketball foul calls data using PyMC. Based on Austin Rochford’s blogpost NBA Foul Calls and Bayesian Item Response Theory.
Regression discontinuity design analysis
- 09 April 2022
- Category: beginner, explanation
Quasi-experiments involve experimental interventions and quantitative measures. However, quasi-experiments do not involve random assignment of units (e.g. cells, people, companies, schools, states) to test or control groups. This inability to conduct random assignment poses problems when making causal claims, as it makes it harder to argue that any difference between a control and test group is because of an intervention and not because of a confounding factor.
Gaussian Process for CO2 at Mauna Loa
- 09 April 2022
- Category: intermediate
This Gaussian Process (GP) example shows how to:
Gaussian Mixture Model
- 09 April 2022
- Category: beginner
A mixture model allows us to make inferences about the component contributors to a distribution of data. More specifically, a Gaussian Mixture Model allows us to make inferences about the means and standard deviations of a specified number of underlying component Gaussian distributions.
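To make the setup concrete, here is a small sketch that uses only the Python standard library rather than PyMC. It shows the density a Gaussian mixture assumes, a weighted sum of component normal densities, together with the probability that a point belongs to each component. The weights, means and standard deviations are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def responsibilities(x, weights, mus, sigmas):
    """Probability that x came from each component, via Bayes' rule."""
    parts = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(parts)
    return [p / total for p in parts]


# Illustrative two-component mixture (values are assumptions).
WEIGHTS = [0.3, 0.7]
MUS = [-2.0, 3.0]
SIGMAS = [1.0, 1.5]

print(mixture_pdf(0.0, WEIGHTS, MUS, SIGMAS))
print(responsibilities(-2.0, WEIGHTS, MUS, SIGMAS))
```

In a Bayesian treatment the weights, means and standard deviations would be given priors and inferred from data instead of being fixed as they are here.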
Model building and expansion for golf putting
- 02 April 2022
- Category: intermediate, how-to
This uses and closely follows the case study from Andrew Gelman, written in Stan. There are some new visualizations and we steered away from using improper priors, but much credit to him and to the Stan group for the wonderful case study and software.
How to wrap a JAX function for use in PyMC
This notebook uses libraries that are not PyMC dependencies and therefore need to be installed specifically to run this notebook. Open the dropdown below for extra guidance.
Factor analysis
Factor analysis is a widely used probabilistic model for identifying low-rank structure in multivariate data as encoded in latent variables. It is very closely related to principal components analysis, and differs only in the prior distributions assumed for these latent variables. It is also a good example of a linear Gaussian model as it can be described entirely as a linear transformation of underlying Gaussian variates. For a high-level view of how factor analysis relates to other models, you can check out this diagram originally published by Ghahramani and Roweis.
A Hierarchical model for Rugby prediction
- 19 March 2022
- Category: intermediate, how-to
In this example, we’re going to reproduce the first model described in Baio and Blangiardo [2010] using PyMC. We then show how to sample from the posterior predictive to simulate championship outcomes from the scored goals, which are the modeled quantities.
Bayesian moderation analysis
- 09 March 2022
- Category: beginner
This notebook covers Bayesian moderation analysis. This is appropriate when we believe that one predictor variable (the moderator) may influence the linear relationship between another predictor variable and an outcome. The example examines the relationship between hours of training and muscle mass, where age (the moderating variable) may affect this relationship.
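The core idea can be sketched in a few lines. In a moderation model the outcome depends on the product of the predictor and the moderator, so the slope for the predictor is itself a linear function of the moderator. The coefficient values below are invented for illustration, and the function names are hypothetical rather than taken from the notebook.

```python
def expected_outcome(x, m, b0, b1, b2, b3):
    """Linear model with an interaction term between predictor x and moderator m."""
    return b0 + b1 * x + b2 * m + b3 * x * m


def simple_slope(m, b1, b3):
    """Slope of the outcome with respect to x at a given moderator value m."""
    return b1 + b3 * m


# Made-up coefficients: the effect of x shrinks as the moderator grows.
B0, B1, B2, B3 = 10.0, 2.0, -0.1, -0.02

for moderator in (20.0, 40.0, 60.0):
    print(moderator, simple_slope(moderator, B1, B3))
```

Reading `x` as hours of training and `m` as age, a negative interaction coefficient would mean that each extra hour of training is associated with a smaller gain at higher ages.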
A Primer on Bayesian Methods for Multilevel Modeling
- 27 February 2022
- Category: intermediate
Hierarchical or multilevel modeling is a generalization of regression modeling. Multilevel models are regression models in which the constituent model parameters are given probability models. This implies that model parameters are allowed to vary by group. Observational units are often naturally clustered. Clustering induces dependence between observations, despite random sampling of clusters and random sampling within clusters.
Lasso regression with block updating
- 10 February 2022
- Category: beginner
Sometimes, it is very useful to update a set of parameters together. For example, variables that are highly correlated are often good to update together. In PyMC block updating is simple. This will be demonstrated using the parameter step of pymc.sample.
Binomial regression
- 09 February 2022
- Category: beginner
This notebook covers the logic behind Binomial regression, a specific instance of Generalized Linear Modelling. The example is kept very simple, with a single predictor variable.
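A minimal sketch of that logic, assuming the usual logit link (the notebook may differ in details): a linear predictor is passed through the inverse logit to give a success probability, which then enters a Binomial likelihood. The data and coefficient values are made up, and the function names are hypothetical.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the real line to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_log_pmf(k, n, p):
    """Log probability of k successes in n trials with success probability p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1.0 - p)


def log_likelihood(xs, ks, ns, b0, b1):
    """Sum of Binomial log probabilities with p = inv_logit(b0 + b1 * x)."""
    return sum(
        binomial_log_pmf(k, n, inv_logit(b0 + b1 * x))
        for x, k, n in zip(xs, ks, ns)
    )


# Made-up data: one predictor value, trial count and success count per row.
XS = [-1.0, 0.0, 1.0]
NS = [10, 10, 10]
KS = [2, 5, 8]

print(log_likelihood(XS, KS, NS, b0=0.0, b1=1.4))
print(log_likelihood(XS, KS, NS, b0=0.0, b1=0.0))
```

The first call scores a slope that tracks the rising success rate in the made-up data, so it yields a higher log likelihood than the flat model in the second call.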
Bayesian mediation analysis
- 09 February 2022
- Category: beginner
This notebook covers Bayesian mediation analysis. This is useful when we want to explore possible mediating pathways between a predictor and an outcome variable.
Bayesian regression with truncated or censored data
- 09 January 2022
- Category: beginner
The notebook provides an example of how to conduct linear regression when your outcome variable is either censored or truncated.
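The two cases are easy to confuse, so here is a small sketch of the difference, using hypothetical helper names and made-up values. Censoring keeps every observation but records out-of-range values at the bound, while truncation drops out-of-range observations entirely.

```python
def censor(values, lower, upper):
    """Keep all observations, recording out-of-range values at the bounds."""
    return [min(max(v, lower), upper) for v in values]


def truncate(values, lower, upper):
    """Discard observations that fall outside the bounds."""
    return [v for v in values if lower <= v <= upper]


# Made-up outcome values and bounds, for illustration only.
outcomes = [-3.0, -1.0, 0.0, 2.0, 5.0]

print(censor(outcomes, -2.0, 3.0))    # [-2.0, -1.0, 0.0, 2.0, 3.0]
print(truncate(outcomes, -2.0, 3.0))  # [-1.0, 0.0, 2.0]
```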
GLM: Model Selection
- 08 January 2022
- Category: intermediate
A fairly minimal reproducible example of Model Selection using WAIC and LOO as currently implemented in PyMC3.
Dirichlet mixtures of multinomials
- 08 January 2022
- Category: advanced
This example notebook demonstrates the use of a Dirichlet mixture of multinomials (a.k.a Dirichlet-multinomial or DM) to model categorical count data. Models like this one are important in a variety of areas, including natural language processing, ecology, bioinformatics, and more.
Bayesian Estimation Supersedes the T-Test
- 07 January 2022
- Category: beginner
Bayesian Additive Regression Trees: Introduction
- 21 December 2021
- Category: intermediate, explanation
Bayesian additive regression trees (BART) is a non-parametric regression approach. If we have some covariates \(X\) and we want to use them to model \(Y\), a BART model (omitting the priors) can be represented as:
Using shared variables (Data container adaptation)
- 16 December 2021
- Category: beginner
The pymc.Data container class wraps the theano shared variable class and lets the model be aware of its inputs and outputs. This allows one to change the value of an observed variable to predict or refit on new data. All variables of this class must be declared inside a model context and must be given a name.
Using a “black box” likelihood function (numpy)
- 16 December 2021
- Category: beginner
This notebook is part of a set of two twin notebooks that perform the exact same task; this one uses numpy whereas the other one uses Cython.
GLM: Robust Regression using Custom Likelihood for Outlier Classification
- 17 November 2021
- Category: intermediate
Using PyMC3 for Robust Regression with Outlier Detection using the Hogg 2010 Signal vs Noise method.
Hierarchical Binomial Model: Rat Tumor Example
- 11 November 2021
- Category: intermediate
This short tutorial demonstrates how to use PyMC3 to do inference for the rat tumour example found in chapter 5 of Bayesian Data Analysis 3rd Edition [Gelman et al., 2013]. Readers should already be familiar with the PyMC3 API.
Estimating parameters of a distribution from awkwardly binned data
- 23 October 2021
- Category: intermediate
Let us say that we are interested in inferring the properties of a population. This could be anything from the distribution of age, or income, or body mass index, or a whole range of different possible measures. In completing this task, we might often come across the situation where we have multiple datasets, each of which can inform our beliefs about the overall population.
Variational Inference: Bayesian Neural Networks
- 20 October 2021
- Category: intermediate
There are currently three big trends in machine learning: Probabilistic Programming, Deep Learning and “Big Data”. Inside of PP, a lot of innovation is in making things scale using Variational Inference. In this blog post, I will show how to use Variational Inference in PyMC3 to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.
Sequential Monte Carlo
- 19 October 2021
- Category: beginner
Sampling from distributions with multiple peaks with standard MCMC methods can be difficult, if not impossible, as the Markov chain often gets stuck in one of the modes. A Sequential Monte Carlo sampler (SMC) is a way to ameliorate this problem.
Hierarchical Partial Pooling
- 07 October 2021
- Category: intermediate
Suppose you are tasked with estimating baseball batting skills for several players. One such performance metric is batting average. Since players play a different number of games and bat in different positions in the order, each player has a different number of at-bats. However, you want to estimate the skill of all players, including those with a relatively small number of batting opportunities.
Multivariate Gaussian Random Walk
- 25 September 2021
- Category: beginner
This notebook shows how to fit a correlated time series using multivariate Gaussian random walks (GRWs). In particular, we perform a Bayesian regression of the time series data against a model dependent on GRWs.
GLM: Mini-batch ADVI on hierarchical regression model
- 23 September 2021
- Category: intermediate
Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function, but are not random variables. When using mini-batch, we should take care of that.
Probabilistic Matrix Factorization for Making Personalized Recommendations
- 20 September 2021
- Category: intermediate
So you are browsing for something to watch on Netflix and just not liking the suggestions. You just know you can do better. All you need to do is collect some ratings data from yourself and friends and build a recommendation algorithm. This notebook will guide you in doing just that!
Marginalized Gaussian Mixture Model
- 18 September 2021
- Category: intermediate
Gaussian mixtures are a flexible class of models for data that exhibits subpopulation heterogeneity. A toy example of such a data set is shown below.
Dirichlet process mixtures for density estimation
- 16 September 2021
- Category: advanced
The Dirichlet process is a flexible probability distribution over the space of distributions. Most generally, a probability distribution, \(P\), on a set \(\Omega\) is a [measure](https://en.wikipedia.org/wiki/Measure_(mathematics)) that assigns measure one to the entire space (\(P(\Omega) = 1\)). A Dirichlet process \(P \sim \textrm{DP}(\alpha, P_0)\) is a measure that has the property that, for every finite disjoint partition \(S_1, \ldots, S_n\) of \(\Omega\),
Rolling Regression
- 15 September 2021
- Category: intermediate
Pairs trading is a famous technique in algorithmic trading that plays two stocks against each other.