A glossary of common terms used throughout the PyMC documentation and examples.

Bayes’ theorem

Describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, Bayes’ theorem allows the risk to an individual of a known age to be assessed more accurately (by conditioning it on their age) than simply assuming that the individual is typical of the population as a whole. Formula:

\[ P(A|B) = \frac{P(B|A) P(A)}{P(B)} \]

where \(A\) and \(B\) are events and \(P(B) \neq 0\).
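The snippet below is a minimal worked sketch of the formula using invented numbers. Here \(A\) means having a condition and \(B\) means testing positive. The prevalence, sensitivity, and false-positive rate, and the helper name `posterior_probability`, are hypothetical. \(P(B)\) is expanded with the law of total probability.

```python
def posterior_probability(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) via Bayes' theorem for a binary event A (hypothetical helper)."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate
p = posterior_probability(p_b_given_a=0.90, p_a=0.01, p_b_given_not_a=0.05)
print(round(p, 4))  # 0.1538
```

Even after a positive test, \(P(A|B)\) stays near 15% because the prior \(P(A)\) is small.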

Bayesian inference

Once we have defined the statistical model, Bayesian inference processes the data and model to produce a posterior distribution, that is, a joint distribution of all parameters in the model. This distribution represents the plausibility of the parameter values and is the logical consequence of the model and data.
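As a toy illustration, the sketch below uses grid approximation, a didactic method and not what PyMC does internally, to turn a model and data into a posterior for one parameter. It assumes a binomial likelihood for the success probability \(p\), a flat prior, and hypothetical data of 6 successes in 9 trials. The posterior is computed as prior times likelihood and normalized over the grid.

```python
from math import comb


def grid_posterior(successes: int, trials: int, n_points: int = 101) -> tuple[list[float], list[float]]:
    """Grid-approximate the posterior of a binomial probability p under a flat prior."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    prior = [1.0] * n_points  # flat prior over [0, 1]
    likelihood = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes) for p in grid
    ]
    unnormalized = [pr * lik for pr, lik in zip(prior, likelihood)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]  # normalize so the grid sums to 1


grid, posterior = grid_posterior(successes=6, trials=9)
best = max(range(len(grid)), key=posterior.__getitem__)
print(grid[best])  # grid point with the highest posterior probability, close to 6/9
```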

Bayesian model

A Bayesian model is a composite of variables and distributional definitions for these variables. Bayesian models have two defining characteristics: i) unknown quantities are described using probability distributions, and ii) Bayes’ theorem is used to update the probability distributions of the parameters, conditioned on the data.

Bayesian Workflow

The Bayesian workflow involves all the steps needed for model building. This includes Bayesian inference but also other tasks such as i) diagnosis of the quality of the inference, ii) model criticism, including evaluations of both model assumptions and model predictions, iii) comparison of models, not just for the purpose of model selection or model averaging but more importantly to better understand these models, and iv) preparation of the results for a particular audience. These non-inferential tasks require both numerical and visual summaries to help practitioners analyse their models, and they are sometimes collectively known as Exploratory Analysis of Bayesian Models.

  • For a compact overview, see Bayesian statistics and modelling by van de Schoot, R., Depaoli, S., King, R. et al. in Nat Rev Methods Primers 1, 1 (2021).

  • For an in-depth overview, see Bayesian Workflow by Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, Martin Modrák.

  • For exercise-based material, see Think Bayes 2e: Bayesian Statistics Made Simple by Allen B. Downey.

  • For an upcoming textbook that uses the PyMC, TensorFlow Probability, and ArviZ libraries, see Bayesian Modeling and Computation by Osvaldo A. Martin, Ravin Kumar and Junpeng Lao.

Credibility

A form of statistical inference used to forecast an uncertain future event.

Dispatching

Choosing which function or method implementation to use based on the type of the input variables (usually just the first variable). For some examples, see Python’s documentation for the singledispatch decorator.

Dispersion

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.

Functional Programming

Functional programming is a programming style that prefers basic functions with explicit and distinct inputs and outputs. This contrasts with functions or methods that depend on variables that are not explicitly passed as an input (such as accessing self.variable inside a method), or that alter the inputs or other state variables in place instead of returning new, distinct variables as outputs.

Generalized Linear Model

In a Generalized Linear Model (GLM), we assume that the response variable \(y_i\) follows an exponential family distribution with mean \(\mu_i\), which is assumed to be some (often nonlinear) function of \(x_i^T\beta\). These models are considered linear because the covariates affect the distribution of \(y_i\) only through the linear combination \(x_i^T\beta\). Some examples of Generalized Linear Models are linear regression, ANOVA, logistic regression, and Poisson regression.

Do not confuse these with general linear models.

Generalized Poisson Distribution

A generalization of the Poisson distribution with two parameters, X1 and X2, obtained as a limiting form of the generalized negative binomial distribution. The variance of the distribution is greater than, equal to, or smaller than the mean according as X2 is positive, zero, or negative.

Hamiltonian Monte Carlo

A Markov Chain Monte Carlo method for obtaining a sequence of random samples that converge to being distributed according to a target probability distribution.

Hierarchical Ordinary Differential Equation

Calculations of ordinary differential equations at the individual level, group level, or other levels of a hierarchy.


Likelihood

There are many perspectives on likelihood, but conceptually we can think of it as the probability of the data given the parameters or, in other words, as the relative number of ways the data could have been produced.

  • For an in-depth unfolding of the concept, refer to Statistical Rethinking 2nd Edition by Richard McElreath, particularly chapter 2.

  • For problem-based material, see Think Bayes 2e: Bayesian Statistics Made Simple by Allen B. Downey.

  • For univariate, continuous scenarios, see the calibr8 paper: Bayesian calibration, process modeling and uncertainty quantification in biotechnology by Laura Marie Helleckes, Michael Osthege, Wolfgang Wiechert, Eric von Lieres, Marco Oldiges.
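The "number of ways" view can be made concrete with a toy counting example in the spirit of McElreath's "garden of forking data". The setup is a hypothetical one: a bag of 4 marbles, each blue or white, with blue, white, blue drawn with replacement. The helper `ways_to_produce` is illustrative only. Dividing each count by the total number of draw paths turns it into the probability of the data, so both descriptions above give the same relative ranking.

```python
from fractions import Fraction

# Hypothetical setup: a bag holds 4 marbles, each blue or white.
# Drawing with replacement, we observe blue, white, blue.
observed = ["blue", "white", "blue"]
n_marbles = 4


def ways_to_produce(observed: list[str], n_blue: int, n_marbles: int) -> int:
    """Count the draw paths consistent with the observed sequence."""
    ways = 1
    for color in observed:
        ways *= n_blue if color == "blue" else n_marbles - n_blue
    return ways


for n_blue in range(n_marbles + 1):
    ways = ways_to_produce(observed, n_blue, n_marbles)
    # Dividing by the total number of paths (4**3) gives the probability of the data
    likelihood = Fraction(ways, n_marbles ** len(observed))
    print(n_blue, ways, likelihood)
```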

Markov Chain

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
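A minimal simulation sketch of this definition follows. It assumes a hypothetical two-state weather chain, and the transition probabilities are invented. Each next state is drawn using only the current state. For these numbers the long-run share of sunny days settles near the chain's stationary value of 5/6.

```python
import random

# Hypothetical two-state chain: outer keys are the current state, inner keys the next state.
TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def simulate(start: str, n_steps: int, rng: random.Random) -> list[str]:
    """Simulate a chain where each next state depends only on the current one."""
    states = [start]
    for _ in range(n_steps):
        probs = TRANSITIONS[states[-1]]  # only the current state matters
        states.append(rng.choices(list(probs), weights=list(probs.values()))[0])
    return states


chain = simulate("rainy", 50_000, random.Random(42))
share_sunny = chain.count("sunny") / len(chain)
print(share_sunny)  # close to the stationary value 5/6
```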

Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. Various algorithms exist for constructing chains, including the Metropolis–Hastings algorithm.
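The sketch below shows random-walk Metropolis, the special case of Metropolis–Hastings with a symmetric proposal, for an assumed 1-D standard normal target known only up to a constant. It is for illustration only. PyMC's default sampler for continuous variables is NUTS, not this. The step size, seed, and warm-up length are arbitrary choices.

```python
import math
import random


def metropolis(log_density, initial: float, n_samples: int, step: float, rng: random.Random) -> list[float]:
    """Random-walk Metropolis: a minimal MCMC sampler for a 1-D target."""
    samples = []
    current = initial
    current_logp = log_density(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        samples.append(current)  # record the chain's state whether or not we moved
    return samples


# Target: standard normal, log density known only up to an additive constant
draws = metropolis(lambda x: -0.5 * x * x, initial=3.0, n_samples=20_000, step=1.0, rng=random.Random(0))
kept = draws[2_000:]  # discard warm-up draws taken before the chain settles
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / (len(kept) - 1)
print(round(mean, 2), round(var, 2))  # roughly 0 and 1
```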

Maximum a Posteriori

A point estimate of an unknown quantity that equals the mode of the posterior distribution.

If the prior distribution is flat, the MAP estimate is numerically equivalent to the Maximum Likelihood Estimate (MLE). When the prior is not flat, the MAP estimate can be seen as a regularized version of the MLE.
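Both claims can be checked in closed form for an assumed conjugate Beta-binomial model. It is chosen only because its posterior mode has a simple formula. The data (6 successes in 9 trials) and the Beta(2, 2) prior are hypothetical. A flat Beta(1, 1) prior reproduces the MLE, while Beta(2, 2) pulls the estimate toward 0.5, acting as a regularizer.

```python
def mle(successes: int, trials: int) -> float:
    """Maximum likelihood estimate of a binomial probability."""
    return successes / trials


def beta_binomial_map(successes: int, trials: int, a: float, b: float) -> float:
    """Posterior mode under a Beta(a, b) prior.

    Valid when successes + a > 1 and trials - successes + b > 1.
    """
    return (successes + a - 1) / (trials + a + b - 2)


k, n = 6, 9
print(mle(k, n))                      # 6/9
print(beta_binomial_map(k, n, 1, 1))  # flat Beta(1, 1) prior: identical to the MLE
print(beta_binomial_map(k, n, 2, 2))  # 7/11: Beta(2, 2) shrinks the estimate toward 0.5
```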

No-U-Turn Sampler

An extension of Hamiltonian Monte Carlo that automatically decides how long to simulate each trajectory: it generates candidate points that span a wide swath of the target distribution and stops when the trajectory starts to double back and retrace its steps.
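The sketch below illustrates only the stopping idea, not NUTS itself. Real NUTS grows the trajectory by repeated doubling in both time directions and checks the U-turn condition between the two ends. It also selects the next sample in a way that preserves detailed balance. Here we assume a standard normal target and integrate Hamiltonian dynamics forward with leapfrog steps. We stop once \((\theta - \theta_0) \cdot r < 0\), meaning the distance from the start has begun to shrink. The step size and starting point are arbitrary.

```python
def grad_u(theta):
    """Gradient of the potential U(theta) = 0.5 * |theta|^2 (standard normal target)."""
    return list(theta)


def leapfrog(theta, r, eps):
    """One leapfrog step of Hamiltonian dynamics (half momentum, full position, half momentum)."""
    r_half = [ri - 0.5 * eps * gi for ri, gi in zip(r, grad_u(theta))]
    theta_new = [ti + eps * ri for ti, ri in zip(theta, r_half)]
    r_new = [ri - 0.5 * eps * gi for ri, gi in zip(r_half, grad_u(theta_new))]
    return theta_new, r_new


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def steps_until_u_turn(theta0, r0, eps=0.1, max_steps=1000):
    """Integrate forward until the trajectory starts heading back toward theta0.

    Simplified criterion: stop when (theta - theta0) . r < 0.
    """
    theta, r = list(theta0), list(r0)
    for step in range(1, max_steps + 1):
        theta, r = leapfrog(theta, r, eps)
        if dot([t - t0 for t, t0 in zip(theta, theta0)], r) < 0:
            return step, theta
    return max_steps, theta


n_steps, end = steps_until_u_turn([0.0, 0.0], [1.0, 0.0])
print(n_steps, end)  # stops near the far side of the oscillation, about a quarter period in
```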

Ordinary Differential Equation

A type of differential equation containing one or more functions of one independent variable and the derivatives of those functions.
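A minimal numerical sketch follows, assuming a hypothetical exponential-decay equation \(dy/dt = -k y\) that has the exact solution \(y_0 e^{-kt}\). It is integrated with the classic fourth-order Runge-Kutta method. It is a generic solver written for illustration and is unrelated to PyMC's own ODE utilities. The rate and step count are arbitrary.

```python
import math


def rk4(f, y0: float, t0: float, t1: float, n_steps: int) -> float:
    """Integrate dy/dt = f(t, y) from t0 to t1 with the classic 4th-order Runge-Kutta method."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Hypothetical example: exponential decay with rate k = 0.5, integrated to t = 4
k = 0.5
approx = rk4(lambda t, y: -k * y, y0=1.0, t0=0.0, t1=4.0, n_steps=100)
print(approx, math.exp(-k * 4.0))  # numerical vs exact solution
```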


Overdispersion

In statistics, overdispersion is the presence of greater variability in a data set than would be expected based on a given statistical model.
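A simulation sketch follows, assuming the "given statistical model" is a Poisson model, which forces variance equal to the mean. It compares plain Poisson counts with counts whose rate varies as a Gamma(shape 2, scale 2.5) random variable, a Gamma-Poisson mixture that is equivalent to a negative binomial. The variance-to-mean ratio stays near 1 for the first and rises well above 1 for the second. The sampler, the parameters, and the `dispersion_index` helper are illustrative choices.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method for a Poisson draw (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def dispersion_index(counts) -> float:
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if overdispersed."""
    return statistics.variance(counts) / statistics.fmean(counts)


rng = random.Random(1)
plain = [poisson(5.0, rng) for _ in range(5_000)]
# Gamma-distributed rates (mean 5, variance 12.5) give counts with variance 5 + 12.5 = 17.5
mixed = [poisson(rng.gammavariate(2.0, 2.5), rng) for _ in range(5_000)]
idx_plain, idx_mixed = dispersion_index(plain), dispersion_index(mixed)
print(round(idx_plain, 2), round(idx_mixed, 2))  # roughly 1 and 3.5
```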


Posterior

The outcome of Bayesian inference is a posterior distribution, which describes the relative plausibilities of every possible combination of parameter values, given the observed data. We can think of the posterior as the updated priors after the model has seen the data.

When the posterior is obtained using numerical methods, we generally need to first diagnose the quality of the computed approximation. This is necessary because, for example, methods like MCMC have only asymptotic guarantees. In a Bayesian setting, predictions can be simulated by sampling from the posterior predictive distribution. When such predictions are used to check the internal consistency of a model by comparing them with the observed data used for inference, the process is known as posterior predictive checking.

Once you are satisfied with the model, the posterior distribution can be summarized and interpreted. Common questions for the posterior include intervals of defined boundaries, intervals of defined probability mass, and point estimates. When the posterior is very similar to the prior, the available data do not contain much information about the parameter of interest.

  • For more on generating and interpreting posterior samples, see Statistical Rethinking 2nd Edition by Richard McElreath, chapter 3.
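The summaries and the posterior predictive check described above can be sketched with posterior samples. The assumptions: a binomial model with a flat Beta(1, 1) prior and hypothetical data of 6 successes in 9 trials, chosen because the posterior is then exactly Beta(7, 4) and can be sampled directly without MCMC. The 89% interval width is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(7)
successes, trials = 6, 9  # hypothetical observed data

# Flat Beta(1, 1) prior + binomial likelihood -> Beta(1 + 6, 1 + 3) posterior
samples = sorted(rng.betavariate(1 + successes, 1 + trials - successes) for _ in range(10_000))

# Point estimate
post_mean = statistics.fmean(samples)
# Interval of defined probability mass: central 89% interval from sample quantiles
lower = samples[int(0.055 * len(samples))]
upper = samples[int(0.945 * len(samples))]
# Interval of defined boundaries: posterior probability that p < 0.5
prob_below_half = sum(s < 0.5 for s in samples) / len(samples)

# Posterior predictive check: simulate a replicated data set for each posterior draw
replicated = [sum(rng.random() < p for _ in range(trials)) for p in samples]

print(round(post_mean, 3), (round(lower, 3), round(upper, 3)), round(prob_below_half, 3))
print(statistics.fmean(replicated), successes)  # replicated counts should resemble the observed one
```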


Prior

Bayesian statistics allows us, in principle, to include all the information we have about the structure of the problem in a model. We can do this by assigning prior distributions to the model’s parameters. Priors represent the plausibility of the parameter values before accounting for the data. Multiplying the prior by the likelihood gives the posterior, up to a normalizing constant.

The informativeness of a prior can fall anywhere on a continuum from complete uncertainty to relative certainty. An informative prior might encode known restrictions on the possible range of values of a parameter.

To understand the implications of a prior and likelihood, we can simulate predictions from the model before seeing any data. This can be done by taking samples from the prior predictive distribution.

  • For an in-depth guide to priors, consider Statistical Rethinking 2nd Edition by Richard McElreath, especially section 2.3.
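A minimal prior predictive sketch follows. It assumes a hypothetical binomial model with 9 trials and a Beta(2, 2) prior on the success probability. It draws a parameter value from the prior and then a data set from the likelihood, all before any real data are used. Looking at the simulated counts shows what the prior and likelihood together imply.

```python
import random
import statistics

rng = random.Random(3)
trials = 9  # hypothetical number of trials per data set

# Assumed prior: p ~ Beta(2, 2), mildly favouring values near 0.5
prior_draws = [rng.betavariate(2, 2) for _ in range(5_000)]
# Prior predictive: one simulated data set per prior draw, before seeing any data
simulated_counts = [sum(rng.random() < p for _ in range(trials)) for p in prior_draws]

print(statistics.fmean(simulated_counts))  # near 9 * 0.5 = 4.5
# Share of simulated data sets with all failures or all successes
print(sum(c in (0, trials) for c in simulated_counts) / len(simulated_counts))
```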

Probability Mass Function

A function that gives the probability that a discrete random variable is exactly equal to some value.
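As an example, the Poisson distribution is a standard discrete distribution with a closed-form PMF. The sketch below writes that PMF out directly for an arbitrary rate of 3. It checks that the probabilities over the support sum to 1, up to a negligible tail beyond 49.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam."""
    if k < 0:
        return 0.0  # outside the support
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 3.0
probs = [poisson_pmf(k, lam) for k in range(50)]
print(poisson_pmf(0, lam))  # equals exp(-3)
print(sum(probs))           # sums to 1 up to floating-point error
```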


tensor_like

Any scalar or sequence that can be interpreted as a TensorVariable. In addition to TensorVariables, this includes NumPy ndarrays, scalars, lists and tuples (possibly nested). Any argument accepted by aesara.tensor.as_tensor_variable is tensor_like.

```python
import aesara.tensor as at

# A nested list mixing ints and floats is converted to a TensorConstant
at.as_tensor_variable([[1, 2.0], [0, 0]])
# TensorConstant{[[1. 2.]
#  [0. 0.]]}
```

Underdispersion

In statistics, underdispersion is the presence of lower variability in a data set than would be expected based on a given statistical model.
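A small simulation sketch follows, assuming the "given statistical model" is a Poisson model, whose variance equals its mean. It uses hypothetical binomial counts (20 trials, success probability 0.8). Their variance \(np(1-p)\) is below their mean \(np\), so the variance-to-mean ratio comes out near \(1 - p = 0.2\). Relative to a Poisson model, these data are therefore underdispersed.

```python
import random
import statistics

rng = random.Random(5)
n, p = 20, 0.8  # hypothetical: each count is the number of successes in 20 trials
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(5_000)]

# A Poisson model implies variance == mean; these counts have variance n*p*(1-p) < mean n*p
ratio = statistics.variance(counts) / statistics.fmean(counts)
print(round(ratio, 2))  # about 1 - p = 0.2
```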