Inference¶
Sampling¶
Functions for MCMC sampling.

pymc3.sampling.
fast_sample_posterior_predictive
(trace: MultiTrace  Dataset  InferenceData  list[dict[str, np.ndarray]], samples: int  None = None, model: Model  None = None, var_names: list[str]  None = None, keep_size: bool = False, random_seed=None) → dict[str, np.ndarray]¶ Generate posterior predictive samples from a model given a trace.
This is a vectorized alternative to the standard
sample_posterior_predictive
function. It aims to be as compatible as possible with the original API, and is significantly faster. Both posterior predictive sampling functions have some remaining issues, and we encourage users to verify agreement across the results of both functions for the time being. Parameters
 trace: MultiTrace, xarray.Dataset, InferenceData, or List of points (dictionary)
Trace generated from MCMC sampling.
 samples: int, optional
Number of posterior predictive samples to generate. Defaults to one posterior predictive sample per posterior sample, that is, the number of draws times the number of chains. It is not recommended to modify this value; when modified, some chains may not be represented in the posterior predictive sample.
 model: Model (optional if in `with` context)
Model used to generate trace
 var_names: Iterable[str]
List of vars to sample.
 keep_size: bool, optional
Force posterior predictive sample to have the same shape as posterior and sample stats data:
(nchains, ndraws, ...)
. random_seed: int
Seed for the random number generator.
 Returns
 samples: dict
Dictionary with the variable names as keys, and values numpy arrays containing posterior predictive samples.

pymc3.sampling.
init_nuts
(init='auto', chains=1, n_init=500000, model=None, random_seed=None, progressbar=True, jitter_max_retries=10, **kwargs)¶ Set up the mass matrix initialization for NUTS.
NUTS convergence and sampling speed is extremely dependent on the choice of mass/scaling matrix. This function implements different methods for choosing or adapting the mass matrix.
 Parameters
 initstr
Initialization method to use.
auto: Choose a default initialization method automatically. Currently, this is
jitter+adapt_diag
, but this can change in the future. If you depend on the exact behaviour, choose an initialization method explicitly.adapt_diag: Start with a identity mass matrix and then adapt a diagonal based on the variance of the tuning samples. All chains use the test value (usually the prior mean) as starting point.
jitter+adapt_diag: Same as
adapt_diag
, but use test value plus a uniform jitter in [1, 1] as starting point in each chain.advi+adapt_diag: Run ADVI and then adapt the resulting diagonal mass matrix based on the sample variance of the tuning samples.
advi+adapt_diag_grad: Run ADVI and then adapt the resulting diagonal mass matrix based on the variance of the gradients during tuning. This is experimental and might be removed in a future release.
advi: Run ADVI to estimate posterior mean and diagonal mass matrix.
advi_map: Initialize ADVI with MAP and use MAP as starting point.
map: Use the MAP as starting point. This is discouraged.
adapt_full: Adapt a dense mass matrix using the sample covariances. All chains use the test value (usually the prior mean) as starting point.
jitter+adapt_full: Same as
adapt_full
, but use test value plus a uniform jitter in [1, 1] as starting point in each chain.
 chainsint
Number of jobs to start.
 n_initint
Number of iterations of initializer. Only works for ‘ADVI’ init methods.
 modelModel (optional if in
with
context)  progressbarbool
Whether or not to display a progressbar for advi sampling.
 jitter_max_retriesint
Maximum number of repeated attempts (per chain) at creating an initial matrix with uniform jitter that yields a finite probability. This applies to
jitter+adapt_diag
andjitter+adapt_full
init methods. **kwargskeyword arguments
Extra keyword arguments are forwarded to pymc3.NUTS.
 Returns
 start
pymc3.model.Point
Starting point for sampler
 nuts_sampler
pymc3.step_methods.NUTS
Instantiated and initialized NUTS sampler object
 start

pymc3.sampling.
iter_sample
(draws: int, step, start: Optional[Dict[Any, Any]] = None, trace=None, chain=0, tune: Optional[int] = None, model: Optional[pymc3.model.Model] = None, random_seed: Optional[Union[int, List[int]]] = None, callback=None)¶ Generate a trace on each iteration using the given step method.
Multiple step methods ared supported via compound step methods. Returns the amount of time taken.
 Parameters
 drawsint
The number of samples to draw
 stepfunction
Step function
 startdict
Starting point in parameter space (or partial point). Defaults to trace.point(1)) if there is a trace provided and model.test_point if not (defaults to empty dict)
 tracebackend, list, or MultiTrace
This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number
chain
. If None or a list of variables, the NDArray backend is used. chainint, optional
Chain number used to store sample in backend. If
cores
is greater than one, chain numbers will start here. tuneint, optional
Number of iterations to tune, if applicable (defaults to None)
 modelModel (optional if in
with
context)  random_seedint or list of ints, optional
A list is accepted if more if
cores
is greater than one. callback :
A function which gets called for every sample from the trace of a chain. The function is called with the trace and the current draw and will contain all samples for a single trace. the
draw.chain
argument can be used to determine which of the active chains the sample is drawn from. Sampling can be interrupted by throwing aKeyboardInterrupt
in the callback.
 Yields
 traceMultiTrace
Contains all samples up to the current iteration
Examples
for trace in iter_sample(500, step): ...

pymc3.sampling.
sample
(draws=1000, step=None, init='auto', n_init=200000, start=None, trace=None, chain_idx=0, chains=None, cores=None, tune=1000, progressbar=True, model=None, random_seed=None, discard_tuned_samples=True, compute_convergence_checks=True, callback=None, jitter_max_retries=10, *, return_inferencedata=None, idata_kwargs: Optional[dict] = None, mp_ctx=None, pickle_backend: str = 'pickle', **kwargs)¶ Draw samples from the posterior using the given step methods.
Multiple step methods are supported via compound step methods.
 Parameters
 drawsint
The number of samples to draw. Defaults to 1000. The number of tuned samples are discarded by default. See
discard_tuned_samples
. initstr
Initialization method to use for autoassigned NUTS samplers.
auto: Choose a default initialization method automatically. Currently, this is
jitter+adapt_diag
, but this can change in the future. If you depend on the exact behaviour, choose an initialization method explicitly.adapt_diag: Start with a identity mass matrix and then adapt a diagonal based on the variance of the tuning samples. All chains use the test value (usually the prior mean) as starting point.
jitter+adapt_diag: Same as
adapt_diag
, but add uniform jitter in [1, 1] to the starting point in each chain.advi+adapt_diag: Run ADVI and then adapt the resulting diagonal mass matrix based on the sample variance of the tuning samples.
advi+adapt_diag_grad: Run ADVI and then adapt the resulting diagonal mass matrix based on the variance of the gradients during tuning. This is experimental and might be removed in a future release.
advi: Run ADVI to estimate posterior mean and diagonal mass matrix.
advi_map: Initialize ADVI with MAP and use MAP as starting point.
map: Use the MAP as starting point. This is discouraged.
adapt_full: Adapt a dense mass matrix using the sample covariances
 stepfunction or iterable of functions
A step function or collection of functions. If there are variables without step methods, step methods for those variables will be assigned automatically. By default the NUTS step method will be used, if appropriate to the model; this is a good default for beginning users.
 n_initint
Number of iterations of initializer. Only works for ‘ADVI’ init methods.
 startdict, or array of dict
Starting point in parameter space (or partial point) Defaults to
trace.point(1))
if there is a trace provided and model.test_point if not (defaults to empty dict). Initialization methods for NUTS (seeinit
keyword) can overwrite the default. tracebackend, list, or MultiTrace
This should be a backend instance, a list of variables to track, or a MultiTrace object with past values. If a MultiTrace object is given, it must contain samples for the chain number
chain
. If None or a list of variables, the NDArray backend is used. chain_idxint
Chain number used to store sample in backend. If
chains
is greater than one, chain numbers will start here. chainsint
The number of chains to sample. Running independent chains is important for some convergence statistics and can also reveal multiple modes in the posterior. If
None
, then set to eithercores
or 2, whichever is larger. coresint
The number of chains to run in parallel. If
None
, set to the number of CPUs in the system, but at most 4. tuneint
Number of iterations to tune, defaults to 1000. Samplers adjust the step sizes, scalings or similar during tuning. Tuning samples will be drawn in addition to the number specified in the
draws
argument, and will be discarded unlessdiscard_tuned_samples
is set to False. progressbarbool, optional default=True
Whether or not to display a progress bar in the command line. The bar shows the percentage of completion, the sampling speed in samples per second (SPS), and the estimated remaining time until completion (“expected time of arrival”; ETA).
 modelModel (optional if in
with
context)  random_seedint or list of ints
A list is accepted if
cores
is greater than one. discard_tuned_samplesbool
Whether to discard posterior samples of the tune interval.
 compute_convergence_checksbool, default=True
Whether to compute sampler statistics like GelmanRubin and
effective_n
. callbackfunction, default=None
A function which gets called for every sample from the trace of a chain. The function is called with the trace and the current draw and will contain all samples for a single trace. the
draw.chain
argument can be used to determine which of the active chains the sample is drawn from. Sampling can be interrupted by throwing aKeyboardInterrupt
in the callback. jitter_max_retriesint
Maximum number of repeated attempts (per chain) at creating an initial matrix with uniform jitter that yields a finite probability. This applies to
jitter+adapt_diag
andjitter+adapt_full
init methods. return_inferencedatabool, default=False
Whether to return the trace as an
arviz.InferenceData
(True) object or a MultiTrace (False) Defaults to False, but we’ll switch to True in an upcoming release. idata_kwargsdict, optional
Keyword arguments for
arviz.from_pymc3()
 mp_ctxmultiprocessing.context.BaseContent
A multiprocessing context for parallel sampling. See multiprocessing documentation for details.
 pickle_backendstr
One of ‘pickle’ or ‘dill’. The library used to pickle models in parallel sampling if the multiprocessing context is not of type fork.
 Returns
 tracepymc3.backends.base.MultiTrace or arviz.InferenceData
A
MultiTrace
or ArviZInferenceData
object that contains the samples.
Notes
Optional keyword arguments can be passed to
sample
to be delivered to thestep_method
s used during sampling.If your model uses only one step method, you can address step method kwargs directly. In particular, the NUTS step method has several options including:
target_accept : float in [0, 1]. The step size is tuned such that we approximate this acceptance rate. Higher values like 0.9 or 0.95 often work better for problematic posteriors
max_treedepth : The maximum depth of the trajectory tree
step_scale : float, default 0.25 The initial guess for the step size scaled down by \(1/n**(1/4)\)
If your model uses multiple step methods, aka a Compound Step, then you have two ways to address arguments to each step method:
If you let
sample()
automatically assign thestep_method
s, and you can correctly anticipate what they will be, then you can wrap step method kwargs in a dict and pass that to sample() with a kwarg set to the name of the step method. e.g. for a CompoundStep comprising NUTS and BinaryGibbsMetropolis, you could send:target_accept
to NUTS: nuts={‘target_accept’:0.9}transit_p
to BinaryGibbsMetropolis: binary_gibbs_metropolis={‘transit_p’:.7}
Note that available names are:
nuts
,hmc
,metropolis
,binary_metropolis
,binary_gibbs_metropolis
,categorical_gibbs_metropolis
,DEMetropolis
,DEMetropolisZ
,slice
If you manually declare the
step_method
s, within thestep
kwarg, then you can address thestep_method
kwargs directly. e.g. for a CompoundStep comprising NUTS and BinaryGibbsMetropolis, you could sendstep=[pm.NUTS([freeRV1, freeRV2], target_accept=0.9), pm.BinaryGibbsMetropolis([freeRV3], transit_p=.7)]
You can find a full list of arguments in the docstring of the step methods.
Examples
In [1]: import pymc3 as pm ...: n = 100 ...: h = 61 ...: alpha = 2 ...: beta = 2 In [2]: with pm.Model() as model: # context management ...: p = pm.Beta("p", alpha=alpha, beta=beta) ...: y = pm.Binomial("y", n=n, p=p, observed=h) ...: trace = pm.sample() In [3]: az.summary(trace, kind="stats") Out[3]: mean sd hdi_3% hdi_97% p 0.609 0.047 0.528 0.699

pymc3.sampling.
sample_posterior_predictive
(trace, samples: Optional[int] = None, model: Optional[pymc3.model.Model] = None, var_names: Optional[List[str]] = None, size: Optional[int] = None, keep_size: Optional[bool] = False, random_seed=None, progressbar: bool = True) → Dict[str, numpy.ndarray]¶ Generate posterior predictive samples from a model given a trace.
 Parameters
 tracebackend, list, xarray.Dataset, arviz.InferenceData, or MultiTrace
Trace generated from MCMC sampling, or a list of dicts (eg. points or from find_MAP()), or xarray.Dataset (eg. InferenceData.posterior or InferenceData.prior)
 samplesint
Number of posterior predictive samples to generate. Defaults to one posterior predictive sample per posterior sample, that is, the number of draws times the number of chains. It is not recommended to modify this value; when modified, some chains may not be represented in the posterior predictive sample.
 modelModel (optional if in
with
context) Model used to generate
trace
 varsiterable
Variables for which to compute the posterior predictive samples. Deprecated: please use
var_names
instead. var_namesIterable[str]
Names of variables for which to compute the posterior predictive samples.
 sizeint
The number of random draws from the distribution specified by the parameters in each sample of the trace. Not recommended unless more than ndraws times nchains posterior predictive samples are needed.
 keep_sizebool, optional
Force posterior predictive sample to have the same shape as posterior and sample stats data:
(nchains, ndraws, ...)
. Overrides samples and size parameters. random_seedint
Seed for the random number generator.
 progressbarbool
Whether or not to display a progress bar in the command line. The bar shows the percentage of completion, the sampling speed in samples per second (SPS), and the estimated remaining time until completion (“expected time of arrival”; ETA).
 Returns
 samplesdict
Dictionary with the variable names as keys, and values numpy arrays containing posterior predictive samples.

pymc3.sampling.
sample_posterior_predictive_w
(traces, samples: Optional[int] = None, models: Optional[List[pymc3.model.Model]] = None, weights: Optional[Union[numpy.ndarray, List[float]]] = None, random_seed: Optional[int] = None, progressbar: bool = True)¶ Generate weighted posterior predictive samples from a list of models and a list of traces according to a set of weights.
 Parameters
 traceslist or list of lists
List of traces generated from MCMC sampling (xarray.Dataset, arviz.InferenceData, or MultiTrace), or a list of list containing dicts from find_MAP() or points. The number of traces should be equal to the number of weights.
 samplesint, optional
Number of posterior predictive samples to generate. Defaults to the length of the shorter trace in traces.
 modelslist of Model
List of models used to generate the list of traces. The number of models should be equal to the number of weights and the number of observed RVs should be the same for all models. By default a single model will be inferred from
with
context, in this case results will only be meaningful if all models share the same distributions for the observed RVs. weightsarraylike, optional
Individual weights for each trace. Default, same weight for each model.
 random_seedint, optional
Seed for the random number generator.
 progressbarbool, optional default True
Whether or not to display a progress bar in the command line. The bar shows the percentage of completion, the sampling speed in samples per second (SPS), and the estimated remaining time until completion (“expected time of arrival”; ETA).
 Returns
 samplesdict
Dictionary with the variables as keys. The values corresponding to the posterior predictive samples from the weighted models.

pymc3.sampling.
sample_prior_predictive
(samples=500, model: Optional[pymc3.model.Model] = None, var_names: Optional[Iterable[str]] = None, random_seed=None) → Dict[str, numpy.ndarray]¶ Generate samples from the prior predictive distribution.
 Parameters
 samplesint
Number of samples from the prior predictive to generate. Defaults to 500.
 modelModel (optional if in
with
context)  var_namesIterable[str]
A list of names of variables for which to compute the posterior predictive samples. Defaults to both observed and unobserved RVs.
 random_seedint
Seed for the random number generator.
 Returns
 dict
Dictionary with variable names as keys. The values are numpy arrays of prior samples.
Stepmethods¶

pymc3.sampling.
assign_step_methods
(model, step=None, methods=(<class 'pymc3.step_methods.hmc.nuts.NUTS'>, <class 'pymc3.step_methods.hmc.hmc.HamiltonianMC'>, <class 'pymc3.step_methods.metropolis.Metropolis'>, <class 'pymc3.step_methods.metropolis.BinaryMetropolis'>, <class 'pymc3.step_methods.metropolis.BinaryGibbsMetropolis'>, <class 'pymc3.step_methods.slicer.Slice'>, <class 'pymc3.step_methods.metropolis.CategoricalGibbsMetropolis'>, <class 'pymc3.step_methods.pgbart.PGBART'>), step_kwargs=None)¶ Assign model variables to appropriate step methods.
Passing a specified model will autoassign its constituent stochastic variables to step methods based on the characteristics of the variables. This function is intended to be called automatically from
sample()
, but may be called manually. Each step method passed should have acompetence()
method that returns an ordinal competence value corresponding to the variable passed to it. This value quantifies the appropriateness of the step method for sampling the variable. Parameters
 modelModel object
A fullyspecified model object
 stepstep function or vector of step functions
One or more step functions that have been assigned to some subset of the model’s parameters. Defaults to
None
(no assigned variables). methodsvector of step method classes
The set of step methods from which the function may choose. Defaults to the main step methods provided by PyMC3.
 step_kwargsdict
Parameters for the samplers. Keys are the lower case names of the step method, values a dict of arguments.
 Returns
 methodslist
List of step methods associated with the model’s variables.
NUTS¶

class
pymc3.step_methods.hmc.nuts.
NUTS
(*args, **kwargs)¶ A sampler for continuous variables based on Hamiltonian mechanics.
NUTS automatically tunes the step size and the number of steps per sample. A detailed description can be found at [1], “Algorithm 6: Efficient NoUTurn Sampler with Dual Averaging”.
NUTS provides a number of statistics that can be accessed with trace.get_sampler_stats:
mean_tree_accept: The mean acceptance probability for the tree that generated this sample. The mean of these values across all samples but the burnin should be approximately target_accept (the default for this is 0.8).
diverging: Whether the trajectory for this sample diverged. If there are any divergences after burnin, this indicates that the results might not be reliable. Reparametrization can often help, but you can also try to increase target_accept to something like 0.9 or 0.95.
energy: The energy at the point in phasespace where the sample was accepted. This can be used to identify posteriors with problematically long tails. See below for an example.
energy_change: The difference in energy between the start and the end of the trajectory. For a perfect integrator this would always be zero.
max_energy_change: The maximum difference in energy along the whole trajectory.
depth: The depth of the tree that was used to generate this sample
tree_size: The number of leafs of the sampling tree, when the sample was accepted. This is usually a bit less than 2 ** depth. If the tree size is large, the sampler is using a lot of leapfrog steps to find the next sample. This can for example happen if there are strong correlations in the posterior, if the posterior has long tails, if there are regions of high curvature (“funnels”), or if the variance estimates in the mass matrix are inaccurate. Reparametrisation of the model or estimating the posterior variances from past samples might help.
tune: This is True, if step size adaptation was turned on when this sample was generated.
step_size: The step size used for this sample.
step_size_bar: The current best known stepsize. After the tuning samples, the step size is set to this value. This should converge during tuning.
model_logp: The model loglikelihood for this sample.
process_time_diff: The time it took to draw the sample, as defined by the python standard library time.process_time. This counts all the CPU time, including worker processes in BLAS and OpenMP.
perf_counter_diff: The time it took to draw the sample, as defined by the python standard library time.perf_counter (wall time).
perf_counter_start: The value of time.perf_counter at the beginning of the computation of the draw.
References
 1
Hoffman, Matthew D., & Gelman, Andrew. (2011). The NoUTurn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.
Set up the NoUTurn sampler.
 Parameters
 vars: list of Theano variables, default all continuous vars
 Emax: float, default 1000
Maximum energy change allowed during leapfrog steps. Larger deviations will abort the integration.
 target_accept: float, default .8
Adapt the step size such that the average acceptance probability across the trajectories are close to target_accept. Higher values for target_accept lead to smaller step sizes. Setting this to higher values like 0.9 or 0.99 can help with sampling from difficult posteriors. Valid values are between 0 and 1 (exclusive).
 step_scale: float, default 0.25
Size of steps to take, automatically scaled down by 1/n**(1/4). If step size adaptation is switched off, the resulting step size is used. If adaptation is enabled, it is used as initial guess.
 gamma: float, default .05
 k: float, default .75
Parameter for dual averaging for step size adaptation. Values between 0.5 and 1 (exclusive) are admissible. Higher values correspond to slower adaptation.
 t0: int, default 10
Parameter for dual averaging. Higher values slow initial adaptation.
 adapt_step_size: bool, default=True
Whether step size adaptation should be enabled. If this is disabled, k, t0, gamma and target_accept are ignored.
 max_treedepth: int, default=10
The maximum tree depth. Trajectories are stopped when this depth is reached.
 early_max_treedepth: int, default=8
The maximum tree depth during the first 200 tuning samples.
 scaling: array_like, ndim = {1,2}
The inverse mass, or precision matrix. One dimensional arrays are interpreted as diagonal matrices. If is_cov is set to True, this will be interpreded as the mass or covariance matrix.
 is_cov: bool, default=False
Treat the scaling as mass or covariance matrix.
 potential: Potential, optional
An object that represents the Hamiltonian with methods velocity, energy, and random methods. It can be specified instead of the scaling matrix.
 model: pymc3.Model
The model
 kwargs: passed to BaseHMC
Notes
The step size adaptation stops when self.tune is set to False. This is usually achieved by setting the tune parameter if pm.sample to the desired number of tuning steps.

static
competence
(var, has_grad)¶ Check how appropriate this class is for sampling a random variable.
Metropolis¶

class
pymc3.step_methods.metropolis.
BinaryGibbsMetropolis
(*args, **kwargs)¶ A MetropoliswithinGibbs step method optimized for binary variables
 Parameters
 vars: list
List of variables for sampler
 order: list or ‘random’
List of integers indicating the Gibbs update order e.g., [0, 2, 1, …]. Default is random
 transit_p: float
The diagonal of the transition kernel. A value > .5 gives anticorrelated proposals, which resulting in more efficient antithetical sampling. Default is 0.8
 model: PyMC Model
Optional model for sampling step. Defaults to None (taken from context).

static
competence
(var)¶ BinaryMetropolis is only suitable for Bernoulli and Categorical variables with k=2.

class
pymc3.step_methods.metropolis.
BinaryMetropolis
(*args, **kwargs)¶ MetropolisHastings optimized for binary variables
 Parameters
 vars: list
List of variables for sampler
 scaling: scalar or array
Initial scale factor for proposal. Defaults to 1.
 tune: bool
Flag for tuning. Defaults to True.
 tune_interval: int
The frequency of tuning. Defaults to 100 iterations.
 model: PyMC Model
Optional model for sampling step. Defaults to None (taken from context).

static
competence
(var)¶ BinaryMetropolis is only suitable for binary (bool) and Categorical variables with k=1.

class
pymc3.step_methods.metropolis.
CategoricalGibbsMetropolis
(*args, **kwargs)¶ A MetropoliswithinGibbs step method optimized for categorical variables. This step method works for Bernoulli variables as well, but it is not optimized for them, like BinaryGibbsMetropolis is. Step method supports two types of proposals: A uniform proposal and a proportional proposal, which was introduced by Liu in his 1996 technical report “Metropolized Gibbs Sampler: An Improvement”.

static
competence
(var)¶ CategoricalGibbsMetropolis is only suitable for Bernoulli and Categorical variables.

static

class
pymc3.step_methods.metropolis.
DEMetropolis
(*args, **kwargs)¶ Differential Evolution Metropolis sampling step.
 Parameters
 lamb: float
Lambda parameter of the DE proposal mechanism. Defaults to 2.38 / sqrt(2 * ndim)
 vars: list
List of variables for sampler
 S: standard deviation or covariance matrix
Some measure of variance to parameterize proposal distribution
 proposal_dist: function
Function that returns zeromean deviates when parameterized with S (and n). Defaults to Uniform(S,+S).
 scaling: scalar or array
Initial scale factor for epsilon. Defaults to 0.001
 tune: str
Which hyperparameter to tune. Defaults to None, but can also be ‘scaling’ or ‘lambda’.
 tune_interval: int
The frequency of tuning. Defaults to 100 iterations.
 model: PyMC Model
Optional model for sampling step. Defaults to None (taken from context).
 mode: string or `Mode` instance.
compilation mode passed to Theano functions
References
 Braak2006
Cajo C.F. ter Braak (2006). A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Statistics and Computing link
 Parameters
 vars: list of sampling variables
 shared: dict of theano variable > shared variable
 blocked: Boolean (default True)

class
pymc3.step_methods.metropolis.
DEMetropolisZ
(*args, **kwargs)¶ Adaptive Differential Evolution Metropolis sampling step that uses the past to inform jumps.
 Parameters
 lamb: float
Lambda parameter of the DE proposal mechanism. Defaults to 2.38 / sqrt(2 * ndim)
 vars: list
List of variables for sampler
 S: standard deviation or covariance matrix
Some measure of variance to parameterize proposal distribution
 proposal_dist: function
Function that returns zeromean deviates when parameterized with S (and n). Defaults to Uniform(S,+S).
 scaling: scalar or array
Initial scale factor for epsilon. Defaults to 0.001
 tune: str
Which hyperparameter to tune. Defaults to ‘lambda’, but can also be ‘scaling’ or None.
 tune_interval: int
The frequency of tuning. Defaults to 100 iterations.
 tune_drop_fraction: float
Fraction of tuning steps that will be removed from the samplers history when the tuning ends. Defaults to 0.9  keeping the last 10% of tuning steps for good mixing while removing 90% of potentially unconverged tuning positions.
 model: PyMC Model
Optional model for sampling step. Defaults to None (taken from context).
 mode: string or `Mode` instance.
compilation mode passed to Theano functions
References
 Braak2006
Cajo C.F. ter Braak (2006). Differential Evolution Markov Chain with snooker updater and fewer chains. Statistics and Computing link
 Parameters
 vars: list of sampling variables
 shared: dict of theano variable > shared variable
 blocked: Boolean (default True)

reset_tuning
()¶ Resets the tuned sampler parameters and history to their initial values.

stop_tuning
()¶ At the end of the tuning phase, this method removes the first x% of the history so future proposals are not informed by unconverged tuning iterations.

class
pymc3.step_methods.metropolis.
Metropolis
(*args, **kwargs)¶ MetropolisHastings sampling step
Create an instance of a Metropolis stepper
 Parameters
 vars: list
List of variables for sampler
 S: standard deviation or covariance matrix
Some measure of variance to parameterize proposal distribution
 proposal_dist: function
Function that returns zeromean deviates when parameterized with S (and n). Defaults to normal.
 scaling: scalar or array
Initial scale factor for proposal. Defaults to 1.
 tune: bool
Flag for tuning. Defaults to True.
 tune_interval: int
The frequency of tuning. Defaults to 100 iterations.
 model: PyMC Model
Optional model for sampling step. Defaults to None (taken from context).
 mode: string or `Mode` instance.
compilation mode passed to Theano functions

reset_tuning
()¶ Resets the tuned sampler parameters to their initial values.
Slice¶

class
pymc3.step_methods.slicer.
Slice
(*args, **kwargs)¶ Univariate slice sampler step method
 Parameters
 vars: list
List of variables for sampler.
 w: float
Initial width of slice (Defaults to 1).
 tune: bool
Flag for tuning (Defaults to True).
 model: PyMC Model
Optional model for sampling step. Defaults to None (taken from context).
Hamiltonian Monte Carlo¶

class
pymc3.step_methods.hmc.hmc.
HamiltonianMC
(*args, **kwargs)¶ A sampler for continuous variables based on Hamiltonian mechanics.
See NUTS sampler for automatically tuned stopping time and step size scaling.
Set up the Hamiltonian Monte Carlo sampler.
 Parameters
 vars: list of theano variables
 path_length: float, default=2
total length to travel
 step_rand: function float > float, default=unif
A function which takes the step size and returns an new one used to randomize the step size at each iteration.
 step_scale: float, default=0.25
Initial size of steps to take, automatically scaled down by 1/n**(1/4).
 scaling: array_like, ndim = {1,2}
The inverse mass, or precision matrix. One dimensional arrays are interpreted as diagonal matrices. If is_cov is set to True, this will be interpreded as the mass or covariance matrix.
 is_cov: bool, default=False
Treat the scaling as mass or covariance matrix.
 potential: Potential, optional
An object that represents the Hamiltonian with methods velocity, energy, and random methods. It can be specified instead of the scaling matrix.
 target_accept: float, default 0.65
Adapt the step size such that the average acceptance probability across the trajectories are close to target_accept. Higher values for target_accept lead to smaller step sizes. Setting this to higher values like 0.9 or 0.99 can help with sampling from difficult posteriors. Valid values are between 0 and 1 (exclusive). Default of 0.65 is from (Beskos et. al. 2010, Neal 2011). See Hoffman and Gelman’s “The NoUTurn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo” section 3.2 for details.
 gamma: float, default .05
 k: float, default .75
Parameter for dual averaging for step size adaptation. Values between 0.5 and 1 (exclusive) are admissible. Higher values correspond to slower adaptation.
 t0: float > 0, default 10
Parameter for dual averaging. Higher values slow initial adaptation.
 adapt_step_size: bool, default=True
Whether step size adaptation should be enabled. If this is disabled, k, t0, gamma and target_accept are ignored.
 max_steps: int
The maximum number of leapfrog steps.
 model: pymc3.Model
The model
 **kwargs: passed to BaseHMC

static
competence
(var, has_grad)¶ Check how appropriate this class is for sampling a random variable.
Sequential Monte Carlo¶

class
pymc3.smc.smc.
SMC
(draws=2000, kernel='metropolis', n_steps=25, start=None, tune_steps=True, p_acc_rate=0.85, threshold=0.5, save_sim_data=False, save_log_pseudolikelihood=True, model=None, random_seed= 1, chain=0)¶ Sequential Monte Carlo with Independent MetropolisHastings and ABC kernels.

initialize_logp
()¶ Initialize the prior and likelihood log probabilities.

initialize_population
()¶ Create an initial population from the prior distribution.

mutate
()¶ Independent MetropolisHastings perturbation.

posterior_to_trace
()¶ Save results into a PyMC3 trace.

resample
()¶ Resample particles based on importance weights.

setup_kernel
()¶ Set up the likelihood logp function based on the chosen kernel.

tune
()¶ Tune n_steps based on the acceptance rate.

update_proposal
()¶ Update proposal based on the covariance matrix from tempered posterior.

update_weights_beta
()¶ Calculate the next inverse temperature (beta).
The importance weights based on current beta and tempered likelihood and updates the marginal likelihood estimate.

MultiTrace¶

class
pymc3.backends.base.
MultiTrace
(straces)¶ Main interface for accessing values from MCMC results.
The core method to select values is get_values. The method to select sampler statistics is get_sampler_stats. Both kinds of values can also be accessed by indexing the MultiTrace object. Indexing can behave in four ways:
Indexing with a variable or variable name (str) returns all values for that variable, combining values for all chains.
>>> trace[varname]
Slicing after the variable name can be used to burn and thin the samples.
>>> trace[varname, 1000:]
For convenience during interactive use, values can also be accessed using the variable as an attribute.
>>> trace.varname
Indexing with an integer returns a dictionary with values for each variable at the given index (corresponding to a single sampling iteration).
Slicing with a range returns a new trace with the number of draws corresponding to the range.
Indexing with the name of a sampler statistic that is not also the name of a variable returns those values from all chains. If there is more than one sampler that provides that statistic, the values are concatenated along a new axis.
For any methods that require a single trace (e.g., taking the length of the MultiTrace instance, which returns the number of draws), the trace with the highest chain number is always used.
 Attributes
 nchains: int
Number of chains in the MultiTrace.
 chains: `List[int]`
List of chain indices
 report: str
Report on the sampling process.
 varnames: `List[str]`
List of variable names in the trace(s)

add_values
(vals, overwrite=False) → None¶ Add variables to traces.
 Parameters
 vals: dict (str: arraylike)
The keys should be the names of the new variables. The values are expected to be arraylike objects. For traces with more than one chain the length of each value should match the number of total samples already in the trace (chains * iterations), otherwise a warning is raised.
 overwrite: bool
If False (default) a ValueError is raised if the variable already exists. Change to True to overwrite the values of variables
 Returns
 None.

get_sampler_stats
(stat_name, burn=0, thin=1, combine=True, chains=None, squeeze=True)¶ Get sampler statistics from the trace.
 Parameters
 stat_name: str
 sampler_idx: int or None
 burn: int
 thin: int
 Returns
 If the sampler_idx is specified, return the statistic with
 the given name in a numpy array. If it is not specified and there
 is more than one sampler that provides this statistic, return
 a numpy array of shape (m, n), where m is the number of
 such samplers, and n is the number of samples.

get_values
(varname, burn=0, thin=1, combine=True, chains=None, squeeze=True)¶ Get values from traces.
 Parameters
 varname: str
 burn: int
 thin: int
 combine: bool
If True, results from chains will be concatenated.
 chains: int or list of ints
Chains to retrieve. If None, all chains are used. A single chain value can also be given.
 squeeze: bool
Return a single array element if the resulting list of values only has one element. If False, the result will always be a list of arrays, even if combine is True.
 Returns
 A list of NumPy arrays or a single NumPy array (depending on
 squeeze).

point
(idx, chain=None)¶ Return a dictionary of point values at idx.
 Parameters
 idx: int
 chain: int
If a chain is not given, the highest chain number is used.

points
(chains=None)¶ Return an iterator over all or some of the sample points
 Parameters
 chains: list of int or N
The chains whose points should be inlcuded in the iterator. If chains is not given, include points from all chains.

remove_values
(name)¶ remove variables from traces.
 Parameters
 name: str
Name of the variable to remove. Raises KeyError if the variable is not present

class
pymc3.backends.base.
BaseTrace
(name, model=None, vars=None, test_point=None)¶ Base trace object
 Parameters
 name: str
Name of backend
 model: Model
If None, the model is taken from the with context.
 vars: list of variables
Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
 test_point: dict
use different test point that might be with changed variables shapes
Variational Inference¶
OPVI¶
Variational inference is a great approach for doing really complex, often intractable Bayesian inference in approximate form. Common methods (e.g. ADVI) lack from complexity so that approximate posterior does not reveal the true nature of underlying problem. In some applications it can yield unreliable decisions.
Recently on NIPS 2017 OPVI framework was presented. It generalizes variational inference so that the problem is build with blocks. The first and essential block is Model itself. Second is Approximation, in some cases \(log Q(D)\) is not really needed. Necessity depends on the third and fourth part of that black box, Operator and Test Function respectively.
Operator is like an approach we use, it constructs loss from given Model, Approximation and Test Function. The last one is not needed if we minimize KL Divergence from Q to posterior. As a drawback we need to compute \(loq Q(D)\). Sometimes approximation family is intractable and \(loq Q(D)\) is not available, here comes LS(Langevin Stein) Operator with a set of test functions.
Test Function has more unintuitive meaning. It is usually used with LS operator and represents all we want from our approximate distribution. For any given vector based function of \(z\) LS operator yields zero mean function under posterior. \(loq Q(D)\) is no more needed. That opens a door to rich approximation families as neural networks.
References
Rajesh Ranganath, Jaan Altosaar, Dustin Tran, David M. Blei Operator Variational Inference https://arxiv.org/abs/1610.09033 (2016)

class
pymc3.variational.opvi.
Approximation
(groups, model=None)¶ Wrapper for grouped approximations
Wraps list of groups, creates an Approximation instance that collects sampled variables from all the groups, also collects logQ needed for explicit Variational Inference.
 Parameters
 groups: list[Group]
List of
Group
instances. They should have all model variables model: Model
See also
Notes
Some shortcuts for single group approximations are available:
MeanField
FullRank
NormalizingFlow
Empirical
Single group accepts local_rv keyword with dict mapping PyMC3 variables to their local Group parameters dict

property
datalogp
¶ Dev  computes \(E_{q}(data term)\) from model via theano.scan that can be optimized later

property
datalogp_norm
¶ Dev  normalized \(E_{q}(data term)\)

get_optimization_replacements
(s, d)¶ Dev  optimizations for logP. If sample size is static and equal to 1: then theano.scan MC estimate is replaced with single sample without call to theano.scan.

property
logp
¶ Dev  computes \(E_{q}(logP)\) from model via theano.scan that can be optimized later

property
logp_norm
¶ Dev  normalized \(E_{q}(logP)\)

property
logq
¶ Dev  collects logQ for all groups

property
logq_norm
¶ Dev  collects logQ for all groups and normalizes it

make_size_and_deterministic_replacements
(s, d, more_replacements=None)¶ Dev  creates correct replacements for initial depending on sample size and deterministic flag
 Parameters
 s: scalar
sample size
 d: bool
whether sampling is done deterministically
 more_replacements: dict
replacements for shape and initial
 Returns
 dict with replacements for initial

property
replacements
¶ Dev  all replacements from groups to replace PyMC random variables with approximation

rslice
(name)¶ Dev  vectorized sampling for named random variable without call to theano.scan. This node still needs
set_size_and_deterministic()
to be evaluated

sample
(draws=500, include_transformed=True)¶ Draw samples from variational posterior.
 Parameters
 draws: `int`
Number of random samples.
 include_transformed: `bool`
If True, transformed variables are also sampled. Default is False.
 Returns
 trace:
pymc3.backends.base.MultiTrace
Samples drawn from variational posterior.
 trace:

sample_node
(node, size=None, deterministic=False, more_replacements=None)¶ Samples given node or nodes over shared posterior
 Parameters
 node: Theano Variables (or Theano expressions)
 size: None or scalar
number of samples
 more_replacements: `dict`
add custom replacements to graph, e.g. change input source
 deterministic: bool
whether to use zeros as initial distribution if True  zero initial point will produce constant latent variables
 Returns
 sampled node(s) with replacements

property
scale_cost_to_minibatch
¶ Dev  Property to control scaling cost to minibatch

set_size_and_deterministic
(node, s, d, more_replacements=None)¶ Dev  after node is sampled via
symbolic_sample_over_posterior()
orsymbolic_single_sample()
new random generator can be allocated and applied to node Parameters
 node: :class:`Variable`
Theano node with symbolically applied VI replacements
 s: scalar
desired number of samples
 d: bool or int
whether sampling is done deterministically
 more_replacements: dict
more replacements to apply
 Returns
Variable
with applied replacements, ready to use

property
single_symbolic_datalogp
¶ Dev  for single MC sample estimate of \(E_{q}(data term)\) theano.scan is not needed and code can be optimized

property
single_symbolic_logp
¶ Dev  for single MC sample estimate of \(E_{q}(logP)\) theano.scan is not needed and code can be optimized

property
single_symbolic_varlogp
¶ Dev  for single MC sample estimate of \(E_{q}(prior term)\) theano.scan is not needed and code can be optimized

property
sized_symbolic_datalogp
¶ Dev  computes sampled data term from model via theano.scan

property
sized_symbolic_logp
¶ Dev  computes sampled logP from model via theano.scan

property
sized_symbolic_varlogp
¶ Dev  computes sampled prior term from model via theano.scan

property
symbolic_logq
¶ Dev  collects symbolic_logq for all groups

property
symbolic_normalizing_constant
¶ Dev  normalizing constant for self.logq, scales it to minibatch_size instead of total_size. Here the effect is controlled by self.scale_cost_to_minibatch

symbolic_sample_over_posterior
(node)¶ Dev  performs sampling of node applying independent samples from posterior each time. Note that it is done symbolically and this node needs
set_size_and_deterministic()
call

symbolic_single_sample
(node)¶ Dev  performs sampling of node applying single sample from posterior. Note that it is done symbolically and this node needs
set_size_and_deterministic()
call with size=1

to_flat_input
(node)¶ Dev  replace vars with flattened view stored in self.inputs

property
varlogp
¶ Dev  computes \(E_{q}(prior term)\) from model via theano.scan that can be optimized later

property
varlogp_norm
¶ Dev  normalized \(E_{q}(prior term)\)

class
pymc3.variational.opvi.
Group
(group=None, vfam=None, params=None, *args, **kwargs)¶ Base class for grouping variables in VI
Grouped Approximation is used for modelling mutual dependencies for a specified group of variables. Base for local and global group.
 Parameters
 group: list
List of PyMC3 variables or None indicating that group takes all the rest variables
 vfam: str
String that marks the corresponding variational family for the group. Cannot be passed both with params
 params: dict
Dict with variational family parameters, full description can be found below. Cannot be passed both with vfam
 random_seed: int
Random seed for underlying random generator
 model :
PyMC3 Model
 local: bool
Indicates whether this group is local. Cannot be passed without params. Such group should have only one variable
 rowwise: bool
Indicates whether this group is independently parametrized over first dim. Such group should have only one variable
 options: dict
Special options for the group
 kwargs: Other kwargs for the group
See also
Notes
Group instance/class has some important constants:
supports_batched Determines whether such variational family can be used for AEVB or rowwise approx.
AEVB approx is such approx that somehow depends on input data. It can be treated as conditional distribution. You can see more about in the corresponding paper mentioned in references.
Rowwise mode is a special case approximation that treats every ‘row’, of a tensor as independent from each other. Some distributions can’t do that by definition e.g.
Empirical
that consists of particles only.has_logq Tells that distribution is defined explicitly
These constants help providing the correct inference method for given parametrization
References
Kingma, D. P., & Welling, M. (2014). AutoEncoding Variational Bayes. stat, 1050, 1.
Examples
Basic Initialization
Group
is a factory class. You do not need to call every ApproximationGroup explicitly. Passing the correct vfam (Variational FAMily) argument you’ll tell what parametrization is desired for the group. This helps not to overload code with lots of classes.>>> group = Group([latent1, latent2], vfam='mean_field')
The other way to select approximation is to provide params dictionary that has some predefined well shaped parameters. Keys of the dict serve as an identifier for variational family and help to autoselect the correct group class. To identify what approximation to use, params dict should have the full set of needed parameters. As there are 2 ways to instantiate the
Group
passing both vfam and params is prohibited. Partial parametrization is prohibited by design to avoid corner cases and possible problems.>>> group = Group([latent3], params=dict(mu=my_mu, rho=my_rho))
Important to note that in case you pass custom params they will not be autocollected by optimizer, you’ll have to provide them with more_obj_params keyword.
Supported dict keys:
{‘mu’, ‘rho’}:
MeanFieldGroup
{‘mu’, ‘L_tril’}:
FullRankGroup
{‘histogram’}:
EmpiricalGroup
{0, 1, 2, 3, …, k1}:
NormalizingFlowGroup
of depth kNormalizingFlows have other parameters than ordinary groups and should be passed as nested dicts with the following keys:
{‘u’, ‘w’, ‘b’}:
PlanarFlow
{‘a’, ‘b’, ‘z_ref’}:
RadialFlow
{‘loc’}:
LocFlow
{‘rho’}:
ScaleFlow
{‘v’}:
HouseholderFlow
Note that all integer keys should be present in the dictionary. An example of NormalizingFlow initialization can be found below.
Using AEVB
Autoencoding variational Bayes is a powerful tool to get conditional \(q(\lambdaX)\) distribution on latent variables. It is well supported by PyMC3 and all you need is to provide a dictionary with well shaped variational parameters, the correct approximation will be autoselected as mentioned in section above. However we have some implementation restrictions in AEVB. They require autoencoded variable to have first dimension as batch dimension and other dimensions should stay fixed. With this assumptions it is possible to generalize all variational approximation families as batched approximations that have flexible parameters and leading axis.
Only single variable local group is supported. Params are required.
>>> # for mean field >>> group = Group([latent3], params=dict(mu=my_mu, rho=my_rho), local=True) >>> # or for full rank >>> group = Group([latent3], params=dict(mu=my_mu, L_tril=my_L_tril), local=True)
An Approximation class is selected automatically based on the keys in dict.
my_mu and my_rho are usually estimated with neural network or function approximator.
Using RowWise Group
Batch groups have independent row wise approximations, thus using batched mean field will give no effect. It is more interesting if you want each row of a matrix to be parametrized independently with normalizing flow or full rank gaussian.
To tell
Group
that group is batched you need set batched kwarg as True. Only single variable group is allowed due to implementation details.>>> group = Group([latent3], vfam='fr', rowwise=True) # 'fr' is alias for 'full_rank'
The resulting approximation for this variable will have the following structure
\[latent3_{i, \dots} \sim \mathcal{N}(\mu_i, \Sigma_i) \forall i\]Note: Using rowwise and userparametrized approximation is ok, but shape should be checked beforehand, it is impossible to infer it by PyMC3
Normalizing Flow Group
In case you use simple initialization pattern using vfam you’ll not meet any changes. Passing flow formula to vfam you’ll get correct flow parametrization for group
>>> group = Group([latent3], vfam='scalehh*5radial*4loc')
Note: Consider passing location flow as the last one and scale as the first one for stable inference.
Rowwise normalizing flow is supported as well
>>> group = Group([latent3], vfam='scalehh*2radialloc', rowwise=True)
Custom parameters for normalizing flow can be a real trouble for the first time. They have quite different format from the rest variational families.
>>> # int is used as key, it also tells the flow position ... flow_params = { ... # `rho` parametrizes scale flow, softplus is used to map (inf; inf) > (0, inf) ... 0: dict(rho=my_scale), ... 1: dict(v=my_v1), # Householder Flow, `v` is parameter name from the original paper ... 2: dict(v=my_v2), # do not miss any number in dict, or else error is raised ... 3: dict(a=my_a, b=my_b, z_ref=my_z_ref), # Radial flow ... 4: dict(loc=my_loc) # Location Flow ... } ... group = Group([latent3], params=flow_params) ... # local=True can be added in case you do AEVB inference ... group = Group([latent3], params=flow_params, local=True)
Delayed Initialization
When you have a lot of latent variables it is impractical to do it all manually. To make life much simpler, You can pass None instead of list of variables. That case you’ll not create shared parameters until you pass all collected groups to Approximation object that collects all the groups together and checks that every group is correctly initialized. For those groups which have group equal to None it will collect all the rest variables not covered by other groups and perform delayed init.
>>> group_1 = Group([latent1], vfam='fr') # latent1 has full rank approximation >>> group_other = Group(None, vfam='mf') # other variables have mean field Q >>> approx = Approximation([group_1, group_other])
Summing Up
When you have created all the groups they need to pass all the groups to
Approximation
. It does not accept any other parameter rather than groups>>> approx = Approximation(my_groups)

property
logq
¶ Dev  Monte Carlo estimate for group logQ

property
logq_norm
¶ Dev  Monte Carlo estimate for group logQ normalized

make_size_and_deterministic_replacements
(s, d, more_replacements=None)¶ Dev  creates correct replacements for initial depending on sample size and deterministic flag
 Parameters
 s: scalar
sample size
 d: bool or scalar
whether sampling is done deterministically
 more_replacements: dict
replacements for shape and initial
 Returns
 dict with replacements for initial

set_size_and_deterministic
(node, s, d, more_replacements=None)¶ Dev  after node is sampled via
symbolic_sample_over_posterior()
orsymbolic_single_sample()
new random generator can be allocated and applied to node Parameters
 node: :class:`Variable`
Theano node with symbolically applied VI replacements
 s: scalar
desired number of samples
 d: bool or int
whether sampling is done deterministically
 more_replacements: dict
more replacements to apply
 Returns
Variable
with applied replacements, ready to use

property
symbolic_logq
¶ Dev  correctly scaled self.symbolic_logq_not_scaled

property
symbolic_logq_not_scaled
¶ Dev  symbolically computed logq for self.symbolic_random computations can be more efficient since all is known beforehand including self.symbolic_random

property
symbolic_normalizing_constant
¶ Dev  normalizing constant for self.logq, scales it to minibatch_size instead of total_size

property
symbolic_random
¶ Dev  abstract node that takes self.symbolic_initial and creates approximate posterior that is parametrized with self.params_dict.
Implementation should take in account self.batched. If self.batched is True, then self.symbolic_initial is 3d tensor, else 2d
 Returns
 tensor

property
symbolic_random2d
¶ Dev  self.symbolic_random flattened to matrix

symbolic_sample_over_posterior
(node)¶ Dev  performs sampling of node applying independent samples from posterior each time. Note that it is done symbolically and this node needs
set_size_and_deterministic()
call

symbolic_single_sample
(node)¶ Dev  performs sampling of node applying single sample from posterior. Note that it is done symbolically and this node needs
set_size_and_deterministic()
call with size=1

to_flat_input
(node)¶ Dev  replace vars with flattened view stored in self.inputs

class
pymc3.variational.opvi.
ObjectiveFunction
(op, tf)¶ Helper class for construction loss and updates for variational inference
 Parameters
 op
Operator
OPVI Functional operator
 tf
TestFunction
OPVI TestFunction
 op

score_function
(sc_n_mc=None, more_replacements=None, fn_kwargs=None)¶ Compile scoring function that operates which takes no inputs and returns Loss
 Parameters
 sc_n_mc: `int`
number of scoring MC samples
 more_replacements:
Apply custom replacements before compiling a function
 fn_kwargs: `dict`
arbitrary kwargs passed to theano.function
 Returns
 theano.function

step_function
(obj_n_mc=None, tf_n_mc=None, obj_optimizer=<function adagrad_window>, test_optimizer=<function adagrad_window>, more_obj_params=None, more_tf_params=None, more_updates=None, more_replacements=None, total_grad_norm_constraint=None, score=False, fn_kwargs=None)¶ Step function that should be called on each optimization step.
Generally it solves the following problem:
\[\mathbf{\lambda^{\*}} = \inf_{\lambda} \sup_{\theta} t(\mathbb{E}_{\lambda}[(O^{p,q}f_{\theta})(z)])\] Parameters
 obj_n_mc: `int`
Number of monte carlo samples used for approximation of objective gradients
 tf_n_mc: `int`
Number of monte carlo samples used for approximation of test function gradients
 obj_optimizer: function (grads, params) > updates
Optimizer that is used for objective params
 test_optimizer: function (grads, params) > updates
Optimizer that is used for test function params
 more_obj_params: `list`
Add custom params for objective optimizer
 more_tf_params: `list`
Add custom params for test function optimizer
 more_updates: `dict`
Add custom updates to resulting updates
 total_grad_norm_constraint: `float`
Bounds gradient norm, prevents exploding gradient problem
 score: `bool`
calculate loss on each step? Defaults to False for speed
 fn_kwargs: `dict`
Add kwargs to theano.function (e.g. {‘profile’: True})
 more_replacements: `dict`
Apply custom replacements before calculating gradients
 Returns
 theano.function

updates
(obj_n_mc=None, tf_n_mc=None, obj_optimizer=<function adagrad_window>, test_optimizer=<function adagrad_window>, more_obj_params=None, more_tf_params=None, more_updates=None, more_replacements=None, total_grad_norm_constraint=None)¶ Calculate gradients for objective function, test function and then constructs updates for optimization step
 Parameters
 obj_n_mcint
Number of monte carlo samples used for approximation of objective gradients
 tf_n_mcint
Number of monte carlo samples used for approximation of test function gradients
 obj_optimizerfunction (loss, params) > updates
Optimizer that is used for objective params
 test_optimizerfunction (loss, params) > updates
Optimizer that is used for test function params
 more_obj_paramslist
Add custom params for objective optimizer
 more_tf_paramslist
Add custom params for test function optimizer
 more_updatesdict
Add custom updates to resulting updates
 more_replacementsdict
Apply custom replacements before calculating gradients
 total_grad_norm_constraintfloat
Bounds gradient norm, prevents exploding gradient problem
 Returns
ObjectiveUpdates

class
pymc3.variational.opvi.
Operator
(approx)¶ Base class for Operator
 Parameters
 approx: :class:`Approximation`
an approximation instance
Notes
For implementing custom operator it is needed to define
Operator.apply()
method
apply
(f)¶ Operator itself
\[(O^{p,q}f_{\theta})(z)\] Parameters
 f: :class:`TestFunction` or None
function that takes z = self.input and returns same dimensional output
 Returns
 TensorVariable
symbolically applied operator

objective_class
¶
VI Inference API¶

class
pymc3.variational.inference.
ADVI
(*args, **kwargs)¶ Automatic Differentiation Variational Inference (ADVI)
This class implements the meanfield ADVI, where the variational posterior distribution is assumed to be spherical Gaussian without correlation of parameters and fit to the true posterior distribution. The means and standard deviations of the variational posterior are referred to as variational parameters.
For explanation, we classify random variables in probabilistic models into three types. Observed random variables \({\cal Y}=\{\mathbf{y}_{i}\}_{i=1}^{N}\) are \(N\) observations. Each \(\mathbf{y}_{i}\) can be a set of observed random variables, i.e., \(\mathbf{y}_{i}=\{\mathbf{y}_{i}^{k}\}_{k=1}^{V_{o}}\), where \(V_{k}\) is the number of the types of observed random variables in the model.
The next ones are global random variables \(\Theta=\{\theta^{k}\}_{k=1}^{V_{g}}\), which are used to calculate the probabilities for all observed samples.
The last ones are local random variables \({\cal Z}=\{\mathbf{z}_{i}\}_{i=1}^{N}\), where \(\mathbf{z}_{i}=\{\mathbf{z}_{i}^{k}\}_{k=1}^{V_{l}}\). These RVs are used only in AEVB.
The goal of ADVI is to approximate the posterior distribution \(p(\Theta,{\cal Z}{\cal Y})\) by variational posterior \(q(\Theta)\prod_{i=1}^{N}q(\mathbf{z}_{i})\). All of these terms are normal distributions (meanfield approximation).
\(q(\Theta)\) is parametrized with its means and standard deviations. These parameters are denoted as \(\gamma\). While \(\gamma\) is a constant, the parameters of \(q(\mathbf{z}_{i})\) are dependent on each observation. Therefore these parameters are denoted as \(\xi(\mathbf{y}_{i}; \nu)\), where \(\nu\) is the parameters of \(\xi(\cdot)\). For example, \(\xi(\cdot)\) can be a multilayer perceptron or convolutional neural network.
In addition to \(\xi(\cdot)\), we can also include deterministic mappings for the likelihood of observations. We denote the parameters of the deterministic mappings as \(\eta\). An example of such mappings is the deconvolutional neural network used in the convolutional VAE example in the PyMC3 notebook directory.
This function maximizes the evidence lower bound (ELBO) \({\cal L}(\gamma, \nu, \eta)\) defined as follows:
\[\begin{split}{\cal L}(\gamma,\nu,\eta) & = \mathbf{c}_{o}\mathbb{E}_{q(\Theta)}\left[ \sum_{i=1}^{N}\mathbb{E}_{q(\mathbf{z}_{i})}\left[ \log p(\mathbf{y}_{i}\mathbf{z}_{i},\Theta,\eta) \right]\right] \\ &  \mathbf{c}_{g}KL\left[q(\Theta)p(\Theta)\right]  \mathbf{c}_{l}\sum_{i=1}^{N} KL\left[q(\mathbf{z}_{i})p(\mathbf{z}_{i})\right],\end{split}\]where \(KL[q(v)p(v)]\) is the KullbackLeibler divergence
\[KL[q(v)p(v)] = \int q(v)\log\frac{q(v)}{p(v)}dv,\]\(\mathbf{c}_{o/g/l}\) are vectors for weighting each term of ELBO. More precisely, we can write each of the terms in ELBO as follows:
\[\begin{split}\mathbf{c}_{o}\log p(\mathbf{y}_{i}\mathbf{z}_{i},\Theta,\eta) & = & \sum_{k=1}^{V_{o}}c_{o}^{k} \log p(\mathbf{y}_{i}^{k} {\rm pa}(\mathbf{y}_{i}^{k},\Theta,\eta)) \\ \mathbf{c}_{g}KL\left[q(\Theta)p(\Theta)\right] & = & \sum_{k=1}^{V_{g}}c_{g}^{k}KL\left[ q(\theta^{k})p(\theta^{k}{\rm pa(\theta^{k})})\right] \\ \mathbf{c}_{l}KL\left[q(\mathbf{z}_{i}p(\mathbf{z}_{i})\right] & = & \sum_{k=1}^{V_{l}}c_{l}^{k}KL\left[ q(\mathbf{z}_{i}^{k}) p(\mathbf{z}_{i}^{k}{\rm pa}(\mathbf{z}_{i}^{k}))\right],\end{split}\]where \({\rm pa}(v)\) denotes the set of parent variables of \(v\) in the directed acyclic graph of the model.
When using minibatches, \(c_{o}^{k}\) and \(c_{l}^{k}\) should be set to \(N/M\), where \(M\) is the number of observations in each minibatch. This is done with supplying total_size parameter to observed nodes (e.g.
Normal('x', 0, 1, observed=data, total_size=10000)
). In this case it is possible to automatically determine appropriate scaling for \(logp\) of observed nodes. Interesting to note that it is possible to have two independent observed variables with different total_size and iterate them independently during inference.For working with ADVI, we need to give
The probabilistic model
model with three types of RVs (observed_RVs, global_RVs and local_RVs).
(optional) Minibatches
The tensors to which minibathced samples are supplied are handled separately by using callbacks in
Inference.fit()
method that change storage of shared theano variable or bypymc3.generator()
that automatically iterates over minibatches and defined beforehand.(optional) Parameters of deterministic mappings
They have to be passed along with other params to
Inference.fit()
method as more_obj_params argument.
For more information concerning training stage please reference
pymc3.variational.opvi.ObjectiveFunction.step_function()
 Parameters
 local_rv: dict[var>tuple]
mapping {model_variable > approx params} Local Vars are used for Autoencoding Variational Bayes See (AEVB; Kingma and Welling, 2014) for details
 model: :class:`pymc3.Model`
PyMC3 model for inference
 random_seed: None or int
leave None to use package global RandomStream or other valid value to create instance specific one
 start: `Point`
starting point for inference
References
Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2016). Automatic Differentiation Variational Inference. arXiv preprint arXiv:1603.00788.
Geoffrey Roeder, Yuhuai Wu, David Duvenaud, 2016 Sticking the Landing: A Simple ReducedVariance Gradient for ADVI approximateinference.org/accepted/RoederEtAl2016.pdf
Kingma, D. P., & Welling, M. (2014). AutoEncoding Variational Bayes. stat, 1050, 1.

class
pymc3.variational.inference.
ASVGD
(approx=None, estimator=<class 'pymc3.variational.operators.KSD'>, kernel=<pymc3.variational.test_functions.RBF object>, **kwargs)¶ Amortized Stein Variational Gradient Descent
not suggested to use
This inference is based on Kernelized Stein Discrepancy it’s main idea is to move initial noisy particles so that they fit target distribution best.
Algorithm is outlined below
Input: Parametrized random generator \(R_{\theta}\)
Output: \(R_{\theta^{*}}\) that approximates the target distribution.
\[\begin{split}\Delta x_i &= \hat{\phi}^{*}(x_i) \\ \hat{\phi}^{*}(x) &= \frac{1}{n}\sum^{n}_{j=1}[k(x_j,x) \nabla_{x_j} logp(x_j)+ \nabla_{x_j} k(x_j,x)] \\ \Delta_{\theta} &= \frac{1}{n}\sum^{n}_{i=1}\Delta x_i\frac{\partial x_i}{\partial \theta}\end{split}\] Parameters
 approx: :class:`Approximation`
default is
FullRank
but can be any kernel: `callable`
kernel function for KSD \(f(histogram) > (k(x,.), \nabla_x k(x,.))\)
 model: :class:`Model`
 kwargs: kwargs for gradient estimator
References
Dilin Wang, Yihao Feng, Qiang Liu (2016) Learning to Sample Using Stein Discrepancy http://bayesiandeeplearning.org/papers/BDL_21.pdf
Dilin Wang, Qiang Liu (2016) Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning arXiv:1611.01722
Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng (2017) Stein Variational Policy Gradient arXiv:1704.02399

fit
(n=10000, score=None, callbacks=None, progressbar=True, obj_n_mc=500, **kwargs)¶ Perform Operator Variational Inference
 Parameters
 n: int
number of iterations
 score: bool
evaluate loss on each iteration or not
 callbacks: list[function: (Approximation, losses, i) > None]
calls provided functions after each iteration step
 progressbar: bool
whether to show progressbar or not
 Returns
Approximation
 Other Parameters
 obj_n_mc: `int`
Number of monte carlo samples used for approximation of objective gradients
 tf_n_mc: `int`
Number of monte carlo samples used for approximation of test function gradients
 obj_optimizer: function (grads, params) > updates
Optimizer that is used for objective params
 test_optimizer: function (grads, params) > updates
Optimizer that is used for test function params
 more_obj_params: `list`
Add custom params for objective optimizer
 more_tf_params: `list`
Add custom params for test function optimizer
 more_updates: `dict`
Add custom updates to resulting updates
 total_grad_norm_constraint: `float`
Bounds gradient norm, prevents exploding gradient problem
 fn_kwargs: `dict`
Add kwargs to theano.function (e.g. {‘profile’: True})
 more_replacements: `dict`
Apply custom replacements before calculating gradients

class
pymc3.variational.inference.
FullRankADVI
(*args, **kwargs)¶ Full Rank Automatic Differentiation Variational Inference (ADVI)
 Parameters
 local_rv: dict[var>tuple]
mapping {model_variable > approx params} Local Vars are used for Autoencoding Variational Bayes See (AEVB; Kingma and Welling, 2014) for details
 model: :class:`pymc3.Model`
PyMC3 model for inference
 random_seed: None or int
leave None to use package global RandomStream or other valid value to create instance specific one
 start: `Point`
starting point for inference
References
Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., and Blei, D. M. (2016). Automatic Differentiation Variational Inference. arXiv preprint arXiv:1603.00788.
Geoffrey Roeder, Yuhuai Wu, David Duvenaud, 2016 Sticking the Landing: A Simple ReducedVariance Gradient for ADVI approximateinference.org/accepted/RoederEtAl2016.pdf
Kingma, D. P., & Welling, M. (2014). AutoEncoding Variational Bayes. stat, 1050, 1.

class
pymc3.variational.inference.
ImplicitGradient
(approx, estimator=<class 'pymc3.variational.operators.KSD'>, kernel=<pymc3.variational.test_functions.RBF object>, **kwargs)¶ Implicit Gradient for Variational Inference
not suggested to use
An approach to fit arbitrary approximation by computing kernel based gradient By default RBF kernel is used for gradient estimation. Default estimator is Kernelized Stein Discrepancy with temperature equal to 1. This temperature works only for large number of samples. Larger temperature is needed for small number of samples but there is no theoretical approach to choose the best one in such case.

class
pymc3.variational.inference.
Inference
(op, approx, tf, **kwargs)¶ Base class for Variational Inference
Communicates Operator, Approximation and Test Function to build Objective Function
 Parameters
 op: Operator class
 approx: Approximation class or instance
 tf: TestFunction instance
 model: Model
PyMC3 Model
 kwargs: kwargs passed to :class:`Operator`

fit
(n=10000, score=None, callbacks=None, progressbar=True, **kwargs)¶ Perform Operator Variational Inference
 Parameters
 n: int
number of iterations
 score: bool
evaluate loss on each iteration or not
 callbacks: list[function: (Approximation, losses, i) > None]
calls provided functions after each iteration step
 progressbar: bool
whether to show progressbar or not
 Returns
Approximation
 Other Parameters
 obj_n_mc: `int`
Number of monte carlo samples used for approximation of objective gradients
 tf_n_mc: `int`
Number of monte carlo samples used for approximation of test function gradients
 obj_optimizer: function (grads, params) > updates
Optimizer that is used for objective params
 test_optimizer: function (grads, params) > updates
Optimizer that is used for test function params
 more_obj_params: `list`
Add custom params for objective optimizer
 more_tf_params: `list`
Add custom params for test function optimizer
 more_updates: `dict`
Add custom updates to resulting updates
 total_grad_norm_constraint: `float`
Bounds gradient norm, prevents exploding gradient problem
 fn_kwargs: `dict`
Add kwargs to theano.function (e.g. {‘profile’: True})
 more_replacements: `dict`
Apply custom replacements before calculating gradients

refine
(n, progressbar=True)¶ Refine the solution using the last compiled step function

class
pymc3.variational.inference.
KLqp
(approx, beta=1.0)¶ Kullback Leibler Divergence Inference
General approach to fit Approximations that define \(logq\) by maximizing ELBO (Evidence Lower Bound). In some cases rescaling the regularization term KL may be beneficial
\[ELBO_\beta = \log p(D\theta)  \beta KL(qp)\] Parameters
 approx: :class:`Approximation`
Approximation to fit, it is required to have logQ
 beta: float
Scales the regularization term in ELBO (see Christopher P. Burgess et al., 2017)
References
Christopher P. Burgess et al. (NIPS, 2017) Understanding disentangling in \(\beta\)VAE arXiv preprint 1804.03599

class
pymc3.variational.inference.
NFVI
(*args, **kwargs)¶ Normalizing Flow based :class:`KLqp` inference
Normalizing flow is a series of invertible transformations on initial distribution.
\[z_K = f_K \circ \dots \circ f_2 \circ f_1(z_0)\]In that case we can compute tractable density for the flow.
\[\ln q_K(z_K) = \ln q_0(z_0)  \sum_{k=1}^{K}\ln \left\frac{\partial f_k}{\partial z_{k1}}\right\]Every \(f_k\) here is a parametric function with defined determinant. We can choose every step here. For example the here is a simple flow is an affine transform:
\[z = loc(scale(z_0)) = \mu + \sigma * z_0\]Here we get mean field approximation if \(z_0 \sim \mathcal{N}(0, 1)\)
Flow Formulas
In PyMC3 there is a flexible way to define flows with formulas. We have 5 of them by the moment:
Loc (
loc
): \(z' = z + \mu\)Scale (
scale
): \(z' = \sigma * z\)Planar (
planar
): \(z' = z + u * \tanh(w^T z + b)\)Radial (
radial
): \(z' = z + \beta (\alpha + (zz_r))^{1}(zz_r)\)Householder (
hh
): \(z' = H z\)
Formula can be written as a string, e.g. ‘scaleloc’, ‘scalehh*4loc’, ‘panar*10’. Every step is separated with ‘‘, repeated flow is marked with ‘*’ producing ‘flow*repeats’.
 Parameters
 flow: strAbstractFlow
formula or initialized Flow, default is ‘scaleloc’ that is identical to MeanField
 model: :class:`pymc3.Model`
PyMC3 model for inference
 random_seed: None or int
leave None to use package global RandomStream or other valid value to create instance specific one

class
pymc3.variational.inference.
SVGD
(n_particles=100, jitter=1, model=None, start=None, random_seed=None, estimator=<class 'pymc3.variational.operators.KSD'>, kernel=<pymc3.variational.test_functions.RBF object>, **kwargs)¶ Stein Variational Gradient Descent
This inference is based on Kernelized Stein Discrepancy it’s main idea is to move initial noisy particles so that they fit target distribution best.
Algorithm is outlined below
 Input: A target distribution with density function \(p(x)\)
and a set of initial particles \(\{x^0_i\}^n_{i=1}\)
Output: A set of particles \(\{x^{*}_i\}^n_{i=1}\) that approximates the target distribution.
\[\begin{split}x_i^{l+1} &\leftarrow x_i^{l} + \epsilon_l \hat{\phi}^{*}(x_i^l) \\ \hat{\phi}^{*}(x) &= \frac{1}{n}\sum^{n}_{j=1}[k(x^l_j,x) \nabla_{x^l_j} logp(x^l_j)+ \nabla_{x^l_j} k(x^l_j,x)]\end{split}\] Parameters
 n_particles: `int`
number of particles to use for approximation
 jitter: `float`
noise sd for initial point
 model: :class:`pymc3.Model`
PyMC3 model for inference
 kernel: `callable`
kernel function for KSD \(f(histogram) > (k(x,.), \nabla_x k(x,.))\)
 temperature: float
parameter responsible for exploration, higher temperature gives more broad posterior estimate
 start: `dict`
initial point for inference
 random_seed: None or int
leave None to use package global RandomStream or other valid value to create instance specific one
 start: `Point`
starting point for inference
 kwargs: other keyword arguments passed to estimator
References
Qiang Liu, Dilin Wang (2016) Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm arXiv:1608.04471
Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng (2017) Stein Variational Policy Gradient arXiv:1704.02399

pymc3.variational.inference.
fit
(n=10000, local_rv=None, method='advi', model=None, random_seed=None, start=None, inf_kwargs=None, **kwargs)¶ Handy shortcut for using inference methods in functional way
 Parameters
 n: `int`
number of iterations
 local_rv: dict[var>tuple]
mapping {model_variable > approx params} Local Vars are used for Autoencoding Variational Bayes See (AEVB; Kingma and Welling, 2014) for details
 method: str or :class:`Inference`
string name is case insensitive in:
‘advi’ for ADVI
‘fullrank_advi’ for FullRankADVI
‘svgd’ for Stein Variational Gradient Descent
‘asvgd’ for Amortized Stein Variational Gradient Descent
‘nfvi’ for Normalizing Flow with default scaleloc flow
‘nfvi=<formula>’ for Normalizing Flow using formula
 model: :class:`Model`
PyMC3 model for inference
 random_seed: None or int
leave None to use package global RandomStream or other valid value to create instance specific one
 inf_kwargs: dict
additional kwargs passed to
Inference
 start: `Point`
starting point for inference
 Returns
Approximation
 Other Parameters
 score: bool
evaluate loss on each iteration or not
 callbacks: list[function: (Approximation, losses, i) > None]
calls provided functions after each iteration step
 progressbar: bool
whether to show progressbar or not
 obj_n_mc: `int`
Number of monte carlo samples used for approximation of objective gradients
 tf_n_mc: `int`
Number of monte carlo samples used for approximation of test function gradients
 obj_optimizer: function (grads, params) > updates
Optimizer that is used for objective params
 test_optimizer: function (grads, params) > updates
Optimizer that is used for test function params
 more_obj_params: `list`
Add custom params for objective optimizer
 more_tf_params: `list`
Add custom params for test function optimizer
 more_updates: `dict`
Add custom updates to resulting updates
 total_grad_norm_constraint: `float`
Bounds gradient norm, prevents exploding gradient problem
 fn_kwargs: `dict`
Add kwargs to theano.function (e.g. {‘profile’: True})
 more_replacements: `dict`
Apply custom replacements before calculating gradients
Approximations¶

class
pymc3.variational.approximations.
Empirical
(trace=None, size=None, **kwargs)¶ Single Group Full Rank Approximation
Builds Approximation instance from a given trace, it has the same interface as variational approximation

evaluate_over_trace
(node)¶ This allows to statically evaluate any symbolic expression over the trace.
 Parameters
 node: Theano Variables (or Theano expressions)
 Returns
 evaluated node(s) over the posterior trace contained in the empirical approximation


class
pymc3.variational.approximations.
FullRank
(*args, **kwargs)¶ Single Group Full Rank Approximation
Full Rank approximation to the posterior where Multivariate Gaussian family is fitted to minimize KL divergence from True posterior. In contrast to MeanField approach correlations between variables are taken in account. The main drawback of the method is computational cost.

class
pymc3.variational.approximations.
MeanField
(*args, **kwargs)¶ Single Group Mean Field Approximation
Mean Field approximation to the posterior where spherical Gaussian family is fitted to minimize KL divergence from True posterior. It is assumed that latent space variables are uncorrelated that is the main drawback of the method

class
pymc3.variational.approximations.
NormalizingFlow
(flow='scaleloc', *args, **kwargs)¶ Single Group Normalizing Flow Approximation
Normalizing flow is a series of invertible transformations on initial distribution.
\[\begin{split}z_K &= f_K \circ \dots \circ f_2 \circ f_1(z_0) \\ & z_0 \sim \mathcal{N}(0, 1)\end{split}\]In that case we can compute tractable density for the flow.
\[\ln q_K(z_K) = \ln q_0(z_0)  \sum_{k=1}^{K}\ln \left\frac{\partial f_k}{\partial z_{k1}}\right\]Every \(f_k\) here is a parametric function with defined determinant. We can choose every step here. For example the here is a simple flow is an affine transform:
\[z = loc(scale(z_0)) = \mu + \sigma * z_0\]Here we get mean field approximation if \(z_0 \sim \mathcal{N}(0, 1)\)
Flow Formulas
In PyMC3 there is a flexible way to define flows with formulas. We have 5 of them by the moment:
Loc (
loc
): \(z' = z + \mu\)Scale (
scale
): \(z' = \sigma * z\)Planar (
planar
): \(z' = z + u * \tanh(w^T z + b)\)Radial (
radial
): \(z' = z + \beta (\alpha + (zz_r))^{1}(zz_r)\)Householder (
hh
): \(z' = H z\)
Formula can be written as a string, e.g. ‘scaleloc’, ‘scalehh*4loc’, ‘panar*10’. Every step is separated with ‘‘, repeated flow is marked with ‘*’ producing ‘flow*repeats’.
References
Danilo Jimenez Rezende, Shakir Mohamed, 2015 Variational Inference with Normalizing Flows arXiv:1505.05770
Jakub M. Tomczak, Max Welling, 2016 Improving Variational AutoEncoders using Householder Flow arXiv:1611.09630

pymc3.variational.approximations.
sample_approx
(approx, draws=100, include_transformed=True)¶ Draw samples from variational posterior.
 Parameters
 approx: :class:`Approximation`
Approximation to sample from
 draws: `int`
Number of random samples.
 include_transformed: `bool`
If True, transformed variables are also sampled. Default is True.
 Returns
 trace: class:pymc3.backends.base.MultiTrace
Samples drawn from variational posterior.
Operators¶

class
pymc3.variational.operators.
KL
(approx, beta=1.0)¶ Operator based on Kullback Leibler Divergence
This operator constructs Evidence Lower Bound (ELBO) objective
\[ELBO_\beta = \log p(D\theta)  \beta KL(qp)\]where
\[KL[q(v)p(v)] = \int q(v)\log\frac{q(v)}{p(v)}dv\] Parameters
 approx: :class:`Approximation`
Approximation used for inference
 beta: float
Beta parameter for KL divergence, scales the regularization term.

apply
(f)¶ Operator itself
\[(O^{p,q}f_{\theta})(z)\] Parameters
 f: :class:`TestFunction` or None
function that takes z = self.input and returns same dimensional output
 Returns
 TensorVariable
symbolically applied operator

class
pymc3.variational.operators.
KSD
(approx, temperature=1)¶ Operator based on Kernelized Stein Discrepancy
 Input: A target distribution with density function \(p(x)\)
and a set of initial particles \(\{x^0_i\}^n_{i=1}\)
Output: A set of particles \(\{x_i\}^n_{i=1}\) that approximates the target distribution.
\[\begin{split}x_i^{l+1} \leftarrow \epsilon_l \hat{\phi}^{*}(x_i^l) \\ \hat{\phi}^{*}(x) = \frac{1}{n}\sum^{n}_{j=1}[k(x^l_j,x) \nabla_{x^l_j} logp(x^l_j)/temp + \nabla_{x^l_j} k(x^l_j,x)]\end{split}\] Parameters
 approx: :class:`Approximation`
Approximation used for inference
 temperature: float
Temperature for Stein gradient
References
Qiang Liu, Dilin Wang (2016) Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm arXiv:1608.04471

apply
(f)¶ Operator itself
\[(O^{p,q}f_{\theta})(z)\] Parameters
 f: :class:`TestFunction` or None
function that takes z = self.input and returns same dimensional output
 Returns
 TensorVariable
symbolically applied operator

objective_class
¶ alias of
pymc3.variational.operators.KSDObjective