# Variational API quickstart¶

The variational inference (VI) API is focused on approximating posterior distributions for Bayesian models. Common use cases to which this module can be applied include:

• Sampling from the model posterior and computing arbitrary expressions

• Conducting Monte Carlo approximation of expectations, variances, and other statistics

• Removing symbolic dependence on PyMC3 random nodes and evaluating expressions (using eval)

• Providing a bridge to arbitrary Theano code

Sounds good, doesn’t it?

The module provides an interface to a variety of inference methods, so you are free to choose what is most appropriate for the problem.

:

%matplotlib inline
import matplotlib.pyplot as plt
import pymc3 as pm
import theano
import numpy as np

np.random.seed(42)
pm.set_tt_rng(42)


## Basic setup¶

We do not need complex models to play with the VI API; let’s begin with a simple mixture model:

:

w = pm.floatX([.2, .8])
mu = pm.floatX([-.3, .5])
sd = pm.floatX([.1, .1])

with pm.Model() as model:
    x = pm.NormalMixture('x', w=w, mu=mu, sigma=sd, dtype=theano.config.floatX)
    x2 = x ** 2
    sin_x = pm.math.sin(x)


We can’t compute analytical expectations for this model. However, we can obtain an approximation using Markov chain Monte Carlo methods; let’s use NUTS first.
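For reference, these expectations can also be estimated with a hand-rolled NumPy Monte Carlo, independent of PyMC3. This is a minimal sketch using standard ancestral sampling for the mixture (pick a component by weight, then draw from its Gaussian); the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
w, mu, sd = np.array([0.2, 0.8]), np.array([-0.3, 0.5]), np.array([0.1, 0.1])

# Ancestral sampling: pick a component by weight, then draw from its Gaussian
comp = rng.choice(2, p=w, size=100_000)
samples = rng.normal(mu[comp], sd[comp])

# Monte Carlo estimates of E[x], E[x^2] and E[sin(x)]
e_x = samples.mean()
e_x2 = (samples ** 2).mean()
e_sin = np.sin(samples).mean()
```

With these weights the exact mean is 0.2·(−0.3) + 0.8·0.5 = 0.34, which the Monte Carlo estimate recovers.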

To allow samples of the expressions to be saved, we need to wrap them in Deterministic objects:

:

with model:
    pm.Deterministic('x2', x2)
    pm.Deterministic('sin_x', sin_x)

:

with model:
    trace = pm.sample(50000)

Auto-assigning NUTS sampler...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [x]
Sampling 2 chains: 100%|██████████| 101000/101000 [00:41<00:00, 2455.71draws/s]
The estimated number of effective samples is smaller than 200 for some parameters.

:

pm.traceplot(trace);

Above are traces for $x^2$ and $\sin(x)$. We can see that there is clear multimodality in this model. One drawback is that you need to know in advance exactly what you want to see in the trace and wrap it in a Deterministic.

The VI API takes an alternate approach: you first obtain an approximation from the model, then calculate expressions based on it afterwards.

Let’s use the same model:

:

with pm.Model() as model:
    x = pm.NormalMixture('x', w=w, mu=mu, sigma=sd, dtype=theano.config.floatX)
    x2 = x ** 2
    sin_x = pm.math.sin(x)


Here we will use automatic differentiation variational inference (ADVI).

:

with model:
    mean_field = pm.fit(method='advi')

Average Loss = 2.2413: 100%|██████████| 10000/10000 [00:06<00:00, 1454.10it/s]
Finished [100%]: Average Loss = 2.2687

:

pm.plot_posterior(mean_field.sample(1000), color='LightSeaGreen');

Notice that ADVI has failed to approximate the multimodal distribution, since it uses a Gaussian distribution with a single mode.
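To see how badly a unimodal fit describes this target, compare the true mixture density with a moment-matched single Gaussian at a point between the two modes. This is a plain-NumPy sketch, not part of the PyMC3 API:

```python
import numpy as np

def npdf(x, mu, sd):
    # Normal probability density function, evaluated pointwise
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

w, mu, sd = np.array([0.2, 0.8]), np.array([-0.3, 0.5]), np.array([0.1, 0.1])
m = (w * mu).sum()                             # mixture mean
v = (w * (sd ** 2 + mu ** 2)).sum() - m ** 2   # mixture variance

x = 0.1                                        # a point in the valley between the modes
mix_density = (w * npdf(x, mu, sd)).sum()
gauss_density = npdf(x, m, np.sqrt(v))
# The moment-matched Gaussian puts substantial mass where the mixture has almost none
```

The single Gaussian's density in the valley exceeds the true mixture density there by orders of magnitude, which is exactly the failure mode visible in the plot.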

## Checking convergence¶

:

help(pm.callbacks.CheckParametersConvergence)

Help on class CheckParametersConvergence in module pymc3.variational.callbacks:

class CheckParametersConvergence(Callback)
|  Convergence stopping check
|
|  Parameters
|  ----------
|  every : int
|      check frequency
|  tolerance : float
|      if diff norm < tolerance : break
|  diff : str
|      difference type one of {'absolute', 'relative'}
|  ord : {non-zero int, inf, -inf, 'fro', 'nuc'}, optional
|      see more info in :func:numpy.linalg.norm
|
|  Examples
|  --------
|  >>> with model:
|  ...     approx = pm.fit(
|  ...         n=10000, callbacks=[
|  ...             CheckParametersConvergence(
|  ...                 every=50, diff='absolute',
|  ...                 tolerance=1e-4)
|  ...         ]
|  ...     )
|
|  Method resolution order:
|      CheckParametersConvergence
|      Callback
|      builtins.object
|
|  Methods defined here:
|
|  __call__(self, approx, _, i)
|      Call self as a function.
|
|  __init__(self, every=100, tolerance=0.001, diff='relative', ord=inf)
|      Initialize self.  See help(type(self)) for accurate signature.
|
|  ----------------------------------------------------------------------
|  Static methods defined here:
|
|  flatten_shared(shared_list)
|
|  ----------------------------------------------------------------------
|  Data descriptors inherited from Callback:
|
|  __dict__
|      dictionary for instance variables (if defined)
|
|  __weakref__
|      list of weak references to the object (if defined)



Let’s use the default arguments for CheckParametersConvergence, as they seem reasonable.

:

from pymc3.variational.callbacks import CheckParametersConvergence

with model:
    mean_field = pm.fit(method='advi', callbacks=[CheckParametersConvergence()])

Average Loss = 2.2559: 100%|██████████| 10000/10000 [00:06<00:00, 1464.59it/s]
Finished [100%]: Average Loss = 2.2763


We can access the inference history via the .hist attribute.

:

plt.plot(mean_field.hist);

This is not a good convergence plot, despite the fact that we ran many iterations. The reason is that the mean of the ADVI approximation is close to zero, so taking the relative difference (the default method) is unstable for checking convergence.
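The instability is easy to see in a toy NumPy example (the parameter values here are hypothetical, not taken from the model):

```python
import numpy as np

old = np.array([1e-8])   # parameter mean close to zero
new = np.array([2e-8])   # a tiny absolute update

abs_diff = np.abs(new - old)          # 1e-8: effectively converged
rel_diff = np.abs((new - old) / old)  # 1.0: looks far from converged
```

Near zero, even negligible absolute updates produce large relative differences, so a relative-difference stopping rule never fires.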

:

with model:
    mean_field = pm.fit(method='advi', callbacks=[CheckParametersConvergence(diff='absolute')])

Average Loss = 3.145:  45%|████▌     | 4531/10000 [00:03<00:04, 1360.34it/s]
Convergence achieved at 4700
Interrupted at 4,699 [46%]: Average Loss = 4.7996

:

plt.plot(mean_field.hist);

That’s much better! We’ve reached convergence after fewer than 5000 iterations.

## Tracking parameters¶

Another useful callback allows users to track parameters. It allows for the tracking of arbitrary statistics during inference, though it can be memory-hungry. Using the fit function, we do not have direct access to the approximation before inference; however, tracking parameters requires such access. We can get around this constraint by using the object-oriented (OO) API for inference.

:

with model:
    advi = pm.ADVI()
:

advi.approx

:

<pymc3.variational.approximations.MeanField at 0x1c28a5ebe0>


Different approximations have different hyperparameters. In mean-field ADVI, we have $\rho$ and $\mu$ (inspired by Bayes by Backprop).

:

advi.approx.shared_params

:

{'mu': mu, 'rho': rho}
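
Both are unconstrained parameters; the standard deviation is recovered from rho through a softplus transform, sd = log(1 + exp(rho)), so that it stays positive while rho ranges over the whole real line. A quick NumPy check of this transform (the helper name here is mine, not PyMC3's):

```python
import numpy as np

def rho_to_sd(rho):
    # Softplus keeps the standard deviation positive for any real rho
    return np.log1p(np.exp(rho))

sd0 = rho_to_sd(0.0)  # log(2) ~ 0.693, matching the initial std shown below
```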


There are convenient shortcuts to relevant statistics associated with the approximation. This can be useful, for example, when specifying a mass matrix for NUTS sampling:

:

advi.approx.mean.eval(), advi.approx.std.eval()

:

(array([0.34]), array([0.69314718]))


We can roll these statistics into the Tracker callback.

:

tracker = pm.callbacks.Tracker(
    mean=advi.approx.mean.eval,  # callable that returns mean
    std=advi.approx.std.eval  # callable that returns std
)


Now, calling advi.fit will record the mean and standard deviation of the approximation as it runs.

:

approx = advi.fit(20000, callbacks=[tracker])

Average Loss = 1.9568: 100%|██████████| 20000/20000 [00:14<00:00, 1397.94it/s]
Finished [100%]: Average Loss = 1.9589


We can now plot both the evidence lower bound and parameter traces:

:

fig = plt.figure(figsize=(16, 9))
mu_ax = fig.add_subplot(221)
std_ax = fig.add_subplot(222)
hist_ax = fig.add_subplot(212)
mu_ax.plot(tracker['mean'])
mu_ax.set_title('Mean track')
std_ax.plot(tracker['std'])
std_ax.set_title('Std track')
hist_ax.plot(advi.hist)
hist_ax.set_title('Negative ELBO track');

Notice that there are convergence issues with the mean, and that the lack of convergence does not seem to change the ELBO trajectory significantly. As we are using the OO API, we can run the approximation longer until convergence is achieved.

:

advi.refine(100000)

Average Loss = 1.8638: 100%|██████████| 100000/100000 [01:14<00:00, 1342.20it/s]
Finished [100%]: Average Loss = 1.8422


Let’s take a look:

:

fig = plt.figure(figsize=(16, 9))
mu_ax = fig.add_subplot(221)
std_ax = fig.add_subplot(222)
hist_ax = fig.add_subplot(212)
mu_ax.plot(tracker['mean'])
mu_ax.set_title('Mean track')
std_ax.plot(tracker['std'])
std_ax.set_title('Std track')
hist_ax.plot(advi.hist)
hist_ax.set_title('Negative ELBO track');

We still see evidence of a lack of convergence, as the mean has devolved into a random walk. This could be the result of choosing a poor algorithm for inference. At any rate, it is unstable and can produce very different results even with different random seeds.

Let’s compare results with the NUTS output:

:

import seaborn as sns
ax = sns.kdeplot(trace['x'], label='NUTS');

Again, we see that ADVI is not able to cope with multimodality; we can instead use SVGD, which generates an approximation based on a large number of particles.

:

with model:
    svgd_approx = pm.fit(300, method='svgd', inf_kwargs=dict(n_particles=1000),
                         obj_optimizer=pm.sgd(learning_rate=0.01))

100%|██████████| 300/300 [00:51<00:00,  5.81it/s]

:

ax = sns.kdeplot(trace['x'], label='NUTS');
sns.kdeplot(svgd_approx.sample(2000)['x'], label='SVGD');

That did the trick, as we now have a multimodal approximation using SVGD.

With the variational approximation in hand, it is possible to calculate arbitrary functions of the parameters. For example, we can calculate $x^2$ and $\sin(x)$, as with the NUTS model.

:

# recall x ~ NormalMixture
a = x**2
b = pm.math.sin(x)


To evaluate these expressions with the approximation, we need approx.sample_node.

:

help(svgd_approx.sample_node)

Help on method sample_node in module pymc3.variational.opvi:

sample_node(node, size=None, deterministic=False, more_replacements=None) method of pymc3.variational.approximations.Empirical instance
Samples given node or nodes over shared posterior

Parameters
----------
node : Theano Variables (or Theano expressions)
size : None or scalar
number of samples
more_replacements : dict
add custom replacements to graph, e.g. change input source
deterministic : bool
whether to use zeros as initial distribution
if True - zero initial point will produce constant latent variables

Returns
-------
sampled node(s) with replacements


:

a_sample = svgd_approx.sample_node(a)
a_sample.eval()

:

array(0.20617133)

:

a_sample.eval()

:

array(0.23059109)

:

a_sample.eval()

:

array(0.01689826)


Every call yields a different value from the same Theano node, because the node is stochastic.

By applying replacements, we are now free of the dependence on the PyMC3 model; instead, we now depend on the approximation. Changing it will change the distribution for stochastic nodes:

:

sns.kdeplot(np.array([a_sample.eval() for _ in range(2000)]));
plt.title('$x^2$ distribution');

There is a more convenient way to get many samples at once: pass a size argument to sample_node.

:

a_samples = svgd_approx.sample_node(a, size=1000)

:

sns.kdeplot(a_samples.eval());
plt.title('$x^2$ distribution');

The sample_node function includes an additional leading dimension, so taking expectations or calculating variances is done along axis=0.

:

a_samples.var(0).eval()  # variance


:

array(0.0963961)

:

a_samples.mean(0).eval()  # mean


:

array(0.24696937)
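
The same axis-0 convention can be checked with a plain NumPy array standing in for the sampled node (the shapes here are illustrative):

```python
import numpy as np

# 1000 posterior samples of a 4-dimensional quantity
samples = np.random.default_rng(0).normal(size=(1000, 4))

mean_over_samples = samples.mean(0)  # one mean per dimension, shape (4,)
var_over_samples = samples.var(0)    # one variance per dimension, shape (4,)
```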


A symbolic sample size can also be specified:

:

i = theano.tensor.iscalar('i')
i.tag.test_value = 1
a_samples_i = svgd_approx.sample_node(a, size=i)

:

a_samples_i.eval({i: 100}).shape


:

(100,)

:

a_samples_i.eval({i: 10000}).shape

:

(10000,)


Unfortunately the size must be a scalar value.

### Converting a Trace to an Approximation¶

We can convert an MCMC trace into an Approximation. It will have the same API as the approximations above, including the same sample_node method:

:

trace_approx = pm.Empirical(trace, model=model)
trace_approx

:

<pymc3.variational.approximations.Empirical at 0x1c2e0d1f60>


We can then draw samples from the Empirical object:

:

pm.plot_posterior(trace_approx.sample(10000));

## Multilabel logistic regression¶

Let’s illustrate the use of Tracker with the famous Iris dataset. We’ll attempt multilabel classification and compute the expected accuracy score as a diagnostic.

:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import theano.tensor as tt
import pandas as pd

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

A relatively simple model will be sufficient here because the classes are roughly linearly separable; we are going to fit multinomial logistic regression.

:

Xt = theano.shared(X_train)
yt = theano.shared(y_train)

with pm.Model() as iris_model:

    # Coefficients for features
    β = pm.Normal('β', 0, sigma=1e2, shape=(4, 3))
    # Intercepts
    a = pm.Flat('a', shape=(3,))
    # Softmax transforms the linear predictor to the unit interval
    p = tt.nnet.softmax(Xt.dot(β) + a)

    observed = pm.Categorical('obs', p=p, observed=yt)


### Applying replacements in practice¶

PyMC3 models have symbolic inputs for latent variables. To evaluate an expression that requires knowledge of latent variables, one needs to provide fixed values. We can use the values approximated by VI for this purpose. The function sample_node removes the symbolic dependencies.

sample_node will use the whole distribution at each step, so we will use it here. We can apply additional replacements in a single function call using the more_replacements keyword argument, which both replacement functions accept.

HINT: You can use the more_replacements argument when calling fit too:

• pm.fit(more_replacements={full_data: minibatch_data})

• inference.fit(more_replacements={full_data: minibatch_data})

:

with iris_model:

    # We'll use SVGD
    inference = pm.SVGD(n_particles=500, jitter=1)

    # Local reference to approximation
    approx = inference.approx

    # Here we need more_replacements to change the training set to the test set
    test_probs = approx.sample_node(p, more_replacements={Xt: X_test}, size=100)

    # For the training set, no more replacements are needed
    train_probs = approx.sample_node(p)


By applying the code above, we now have 100 sampled probabilities for each observation (we passed size=100; the default size for sample_node is None, which yields a single sample).

Next we create symbolic expressions for sampled accuracy scores:

:

test_ok = tt.eq(test_probs.argmax(-1), y_test)
train_ok = tt.eq(train_probs.argmax(-1), y_train)
test_accuracy = test_ok.mean(-1)
train_accuracy = train_ok.mean(-1)
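
The same computation can be checked in plain NumPy, with random stand-ins for the sampled probabilities and labels (all names and shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 posterior samples of class probabilities for 30 observations, 3 classes
probs = rng.dirichlet(np.ones(3), size=(100, 30))
labels = rng.integers(0, 3, size=30)

ok = probs.argmax(-1) == labels   # (100, 30) boolean matrix of correct predictions
accuracy_samples = ok.mean(-1)    # one accuracy value per posterior sample
```

Averaging over the last axis gives a distribution of accuracy scores, one per posterior sample, which is what Tracker will record below.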


Tracker expects callables, so we can pass the .eval method of a Theano node, which is itself a function.

Calls to this function are cached so they can be reused.

:

eval_tracker = pm.callbacks.Tracker(
test_accuracy=test_accuracy.eval,
train_accuracy=train_accuracy.eval
)

:

inference.fit(100, callbacks=[eval_tracker]);

100%|██████████| 100/100 [00:23<00:00,  4.27it/s]

:

import seaborn as sns

sns.tsplot(np.asarray(eval_tracker['test_accuracy']).T, color='red')
sns.tsplot(np.asarray(eval_tracker['train_accuracy']).T, color='blue')
plt.legend(['test_accuracy', 'train_accuracy'])
plt.title('Training Progress')


:

Training does not seem to be working here. Let’s use a different optimizer and boost the learning rate.

:

inference.fit(400, obj_optimizer=pm.adamax(learning_rate=0.1), callbacks=[eval_tracker]);

100%|██████████| 400/400 [00:56<00:00,  7.05it/s]

:

sns.tsplot(np.asarray(eval_tracker['test_accuracy']).T, color='red', alpha=.5)
sns.tsplot(np.asarray(eval_tracker['train_accuracy']).T, color='blue', alpha=.5)
plt.legend(['test_accuracy', 'train_accuracy'])
plt.title('Training Progress');

This is much better!

So, Tracker allows us to monitor our approximation and choose a good training schedule.

## Minibatches¶

When dealing with large datasets, minibatch training can drastically speed up inference and improve the approximation, since large datasets impose a hefty cost on the computation of gradients.

There is a nice API in pymc3 for handling these cases, available through the pm.Minibatch class. The minibatch is just a highly specialized Theano tensor:

:

issubclass(pm.Minibatch, theano.tensor.TensorVariable)

:

True


To demonstrate, let’s simulate a large quantity of data:

:

# Raw values
data = np.random.rand(40000, 100)
# Scaled values
data *= np.random.randint(1, 10, size=(100,))
# Shifted values
data += np.random.rand(100) * 10


For comparison, let’s fit a model without minibatch processing:

:

with pm.Model() as model:
    mu = pm.Flat('mu', shape=(100,))
    sd = pm.HalfNormal('sd', shape=(100,))
    lik = pm.Normal('lik', mu, sd, observed=data)


Just for fun, let’s create a custom special-purpose callback that causes a hard stop when the approximation runs too slowly:

:

def stop_after_10(approx, loss_history, i):
    if (i > 0) and (i % 10) == 0:
        raise StopIteration('I was slow, sorry')

:

with model:
    advifit = pm.fit(callbacks=[stop_after_10])

Average Loss = 5.0926e+08:   0%|          | 9/10000 [00:01<26:26,  6.30it/s]
I was slow, sorry
Interrupted at 9 [0%]: Average Loss = 5.4299e+08


Inference is too slow, taking several seconds per iteration; fitting the approximation would have taken hours!

Now let’s use minibatches. At every iteration, we will draw 500 random values:

Remember to set total_size in the observed node.

total_size is an important parameter that allows pymc3 to infer the right way of rescaling densities. If it is not set, you are likely to get completely wrong results. For more information please refer to the comprehensive documentation of pm.Minibatch.
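The rescaling itself is easy to illustrate in NumPy: a minibatch log-likelihood multiplied by N/batch_size is an unbiased (if noisy) estimate of the full-data log-likelihood, which is the correction total_size lets PyMC3 apply. This sketch uses a standard normal likelihood for concreteness; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=10_000)

def normal_logp(x, mu=0.0, sd=1.0):
    # Log-density of a normal distribution, evaluated pointwise
    return -0.5 * np.log(2 * np.pi * sd ** 2) - 0.5 * ((x - mu) / sd) ** 2

full_logp = normal_logp(data).sum()
batch = rng.choice(data, size=500, replace=False)
# Rescale by N / batch_size so the minibatch sum estimates the full-data sum
scaled_logp = normal_logp(batch).sum() * (data.size / batch.size)
# scaled_logp fluctuates around full_logp across random minibatches
```

Without the rescaling, the likelihood term would be underweighted relative to the prior by a factor of N/batch_size, which is why omitting total_size gives wrong results.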

:

X = pm.Minibatch(data, batch_size=500)

with pm.Model() as model:

    mu = pm.Flat('mu', shape=(100,))
    sd = pm.HalfNormal('sd', shape=(100,))
    likelihood = pm.Normal('likelihood', mu, sd, observed=X, total_size=data.shape)


:

with model:
    advifit = pm.fit()

Average Loss = 1.5474e+05: 100%|██████████| 10000/10000 [00:17<00:00, 584.87it/s]
Finished [100%]: Average Loss = 1.5467e+05

:

plt.plot(advifit.hist);

Minibatch inference is dramatically faster. Multidimensional minibatches may be needed for some corner cases, e.g. when you are doing matrix factorization or the model is very wide.

Here is the docstring for Minibatch to illustrate how it can be customized.

:

print(pm.Minibatch.__doc__)

Multidimensional minibatch that is pure TensorVariable

Parameters
----------
data : :class:ndarray
initial data
batch_size : int or List[int|tuple(size, random_seed)]
batch size for inference, random seed is needed
for child random generators
dtype : str
cast data to specific type
change broadcastable pattern that defaults to (False, ) * ndim
name : str
name for tensor, defaults to "Minibatch"
random_seed : int
random seed that is used by default
update_shared_f : callable
returns :class:ndarray that will be carefully
stored to underlying shared variable
you can use it to change source of
minibatches programmatically
in_memory_size : int or List[int|slice|Ellipsis]
data size for storing in theano.shared

Attributes
----------
shared : shared tensor
Used for storing data
minibatch : minibatch tensor
Used for training

Notes
-----
Below is a common use case of Minibatch within the variational inference.
Importantly, we need to make PyMC3 "aware" of minibatch being used in inference.
Otherwise, we will get the wrong :math:logp for the model.
To do so, we need to pass the total_size parameter to the observed node, which correctly scales
the density of the model logp that is affected by Minibatch. See more in examples below.

Examples
--------
Consider we have data
>>> data = np.random.rand(100, 100)

if we want 1d slice of size 10 we do
>>> x = Minibatch(data, batch_size=10)

Note, that your data is cast to floatX if it is not integer type
But you still can add dtype kwarg for :class:Minibatch

in case we want 10 sampled rows and columns
[(size, seed), (size, seed)] it is
>>> x = Minibatch(data, batch_size=[(10, 42), (10, 42)], dtype='int32')
>>> assert str(x.dtype) == 'int32'

or simpler with default random seed = 42
[size, size]
>>> x = Minibatch(data, batch_size=[10, 10])

x is a regular :class:TensorVariable that supports any math
>>> assert x.eval().shape == (10, 10)

You can pass it to your desired model
>>> with pm.Model() as model:
...     mu = pm.Flat('mu')
...     sd = pm.HalfNormal('sd')
...     lik = pm.Normal('lik', mu, sd, observed=x, total_size=(100, 100))

Then you can perform regular Variational Inference out of the box
>>> with model:
...     approx = pm.fit()

Notable thing is that :class:Minibatch has shared, minibatch, attributes
you can call later
>>> x.set_value(np.random.laplace(size=(100, 100)))

and minibatches will be then from new storage
it directly affects x.shared.
the same thing would be but less convenient
>>> x.shared.set_value(pm.floatX(np.random.laplace(size=(100, 100))))

programmatic way to change storage is as follows
I import partial for simplicity
>>> from functools import partial
>>> datagen = partial(np.random.laplace, size=(100, 100))
>>> x = Minibatch(datagen(), batch_size=10, update_shared_f=datagen)
>>> x.update_shared()

To be more concrete about how we get minibatch, here is a demo
1) create shared variable
>>> shared = theano.shared(data)

2) create random slice of size 10
>>> ridx = pm.tt_rng().uniform(size=(10,), low=0, high=data.shape-1e-10).astype('int64')

3) take that slice
>>> minibatch = shared[ridx]

That's done. Next you can use this minibatch somewhere else.
You can see that implementation does not require fixed shape
for shared variable. Feel free to use that if needed.

Suppose you need some replacements in the graph, e.g. change minibatch to testdata
>>> node = x ** 2  # arbitrary expressions on minibatch x
>>> testdata = pm.floatX(np.random.laplace(size=(1000, 10)))

Then you should create a dict with replacements
>>> replacements = {x: testdata}
>>> rnode = theano.clone(node, replacements)
>>> assert (testdata ** 2 == rnode.eval()).all()

To replace minibatch with it's shared variable you should do
the same things. Minibatch variable is accessible as an attribute
as well as shared, associated with minibatch
>>> replacements = {x.minibatch: x.shared}
>>> rnode = theano.clone(node, replacements)

For more complex slices some more code is needed that can seem not so clear
>>> moredata = np.random.rand(10, 20, 30, 40, 50)

default total_size that can be passed to PyMC3 random node
is then (10, 20, 30, 40, 50) but can be less verbose in some cases

1) Advanced indexing, total_size = (10, Ellipsis, 50)
>>> x = Minibatch(moredata, [2, Ellipsis, 10])

We take slice only for the first and last dimension
>>> assert x.eval().shape == (2, 20, 30, 40, 10)

2) Skipping particular dimension, total_size = (10, None, 30)
>>> x = Minibatch(moredata, [2, None, 20])
>>> assert x.eval().shape == (2, 20, 20, 40, 50)

3) Mixing that all, total_size = (10, None, 30, Ellipsis, 50)
>>> x = Minibatch(moredata, [2, None, 20, Ellipsis, 10])
>>> assert x.eval().shape == (2, 20, 20, 40, 10)