Returns a BytesIO object for a package data file.

Parameters: filename (str) – file to load
Returns: BytesIO of the data
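
A helper of this kind can be sketched with the standard library alone. The snippet below is a minimal illustration, not the library's implementation: `load_package_data` and the throwaway `demo_pkg` package are hypothetical names introduced here, and the sketch assumes the data file ships inside an importable package.

```python
import importlib
import io
import os
import pkgutil
import sys
import tempfile


def load_package_data(package, filename):
    """Hypothetical helper: wrap a package data file in a BytesIO buffer."""
    raw = pkgutil.get_data(package, filename)  # bytes, or None if unsupported
    if raw is None:
        raise FileNotFoundError(filename)
    return io.BytesIO(raw)


# Build a throwaway package with one data file so the sketch is self-contained.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w"):
    pass
with open(os.path.join(pkg_dir, "values.csv"), "wb") as fh:
    fh.write(b"x,y\n1,2\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

buffer = load_package_data("demo_pkg", "values.csv")
header = buffer.readline()  # the buffer behaves like a binary file
```

Because the result is an in-memory binary buffer, it can be handed to any reader that accepts a file-like object.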

Helper class that infers the data type of a generator by looking at its first item, while preserving the order of the resulting generator
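
The idea of inspecting the first item without losing it can be sketched as follows. This is a minimal illustration under the assumption that the generator is non-empty; `peek_first` is a hypothetical name, not part of the library.

```python
import itertools


def peek_first(generator):
    """Return (first_item, iterator); the iterator still yields every item in order."""
    iterator = iter(generator)
    first = next(iterator)  # raises StopIteration if the generator is empty
    # Put the consumed item back in front of the remaining ones.
    return first, itertools.chain([first], iterator)


squares = (float(i * i) for i in range(4))
first, restored = peek_first(squares)
inferred_type = type(first).__name__  # type taken from the first item only
items = list(restored)                # nothing was lost, order is unchanged
```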

class Minibatch(data, batch_size=128, dtype=None, broadcastable=None, name='Minibatch', random_seed=42, update_shared_f=None, in_memory_size=None)

Multidimensional minibatch that is a pure TensorVariable

  • data (ndarray) – initial data
  • batch_size (int or List[int|tuple(size, random_seed)]) – batch size for inference; a random seed is needed for the child random generators
  • dtype (str) – cast data to a specific type
  • broadcastable (tuple[bool]) – change the broadcastable pattern, which defaults to (False, ) * ndim
  • name (str) – name for the tensor, defaults to “Minibatch”
  • random_seed (int) – random seed that is used by default
  • update_shared_f (callable) – returns an ndarray that will be carefully stored in the underlying shared variable; you can use it to change the source of minibatches programmatically
  • in_memory_size (int or List[int|slice|Ellipsis]) – data size for storing in theano.shared

shared – Used for storing data

Type: shared tensor

minibatch – Used for training

Type: minibatch tensor


Below is a common use case of Minibatch within variational inference. Importantly, PyMC3 must be made “aware” that a minibatch is being used in inference; otherwise, we will get the wrong \(logp\) for the model. To do so, pass the total_size parameter to the observed node, which correctly scales the density of the model logp that is affected by Minibatch. See the examples below.
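
The effect of this scaling can be illustrated without PyMC3. The sketch below assumes the usual minibatch estimator, in which the batch log-likelihood is multiplied by total_size / batch_size; `normal_logp` and `scaled_minibatch_logp` are hypothetical helper names, and the code illustrates the idea rather than PyMC3's internal implementation.

```python
import math


def normal_logp(values, mu, sd):
    """Sum of Normal(mu, sd) log-densities over all values."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd ** 2) - (v - mu) ** 2 / (2 * sd ** 2)
        for v in values
    )


def scaled_minibatch_logp(batch, total_size, mu, sd):
    """Rescale the batch log-likelihood to the size of the full data set."""
    scale = total_size / len(batch)
    return scale * normal_logp(batch, mu, sd)


# Full data set of 100 points; the batch holds one copy of each distinct value,
# so the rescaled batch term matches the full log-likelihood exactly here.
full_data = [1.0, 2.0] * 50
batch = [1.0, 2.0]

full = normal_logp(full_data, mu=0.0, sd=1.0)
unscaled = normal_logp(batch, mu=0.0, sd=1.0)
scaled = scaled_minibatch_logp(batch, len(full_data), mu=0.0, sd=1.0)
```

Without the rescaling, the likelihood term is far too small relative to the prior, which is why the unscaled value gives the wrong model logp.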


Suppose we have data

>>> import numpy as np
>>> import pymc3 as pm
>>> from pymc3 import Minibatch
>>> data = np.random.rand(100, 100)

If we want a 1d slice of size 10, we do

>>> x = Minibatch(data, batch_size=10)

Note that your data is cast to floatX if it is not of integer type, but you can still pass the dtype kwarg to Minibatch.

In case we want 10 sampled rows and columns, [(size, seed), (size, seed)], it is

>>> x = Minibatch(data, batch_size=[(10, 42), (10, 42)], dtype='int32')
>>> assert str(x.dtype) == 'int32'

or, simpler, with the default random seed = 42, [size, size]

>>> x = Minibatch(data, batch_size=[10, 10])

x is a regular TensorVariable that supports any math

>>> assert x.eval().shape == (10, 10)

You can pass it to your desired model

>>> with pm.Model() as model:
...     mu = pm.Flat('mu')
...     sd = pm.HalfNormal('sd')
...     lik = pm.Normal('lik', mu, sd, observed=x, total_size=(100, 100))

Then you can perform regular variational inference out of the box

>>> with model:
...     approx =

A notable thing is that Minibatch has shared and minibatch attributes that you can use later

>>> x.set_value(np.random.laplace(size=(100, 100)))

The minibatches will then be drawn from the new storage; this directly affects x.shared. The same thing, though less convenient, would be

>>> x.shared.set_value(pm.floatX(np.random.laplace(size=(100, 100))))

The programmatic way to change the storage is as follows (partial is imported for simplicity)

>>> from functools import partial
>>> datagen = partial(np.random.laplace, size=(100, 100))
>>> x = Minibatch(datagen(), batch_size=10, update_shared_f=datagen)
>>> x.update_shared()

To be more concrete about how we get a minibatch, here is a demo:

1) create a shared variable

>>> import theano
>>> shared = theano.shared(data)

2) create a random slice of size 10

>>> ridx = pm.tt_rng().uniform(size=(10,), low=0, high=data.shape[0]-1e-10).astype('int64')

3) take that slice

>>> minibatch = shared[ridx]

That’s it. Next, you can use this minibatch somewhere else. Note that the implementation does not require a fixed shape for the shared variable. Feel free to use that if needed.
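
The same three steps can be mimicked in plain Python to see what the index computation does. This is only an analogy under stated assumptions: a nested list stands in for the shared variable and the standard `random` module stands in for the Theano random stream, so nothing here is symbolic.

```python
import random

rng = random.Random(42)  # seeded so the draw is reproducible

# 1) storage: a 100 x 100 table standing in for the shared variable
data = [[rng.random() for _ in range(100)] for _ in range(100)]

# 2) ten uniform draws from 0 to just below n_rows, truncated to integer indices
n_rows = len(data)
ridx = [int(rng.uniform(0, n_rows - 1e-10)) for _ in range(10)]

# 3) take those rows (sampling is with replacement, so rows may repeat)
minibatch = [data[i] for i in ridx]
```

Subtracting a tiny constant from the upper bound keeps the truncated index strictly below the number of rows, so the lookup never runs off the end of the storage.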

Suppose you need some replacements in the graph, e.g. changing the minibatch to test data

>>> node = x ** 2  # arbitrary expressions on minibatch x
>>> testdata = pm.floatX(np.random.laplace(size=(1000, 10)))

Then you should create a dict with replacements

>>> replacements = {x: testdata}
>>> rnode = theano.clone(node, replacements)
>>> assert (testdata ** 2 == rnode.eval()).all()

To replace the minibatch with its shared variable, do the same thing. The minibatch variable is accessible as an attribute, as is the shared variable associated with the minibatch

>>> replacements = {x.minibatch: x.shared}
>>> rnode = theano.clone(node, replacements)

For more complex slices, some more code is needed, which may seem less clear

>>> moredata = np.random.rand(10, 20, 30, 40, 50)

The default total_size that can be passed to a PyMC3 random node is then (10, 20, 30, 40, 50), but it can be less verbose in some cases

1) Advanced indexing, total_size = (10, Ellipsis, 50)

>>> x = Minibatch(moredata, [2, Ellipsis, 10])

We take a slice only for the first and last dimensions

>>> assert x.eval().shape == (2, 20, 30, 40, 10)

2) Skipping a particular dimension, total_size = (10, None, 30)

>>> x = Minibatch(moredata, [2, None, 20])
>>> assert x.eval().shape == (2, 20, 20, 40, 50)

3) Mixing it all, total_size = (10, None, 30, Ellipsis, 50)

>>> x = Minibatch(moredata, [2, None, 20, Ellipsis, 10])
>>> assert x.eval().shape == (2, 20, 20, 40, 10)
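
How a batch_size specification maps onto the resulting shape can be worked out by hand. The helper below is a hypothetical sketch, written only to reproduce the shapes asserted in the examples above; it assumes that entries before Ellipsis apply to the leading dimensions, entries after it apply to the trailing ones, and None leaves a dimension at full size. It is not the library's own code.

```python
def minibatch_shape(data_shape, batch_size):
    """Hypothetical helper: shape of the minibatch for a batch_size spec."""
    spec = [batch_size] if isinstance(batch_size, int) else list(batch_size)

    # Split the spec around Ellipsis into leading and trailing entries.
    if Ellipsis in spec:
        split = spec.index(Ellipsis)
        head, tail = spec[:split], spec[split + 1:]
    else:
        head, tail = spec, []

    def size_of(entry, full):
        if entry is None:             # dimension is skipped, keep full size
            return full
        if isinstance(entry, tuple):  # (size, random_seed) pair
            return entry[0]
        return entry

    shape = list(data_shape)
    for i, entry in enumerate(head):
        shape[i] = size_of(entry, shape[i])
    for i, entry in enumerate(reversed(tail)):
        idx = len(shape) - 1 - i
        shape[idx] = size_of(entry, shape[idx])
    return tuple(shape)


full = (10, 20, 30, 40, 50)
shape_a = minibatch_shape(full, [2, Ellipsis, 10])
shape_b = minibatch_shape(full, [2, None, 20])
shape_c = minibatch_shape(full, [2, None, 20, Ellipsis, 10])
```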


Return a new Variable like self.

Returns: Variable instance – A new Variable instance (or subclass instance) with no owner or index.


Tags are copied to the returned instance.

Name is copied to the returned instance.