# Backends

Backends for traces

## Available backends

1. NumPy array (pymc3.backends.NDArray)
2. Text files (pymc3.backends.Text)
3. SQLite (pymc3.backends.SQLite)

The NDArray backend holds the entire trace in memory, whereas the Text and SQLite backends store the values while sampling.

## Selecting a backend

By default, a NumPy array is used as the backend. To specify a different backend, pass a backend instance to sample.

For example, the following would save the sampling values to CSV files in the directory ‘test’.

>>> import pymc3 as pm
>>> with pm.Model():
...     db = pm.backends.Text('test')
...     trace = pm.sample(..., trace=db)


Note that, as in the example above, a backend can only be created inside an active model context or by passing a model parameter.

## Selecting values from a backend

Once sampling has finished, a MultiTrace object is returned. Values can be accessed in a few ways. The easiest way is to index the returned object with a variable or variable name.

>>> trace['x']  # or trace.x or trace[x]


The call will return the sampling values of x, with the values for all chains concatenated. (For a single call to sample, the number of chains will correspond to the cores argument.)

To discard the first N values of each chain, slicing syntax can be used.

>>> trace['x', 1000:]


The get_values method offers more control over which values are returned. The call below will discard the first 1000 iterations from each chain and keep the values for each chain as separate arrays.

>>> trace.get_values('x', burn=1000, combine=False)


The chains parameter of get_values can be used to limit the chains that are retrieved.

>>> trace.get_values('x', burn=1000, chains=[0, 2])
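
The effect of these arguments can be illustrated without PyMC3. The sketch below is a plain-Python stand-in, not the library's implementation: `select_values`, `chain_ids`, and the `toy` dictionary are hypothetical names, and each chain is modelled as a list of draws keyed by its chain number.

```python
def select_values(chains, burn=0, thin=1, combine=True, chain_ids=None):
    """Mimic burn/thin/combine/chains selection on plain lists.

    `chains` maps a chain number to that chain's list of draws.
    Illustrative stand-in only; this is not PyMC3 code.
    """
    if chain_ids is None:
        chain_ids = sorted(chains)
    # Drop the first `burn` draws of each chain, then keep every `thin`-th draw.
    per_chain = [chains[c][burn::thin] for c in chain_ids]
    if combine:
        # Concatenate the selected chains into one flat list.
        return [value for chain in per_chain for value in chain]
    return per_chain


# Three toy chains of five draws each.
toy = {0: [0, 1, 2, 3, 4], 1: [10, 11, 12, 13, 14], 2: [20, 21, 22, 23, 24]}

combined = select_values(toy, burn=2)                  # one flat list
separate = select_values(toy, burn=2, combine=False)   # one list per chain
subset = select_values(toy, burn=2, chain_ids=[0, 2])  # chains 0 and 2 only
```

In this toy version, the burn-in is applied to each chain before any concatenation, which matches the description above of discarding the first iterations "from each chain".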


MultiTrace objects also support slicing. For example, the following call would return a new trace object without the first 1000 sampling iterations for all traces and variables.

>>> sliced_trace = trace[1000:]


The backend for the new trace is always NDArray, regardless of the type of the original trace. Only the NDArray backend supports a stop value in the slice.

Saved backends can be loaded using the load function in the module for the specific backend.

>>> trace = pm.backends.text.load('test')


## Writing custom backends

Backends consist of a class that handles sampling storage and value selection. Three sampling methods of a backend will be called:

• setup: Before sampling is started, the setup method will be called with two arguments: the number of draws and the chain number. This is useful for setting up any structure for storing the sampling values that requires the above information.
• record: Record the sampling results for the current draw. This method will be called with a dictionary of values mapped to the variable names. This is the only sampling function that must do something to have a meaningful backend.
• close: This method is called following sampling and should perform any actions necessary for finalizing and cleaning up the backend.

The base storage class backends.base.BaseTrace provides common model setup that is used by all the PyMC backends.

Several selection methods must also be defined:

• get_values: This is the core method for selecting values from the backend. It can be called directly and is used by __getitem__ when the backend is indexed with a variable name or object.
• _slice: Defines how the backend returns a slice of itself. This is called if the backend is indexed with a slice range.
• point: Returns values for each variable at a single iteration. This is called if the backend is indexed with a single integer.
• __len__: This should return the number of draws.
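
As a rough illustration of how the sampling and selection methods fit together, here is a minimal in-memory sketch. `ListTrace` is a hypothetical class: it only mirrors the method names listed above, takes the variable names directly, and does not inherit from backends.base.BaseTrace, so it is not usable with pymc3.sample as written.

```python
class ListTrace:
    """Toy single-chain backend that stores draws in Python lists.

    Hypothetical example: a real backend would build on
    pymc3.backends.base.BaseTrace; this class only mirrors the method names.
    """

    def __init__(self, varnames):
        self.varnames = list(varnames)
        self.chain = None
        self.samples = {}

    # --- Sampling methods ---
    def setup(self, draws, chain):
        # Called before sampling with the expected draw count and chain number.
        self.chain = chain
        self.samples = {name: [] for name in self.varnames}

    def record(self, point):
        # `point` maps variable names to the values of the current draw.
        for name in self.varnames:
            self.samples[name].append(point[name])

    def close(self):
        # Nothing to finalize for an in-memory store.
        pass

    # --- Selection methods ---
    def get_values(self, varname, burn=0, thin=1):
        return self.samples[varname][burn::thin]

    def _slice(self, idx):
        # Return a new backend holding only the sliced draws.
        sliced = ListTrace(self.varnames)
        sliced.chain = self.chain
        sliced.samples = {name: vals[idx] for name, vals in self.samples.items()}
        return sliced

    def point(self, idx):
        # Values of every variable at a single iteration.
        return {name: vals[idx] for name, vals in self.samples.items()}

    def __len__(self):
        if not self.samples:
            return 0
        return len(self.samples[self.varnames[0]])


# Drive the backend by hand, in the order a sampler would call it.
backend = ListTrace(['x', 'y'])
backend.setup(draws=3, chain=0)
for i in range(3):
    backend.record({'x': i, 'y': i * 10})
backend.close()
```

Only record does real work here, which is consistent with the note above that it is the one sampling method a meaningful backend cannot leave empty.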

When pymc3.sample finishes, it wraps all trace objects in a MultiTrace object that provides a consistent selection interface for all backends. If the traces are stored on disk, then a load function should also be defined that returns a MultiTrace object.

For specific examples, see pymc3.backends.{ndarray,text,sqlite}.py.

## ndarray

NumPy array trace backend

Store sampling values in memory as a NumPy array.

class pymc3.backends.ndarray.NDArray(name=None, model=None, vars=None, test_point=None)

NDArray trace object

Parameters:

• name (str): Name of backend. This has no meaning for the NDArray backend.
• model (Model): If None, the model is taken from the with context.
• vars (list of variables): Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

close()

Close the database backend.

This is called after sampling has finished.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:

• varname (str)
• burn (int)
• thin (int)

Returns: A NumPy array

point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point, sampler_stats=None)

Record results of a sampling iteration.

Parameters:

• point (dict): Values mapped to variable names

setup(draws, chain, sampler_vars=None)

Perform chain-specific setup.

Parameters:

• draws (int): Expected number of draws
• chain (int): Chain number
• sampler_vars (list of dicts): Names and dtypes of the variables that are exported by the samplers.

pymc3.backends.ndarray.load_trace(directory, model=None)

Loads a multitrace that has been written to file.

The model used for the trace must be passed in, or the command must be run in a model context.

Parameters:

• directory (str): Path to a pymc3 serialized trace
• model (pm.Model, optional): Model used to create the trace. Can also be inferred from context.

Returns: The pm.MultiTrace that was saved in the directory

pymc3.backends.ndarray.save_trace(trace, directory=None, overwrite=False)

Save multitrace to file.

TODO: Also save warnings.

This is a custom data format for PyMC3 traces. Each chain goes inside a directory, and each directory contains a metadata json file, and a numpy compressed file. See https://docs.scipy.org/doc/numpy/neps/npy-format.html for more information about this format.

Parameters:

• trace (pm.MultiTrace): Trace to save to disk
• directory (str, optional): Path to a directory to save the trace
• overwrite (bool, default False): Whether to overwrite an existing directory

Returns: str, path to the directory where the trace was saved

## sqlite

SQLite trace backend

Store and retrieve sampling values in SQLite database file.

### Database format

For each variable, a table is created with the following format:

recid (INT), draw (INT), chain (INT), v0 (FLOAT), v1 (FLOAT), v2 (FLOAT) …

The variable column names are extended to reflect additional dimensions. For example, a variable with the shape (2, 2) would be stored as

recid (INT), draw (INT), chain (INT), v0_0 (FLOAT), v0_1 (FLOAT), v1_0 (FLOAT) …

The recid key is autoincremented each time a new row is added to the table. The chain column denotes the chain index and starts at 0.
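
The layout described above can be reproduced with Python's standard sqlite3 module. This is a sketch of the described table format, not the backend's own code: `value_columns` is a hypothetical helper, the table name `y` and the stored values are made up, the `recid` name is taken from the first format line, and treating a scalar as a one-element vector is an assumption.

```python
import itertools
import sqlite3


def value_columns(shape):
    """Column names following the naming above (v0, v1, ... or v0_0, v0_1, ...)."""
    # Assumption: a scalar is stored like a one-element vector.
    shape = shape or (1,)
    indices = itertools.product(*(range(n) for n in shape))
    return ['v' + '_'.join(str(i) for i in idx) for idx in indices]


conn = sqlite3.connect(':memory:')
cols = value_columns((2, 2))
col_defs = ', '.join('{} FLOAT'.format(c) for c in cols)
# recid is an INTEGER PRIMARY KEY, so SQLite assigns it automatically.
conn.execute(
    'CREATE TABLE [y] (recid INTEGER NOT NULL PRIMARY KEY, '
    'draw INTEGER, chain INTEGER, {})'.format(col_defs)
)

insert = 'INSERT INTO [y] (draw, chain, {}) VALUES (?, ?, {})'.format(
    ', '.join(cols), ', '.join('?' for _ in cols)
)
for chain in range(2):
    for draw in range(4):
        values = [float(draw + chain)] * len(cols)
        conn.execute(insert, [draw, chain] + values)
conn.commit()

# Select chain 1, skipping the first two draws (a burn-in of 2).
rows = conn.execute(
    'SELECT draw, v0_0 FROM [y] WHERE chain = ? AND draw >= ? ORDER BY draw',
    (1, 2),
).fetchall()
recids = [r[0] for r in conn.execute('SELECT recid FROM [y] ORDER BY recid')]
```

Keeping draw and chain as ordinary columns means that burn-in and chain selection can be expressed as a WHERE clause, as in the final query.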

class pymc3.backends.sqlite.SQLite(name, model=None, vars=None, test_point=None)

SQLite trace object

Parameters:

• name (str): Name of database file
• model (Model): If None, the model is taken from the with context.
• vars (list of variables): Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
• test_point (dict): Use a different test point, which may have changed variable shapes.

close()

Close the database backend.

This is called after sampling has finished.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:

• varname (str)
• burn (int)
• thin (int)

Returns: A NumPy array

point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters:

• point (dict): Values mapped to variable names

setup(draws, chain)

Perform chain-specific setup.

Parameters:

• draws (int): Expected number of draws
• chain (int): Chain number

pymc3.backends.sqlite.load(name, model=None)

Parameters:

• name (str): Path to SQLite database file
• model (Model): If None, the model is taken from the with context.

Returns: A MultiTrace instance

## text

Text file trace backend

Store sampling values as CSV files.

### File format

Sampling values for each chain are saved in a separate file (under a directory specified by the name argument). The rows correspond to sampling iterations. The column names consist of variable names and index labels. For example, the heading

x,y__0_0,y__0_1,y__1_0,y__1_1,y__2_0,y__2_1

represents two variables, x and y, where x is a scalar and y has a shape of (3, 2).
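
The heading above can be generated from the variable shapes. The sketch below uses only the standard library and writes to an in-memory buffer rather than a chain file; `flat_names` is a hypothetical helper written for this illustration, and the numeric values are made up.

```python
import csv
import io
import itertools


def flat_names(varname, shape):
    """Build column labels such as y__0_1 for a variable of the given shape."""
    if not shape:
        # A scalar keeps its bare variable name.
        return [varname]
    indices = itertools.product(*(range(n) for n in shape))
    return ['{}__{}'.format(varname, '_'.join(str(i) for i in idx))
            for idx in indices]


# x is a scalar and y has shape (3, 2), as in the heading above.
shapes = [('x', ()), ('y', (3, 2))]
header = [name for var, shape in shapes for name in flat_names(var, shape)]

# Write one chain with two draws; each row is one sampling iteration.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(header)
writer.writerow([0.5] + [0.0] * 6)
writer.writerow([0.7] + [1.0] * 6)
first_line = buffer.getvalue().splitlines()[0]
```

The index labels vary fastest in the last dimension, so the flattened columns follow row-major order.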

class pymc3.backends.text.Text(name, model=None, vars=None, test_point=None)

Text trace object

Parameters:

• name (str): Name of directory to store text files
• model (Model): If None, the model is taken from the with context.
• vars (list of variables): Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
• test_point (dict): Use a different test point, which may have changed variable shapes.

close()

Close the database backend.

This is called after sampling has finished.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:

• varname (str)
• burn (int)
• thin (int)

Returns: A NumPy array

point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters:

• point (dict): Values mapped to variable names

setup(draws, chain)

Perform chain-specific setup.

Parameters:

• draws (int): Expected number of draws
• chain (int): Chain number

pymc3.backends.text.dump(name, trace, chains=None)

Store values from NDArray trace as CSV files.

Parameters:

• name (str): Name of directory to store CSV files in
• trace (MultiTrace of NDArray traces): Result of MCMC run with default NDArray backend
• chains (list): Chains to dump. If None, all chains are dumped.

pymc3.backends.text.load(name, model=None)

pymc3.backends.tracetab.trace_to_dataframe(trace, chains=None, varnames=None, include_transformed=False)