Backends

Storage backends for traces

The NDArray (pymc3.backends.NDArray) backend holds the entire trace in memory.

Selecting values from a backend

After a backend is finished sampling, it returns a MultiTrace object. Values can be accessed in a few ways. The easiest way is to index the backend object with a variable or variable name.

>>> trace['x']  # or trace.x or trace[x]

The call will return the sampling values of x, with the values for all chains concatenated. (For a single call to sample, the number of chains is set by the chains argument, which by default is derived from cores.)

To discard the first N values of each chain, slicing syntax can be used.

>>> trace['x', 1000:]

The get_values method offers more control over which values are returned. The call below will discard the first 1000 iterations from each chain and keep the values for each chain as separate arrays.

>>> trace.get_values('x', burn=1000, combine=False)

The chains parameter of get_values can be used to limit the chains that are retrieved.

>>> trace.get_values('x', burn=1000, chains=[0, 2])
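
The selection rules described above (burn, thin, combine and chains) can be sketched in plain Python. This is only an illustration of the semantics, not PyMC3's implementation: select_values is a hypothetical helper, and lists stand in for the NumPy arrays that a real trace returns.

```python
from typing import Dict, List, Optional, Sequence, Union

Values = List[float]


def select_values(
    chain_values: Dict[int, Values],
    burn: int = 0,
    thin: int = 1,
    combine: bool = True,
    chains: Optional[Sequence[int]] = None,
) -> Union[Values, List[Values]]:
    """Hypothetical helper mimicking the burn/thin/combine/chains options."""
    if chains is None:
        chains = sorted(chain_values)  # default: use every chain
    # Drop the first `burn` draws of each chain, then keep every `thin`-th draw.
    per_chain = [chain_values[c][burn::thin] for c in chains]
    if combine:
        # Concatenate the selected chains into one flat sequence.
        return [value for chain in per_chain for value in chain]
    return per_chain


# Three toy chains of five draws each (made-up numbers).
draws = {
    0: [0, 1, 2, 3, 4],
    1: [10, 11, 12, 13, 14],
    2: [20, 21, 22, 23, 24],
}

combined = select_values(draws, burn=2)
separate = select_values(draws, burn=2, combine=False)
subset = select_values(draws, burn=2, chains=[0, 2])
```

Here combined holds the post-burn-in draws of all three chains in one list, separate keeps one list per chain, and subset contains only chains 0 and 2.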

MultiTrace objects also support slicing. For example, the following call would return a new trace object without the first 1000 sampling iterations for all traces and variables.

>>> sliced_trace = trace[1000:]

The backend for the new trace is always NDArray, regardless of the type of original trace.

Loading a saved backend

Saved backends can be loaded using arviz.from_netcdf.

ndarray

NumPy array trace backend

Store sampling values in memory as a NumPy array.

class pymc3.backends.ndarray.NDArray(name=None, model=None, vars=None, test_point=None)

NDArray trace object

Parameters
name: str

Name of backend. This has no meaning for the NDArray backend.

model: Model

If None, the model is taken from the with context.

vars: list of variables

Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

close()

Close the database backend.

This is called after sampling has finished.

get_values(varname: str, burn=0, thin=1) -> numpy.ndarray

Get values from trace.

Parameters
varname: str
burn: int
thin: int
Returns
A NumPy array

point(idx) -> Dict[str, Any]

Return dictionary of point values at idx for current chain with variable names as keys.

record(point, sampler_stats=None) -> None

Record results of a sampling iteration.

Parameters
point: dict

Values mapped to variable names

setup(draws, chain, sampler_vars=None) -> None

Perform chain-specific setup.

Parameters
draws: int

Expected number of draws

chain: int

Chain number

sampler_vars: list of dicts

Names and dtypes of the variables that are exported by the samplers.
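
The order in which the methods above are called during sampling (setup once per chain, record once per iteration, close at the end) can be sketched with a toy backend. ListBackend is a hypothetical stand-in written for this illustration; it is not the NDArray class, it ignores sampler statistics, and it uses Python lists where NDArray uses NumPy arrays.

```python
from typing import Any, Dict, List, Optional, Sequence


class ListBackend:
    """Hypothetical in-memory backend following the setup/record/close protocol."""

    def __init__(self, varnames: Sequence[str]) -> None:
        self.varnames = list(varnames)
        self.chain: Optional[int] = None
        self.samples: Dict[str, List[Any]] = {}
        self.closed = False

    def setup(self, draws: int, chain: int) -> None:
        # Chain-specific setup. A real backend could preallocate `draws` slots;
        # this sketch simply starts with empty lists.
        self.chain = chain
        self.samples = {name: [] for name in self.varnames}

    def record(self, point: Dict[str, Any]) -> None:
        # Store the values of one sampling iteration, keyed by variable name.
        for name in self.varnames:
            self.samples[name].append(point[name])

    def close(self) -> None:
        # Called once sampling has finished.
        self.closed = True

    def get_values(self, varname: str, burn: int = 0, thin: int = 1) -> List[Any]:
        return self.samples[varname][burn::thin]

    def point(self, idx: int) -> Dict[str, Any]:
        return {name: values[idx] for name, values in self.samples.items()}


backend = ListBackend(["x", "y"])
backend.setup(draws=4, chain=0)
for i in range(4):  # stand-in for the sampler loop
    backend.record({"x": i, "y": i * i})
backend.close()

thinned_x = backend.get_values("x", burn=1, thin=2)
third_point = backend.point(2)
```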

pymc3.backends.ndarray.load_trace(directory: str, model=None) -> pymc3.backends.base.MultiTrace

Loads a multitrace that has been written to file.

The model used for the trace must be passed in, or the command must be run in a model context.

Parameters
directory: str

Path to a pymc3 serialized trace

model: pm.Model (optional)

Model used to create the trace. Can also be inferred from context.

Returns
pm.MultiTrace that was saved in the directory

pymc3.backends.ndarray.point_list_to_multitrace(point_list: List[Dict[str, numpy.ndarray]], model: Optional[pymc3.model.Model] = None) -> pymc3.backends.base.MultiTrace

Transform a point list into a MultiTrace.

pymc3.backends.ndarray.save_trace(trace: pymc3.backends.base.MultiTrace, directory: Optional[str] = None, overwrite=False) -> str

Save multitrace to file.

TODO: Also save warnings.

This is a custom data format for PyMC3 traces. Each chain is saved in its own directory, which contains a JSON metadata file and a compressed NumPy file. See https://docs.scipy.org/doc/numpy/neps/npy-format.html for more information about the NumPy file format.

Parameters
trace: pm.MultiTrace

trace to save to disk

directory: str (optional)

path to a directory to save the trace

overwrite: bool (default False)

whether to overwrite an existing directory.

Returns
str, path to the directory where the trace was saved

tracetab

Functions for converting traces into a table-like format

pymc3.backends.tracetab.trace_to_dataframe(trace, chains=None, varnames=None, include_transformed=False)

Convert trace to pandas DataFrame.

Parameters
trace: NDArray trace
chains: int or list of ints

Chains to include. If None, all chains are used. A single chain value can also be given.

varnames: list of variable names

Variables to be included in the DataFrame. If None, all variables are included.

include_transformed: boolean

If True, transformed variables will be included in the resulting DataFrame.
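
Converting a trace to a table means that every element of a multidimensional variable needs its own column. The sketch below shows one way to derive such column names. It assumes a naming scheme of the form name__i_j (variable name, two underscores, then the element's indices), which is stated here as an assumption rather than taken from the text above; flat_names is a hypothetical helper, not part of the tracetab module.

```python
from itertools import product
from typing import List, Sequence


def flat_names(varname: str, shape: Sequence[int]) -> List[str]:
    """Hypothetical helper: one column name per element of a variable."""
    if not shape:
        # Scalar variables keep their plain name.
        return [varname]
    # Enumerate all index tuples in row-major order, e.g. (0, 0), (0, 1), ...
    index_tuples = product(*(range(n) for n in shape))
    return [
        "{}__{}".format(varname, "_".join(str(i) for i in idx))
        for idx in index_tuples
    ]


scalar_columns = flat_names("mu", ())
vector_columns = flat_names("beta", (2,))
matrix_columns = flat_names("w", (2, 2))
```

Under this assumed scheme, a scalar mu yields a single column, a length-2 vector beta yields two columns, and a 2 by 2 matrix w yields four.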