pymc.momentum#

pymc.momentum(loss_or_grads=None, params=None, learning_rate=0.001, momentum=0.9)[source]#

Stochastic Gradient Descent (SGD) updates with momentum

Generates update expressions of the form:

  • velocity := momentum * velocity - learning_rate * gradient

  • param := param + velocity
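The two update expressions above can be sketched in plain NumPy (illustrative only; the actual function builds symbolic PyTensor update expressions, and the function name here is hypothetical):

```python
import numpy as np

def momentum_step(param, velocity, gradient, learning_rate=0.001, momentum=0.9):
    # velocity := momentum * velocity - learning_rate * gradient
    velocity = momentum * velocity - learning_rate * gradient
    # param := param + velocity
    param = param + velocity
    return param, velocity

# One step on a scalar parameter with gradient 2.0 and zero initial velocity:
p, v = momentum_step(np.float64(1.0), np.float64(0.0), np.float64(2.0),
                     learning_rate=0.01)
print(p, v)  # velocity = -0.02, param = 0.98
```

Because the velocity accumulates past gradients, consecutive steps in a consistent direction grow larger, which is what smooths and accelerates descent compared to plain SGD.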

Parameters:
loss_or_grads: symbolic expression or list of expressions

A scalar loss expression, or a list of gradient expressions

params: list of shared variables

The variables to generate update expressions for

learning_rate: float or symbolic scalar

The learning rate controlling the size of update steps

momentum: float or symbolic scalar, optional

The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.

Returns:
OrderedDict

A dictionary mapping each parameter to its update expression

See also

apply_momentum

Generic function applying momentum to updates

nesterov_momentum

Nesterov’s variant of SGD with momentum

Notes

Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
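The 1 / (1 - momentum) amplification behind that advice can be seen with a short numerical sketch (a toy loop, not part of the pymc API): under a constant gradient g, the velocity converges to -learning_rate * g / (1 - momentum).

```python
# Illustrative: iterate the velocity update with a constant gradient
# to see its steady-state magnitude grow with momentum.
def steady_state_velocity(g, learning_rate, momentum, steps=1000):
    v = 0.0
    for _ in range(steps):
        v = momentum * v - learning_rate * g
    return v

v = steady_state_velocity(g=1.0, learning_rate=0.01, momentum=0.9)
print(v)  # close to -0.01 / (1 - 0.9) = -0.1
```

With momentum=0.9, effective steps are roughly 10x the bare learning rate, so scaling the learning rate by 1 - momentum restores the original step magnitude.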

The optimizer can be called without both loss_or_grads and params; in that case a partial function is returned that can be called with them later.

Examples

>>> import pytensor
>>> from pymc import momentum
>>> a = pytensor.shared(1.)
>>> b = a*2
>>> updates = momentum(b, [a], learning_rate=.01)
>>> isinstance(updates, dict)
True
>>> optimizer = momentum(learning_rate=.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True