pymc.momentum#

pymc.momentum(loss_or_grads=None, params=None, learning_rate=0.001, momentum=0.9)[source]#

Stochastic Gradient Descent (SGD) updates with momentum

Generates update expressions of the form:

  • velocity := momentum * velocity - learning_rate * gradient

  • param := param + velocity
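The two update expressions above can be sketched in plain NumPy (illustrative only; the actual function builds symbolic PyTensor update expressions, and the function name here is hypothetical):

```python
import numpy as np

def momentum_step(param, velocity, gradient, learning_rate=0.001, momentum=0.9):
    # velocity := momentum * velocity - learning_rate * gradient
    velocity = momentum * velocity - learning_rate * gradient
    # param := param + velocity
    param = param + velocity
    return param, velocity

# One step on a scalar parameter with gradient 2.0 and zero initial velocity:
p, v = momentum_step(np.float64(1.0), np.float64(0.0), np.float64(2.0),
                     learning_rate=0.01)
print(p, v)  # velocity = -0.02, param = 0.98
```

Because the velocity accumulates past gradients, consecutive steps in a consistent direction grow larger, which is what smooths and accelerates descent compared to plain SGD.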

Parameters:
loss_or_grads: symbolic expression or list of expressions

A scalar loss expression, or a list of gradient expressions

params: list of shared variables

The variables to generate update expressions for

learning_rate: float or symbolic scalar

The learning rate controlling the size of update steps

momentum: float or symbolic scalar, optional

The amount of momentum to apply. Higher momentum results in smoothing over more update steps. Defaults to 0.9.

Returns:
OrderedDict

A dictionary mapping each parameter to its update expression

See also

apply_momentum

Generic function applying momentum to updates

nesterov_momentum

Nesterov’s variant of SGD with momentum

Notes

Higher momentum also results in larger update steps. To counter that, you can optionally scale your learning rate by 1 - momentum.
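The 1 / (1 - momentum) amplification behind that advice can be seen with a short numerical sketch (a toy loop, not part of the pymc API): under a constant gradient g, the velocity converges to -learning_rate * g / (1 - momentum).

```python
# Illustrative: iterate the velocity update with a constant gradient
# to see its steady-state magnitude grow with momentum.
def steady_state_velocity(g, learning_rate, momentum, steps=1000):
    v = 0.0
    for _ in range(steps):
        v = momentum * v - learning_rate * g
    return v

v = steady_state_velocity(g=1.0, learning_rate=0.01, momentum=0.9)
print(v)  # close to -0.01 / (1 - 0.9) = -0.1
```

With momentum=0.9, effective steps are roughly 10x the bare learning rate, so scaling the learning rate by 1 - momentum restores the original step magnitude.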

The optimizer can be called without both loss_or_grads and params; in that case a partial function is returned that can be called with them later.

Examples

>>> import pytensor
>>> from pymc import momentum
>>> a = pytensor.shared(1.)
>>> b = a*2
>>> updates = momentum(b, [a], learning_rate=.01)
>>> isinstance(updates, dict)
True
>>> optimizer = momentum(learning_rate=.01)
>>> callable(optimizer)
True
>>> updates = optimizer(b, [a])
>>> isinstance(updates, dict)
True