Stochastic gradient descent optimizer with support for momentum, learning rate decay, and Nesterov momentum.
optimizer_sgd(learning_rate = 0.01, momentum = 0, decay = 0, nesterov = FALSE, clipnorm = NULL, clipvalue = NULL, ...)
learning_rate: float >= 0. Learning rate.
momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens oscillations.
decay: float >= 0. Learning rate decay over each update.
nesterov: boolean. Whether to apply Nesterov momentum.
clipnorm: Gradients will be clipped when their L2 norm exceeds this value.
clipvalue: Gradients will be clipped when their absolute value exceeds this value.
...: Unused, present only for backwards compatibility.
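To make the interaction between the arguments concrete, here is a minimal plain-R sketch of a single update step. It is not the library's code: `sgd_step` and its `param`, `grad`, `velocity` and `iteration` arguments are hypothetical names, and the time-based decay formula `learning_rate / (1 + decay * iteration)` is an assumption modeled on the classic Keras SGD implementation.

```r
# Hypothetical helper (not part of keras): one SGD update on a numeric vector.
sgd_step <- function(param, grad, velocity, iteration,
                     learning_rate = 0.01, momentum = 0, decay = 0,
                     nesterov = FALSE, clipnorm = NULL, clipvalue = NULL) {
  # clipnorm: rescale the gradient when its L2 norm exceeds the threshold.
  if (!is.null(clipnorm)) {
    grad_norm <- sqrt(sum(grad^2))
    if (grad_norm > clipnorm) grad <- grad * clipnorm / grad_norm
  }
  # clipvalue: clamp each element to [-clipvalue, clipvalue].
  if (!is.null(clipvalue)) {
    grad <- pmin(pmax(grad, -clipvalue), clipvalue)
  }
  # Assumed time-based decay of the learning rate over updates.
  lr_t <- learning_rate / (1 + decay * iteration)
  # Momentum accumulates a running velocity from past gradients.
  velocity <- momentum * velocity - lr_t * grad
  if (nesterov) {
    # Nesterov momentum: look ahead along the updated velocity.
    param <- param + momentum * velocity - lr_t * grad
  } else {
    param <- param + velocity
  }
  list(param = param, velocity = velocity)
}

# Usage: minimise f(w) = sum(w^2), whose gradient is 2 * w.
w <- c(1, -2)
v <- c(0, 0)
for (i in 0:99) {
  step <- sgd_step(w, 2 * w, v, iteration = i,
                   learning_rate = 0.1, momentum = 0.9, nesterov = TRUE)
  w <- step$param
  v <- step$velocity
}
```

With `momentum = 0` and `decay = 0` the step reduces to plain gradient descent, `param - learning_rate * grad`.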
Optimizer for use with