Adamax optimizer from Section 7 of the Adam paper. It is a variant of Adam based on the infinity norm.

optimizer_adamax(
  lr = 0.002,
  beta_1 = 0.9,
  beta_2 = 0.999,
  epsilon = NULL,
  decay = 0,
  clipnorm = NULL,
  clipvalue = NULL
)

## Arguments

| Argument | Description |
|---|---|
| lr | float >= 0. Learning rate. |
| beta_1 | The exponential decay rate for the 1st moment estimates. float, 0 < beta < 1. Generally close to 1. |
| beta_2 | The exponential decay rate for the 2nd moment estimates. float, 0 < beta < 1. Generally close to 1. |
| epsilon | float >= 0. Fuzz factor. If NULL, defaults to k_epsilon(). |
| decay | float >= 0. Learning rate decay over each update. |
| clipnorm | Gradients will be clipped when their L2 norm exceeds this value. |
| clipvalue | Gradients will be clipped when their absolute value exceeds this value. |
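
To make the role of each argument concrete, the following is a minimal base R sketch of a single Adamax update as described in the Adam paper. It is an illustration under stated assumptions, not the code keras runs: `adamax_step` and its `state` list are hypothetical names, the fuzz factor is assumed to be 1e-7, the `decay` schedule is assumed to be `lr / (1 + decay * iterations)`, and gradient clipping (`clipnorm`, `clipvalue`) is left out.

```r
# Hypothetical helper (not part of keras): one Adamax step on a numeric vector.
adamax_step <- function(param, grad, state, lr = 0.002, beta_1 = 0.9,
                        beta_2 = 0.999, epsilon = 1e-7, decay = 0) {
  t <- state$t + 1
  # Assumed decay schedule: shrink the learning rate as updates accumulate.
  lr_t <- lr / (1 + decay * (t - 1))
  # 1st moment estimate: exponential moving average of the gradients.
  m <- beta_1 * state$m + (1 - beta_1) * grad
  # Infinity-norm estimate: decayed running maximum of |gradient|.
  u <- pmax(beta_2 * state$u, abs(grad))
  # Bias-corrected update; epsilon guards against division by zero.
  param <- param - (lr_t / (1 - beta_1^t)) * m / (u + epsilon)
  list(param = param, state = list(m = m, u = u, t = t))
}

# Toy problem: minimize f(x) = sum(x^2), whose gradient is 2 * x.
x <- c(1, -2)
state <- list(m = c(0, 0), u = c(0, 0), t = 0)
for (i in 1:2000) {
  out <- adamax_step(x, 2 * x, state, lr = 0.01)
  x <- out$param
  state <- out$state
}
print(x)  # both coordinates should end up close to 0
```

In keras itself none of this is written by hand: the object returned by optimizer_adamax() is passed as the optimizer argument of compile().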

