Normalizes the activations of the previous layer for each example in a batch independently, rather than across the batch as Batch Normalization does. That is, it applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1.
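The contrast with Batch Normalization can be illustrated in base R, without Keras. This is only a sketch under stated assumptions: the toy matrix, the helper name `standardize`, and the use of the biased (population) variance are illustrative choices, and the learned scale and offset are left out.

```r
# Toy batch: 3 examples (rows) x 4 features (columns).
x <- matrix(c( 1,  2,  3,  4,
              10, 20, 30, 40,
              -1,  0,  1,  2),
            nrow = 3, byrow = TRUE)

epsilon <- 0.001  # plays the same role as the `epsilon` argument

# Hypothetical helper: standardize a vector using the biased variance.
standardize <- function(v, eps) {
  (v - mean(v)) / sqrt(mean((v - mean(v))^2) + eps)
}

# Layer-normalization style: each example (row) uses its own statistics.
# apply() over rows returns results as columns, hence the transpose.
layer_normed <- t(apply(x, 1, standardize, eps = epsilon))

# Batch-normalization style (batch statistics only): each feature (column)
# is standardized across the examples in the batch.
batch_normed <- apply(x, 2, standardize, eps = epsilon)

rowMeans(layer_normed)  # per-example means, each close to 0
colMeans(batch_normed)  # per-feature means, each close to 0
```

In the first case the statistics of one example never depend on the other examples in the batch, which is the property the description above refers to.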
layer_layer_normalization(
  object,
  axis = -1,
  epsilon = 0.001,
  center = TRUE,
  scale = TRUE,
  beta_initializer = "zeros",
  gamma_initializer = "ones",
  beta_regularizer = NULL,
  gamma_regularizer = NULL,
  beta_constraint = NULL,
  gamma_constraint = NULL,
  trainable = TRUE,
  name = NULL
)
object: What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., the output of another layer). The return value depends on object. If object is:
- missing or NULL, the Layer instance is returned.
- a Sequential model, the model with an additional layer is returned.
- a Tensor, the output tensor from layer_instance(object) is returned.
axis: Integer or List/Tuple. The axis or axes to normalize across. Typically this is the features axis/axes. The left-out axes are typically the batch axis/axes. Defaults to -1, the last dimension in the input.
epsilon: Small float added to the variance to avoid dividing by zero. Defaults to 1e-3.
center: If TRUE, add the offset beta to the normalized tensor. If FALSE, beta is ignored. Defaults to TRUE.
scale: If TRUE, multiply by gamma. If FALSE, gamma is not used. Defaults to TRUE. When the next layer is linear (this also holds for e.g. nn.relu), scaling can be disabled, since it will be done by the next layer.
beta_initializer: Initializer for the beta weight. Defaults to zeros.
gamma_initializer: Initializer for the gamma weight. Defaults to ones.
beta_regularizer: Optional regularizer for the beta weight. NULL by default.
gamma_regularizer: Optional regularizer for the gamma weight. NULL by default.
beta_constraint: Optional constraint for the beta weight. NULL by default.
gamma_constraint: Optional constraint for the gamma weight. NULL by default.
trainable: Boolean. If TRUE, the variables will be marked as trainable. Defaults to TRUE.
name: An optional name string for the layer. It should be unique in a model (do not reuse the same name twice). It is autogenerated if not provided.
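A brief usage sketch, assuming the keras R package and a working TensorFlow backend are installed. The input shape, the number of units, and the variable names are arbitrary illustrative choices, not part of this reference page.

```r
library(keras)  # assumption: keras R package with a TensorFlow backend

# `object` is a Tensor, so the output tensor of the new layer is returned.
inputs  <- layer_input(shape = c(4))
outputs <- inputs %>%
  layer_dense(units = 8) %>%
  layer_layer_normalization(axis = -1, epsilon = 0.001)
model <- keras_model(inputs = inputs, outputs = outputs)

# `object` is missing, so a standalone Layer instance is returned.
norm_layer <- layer_layer_normalization(center = TRUE, scale = TRUE)
```

With axis = -1 the 8 outputs of the dense layer are normalized per example, and one gamma and one beta value are learned for each of those 8 features.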
Given a tensor inputs, moments are calculated and normalization is performed across the axes specified in axis.
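The computation can be written out as follows for a single example normalized over one axis. The symbols are assumptions introduced here for illustration: $x_1, \dots, x_H$ are the activations along the normalized axis and $H$ is its size. The formula is the usual statement of layer normalization rather than a transcription of the implementation.

```latex
\mu = \frac{1}{H} \sum_{i=1}^{H} x_i,
\qquad
\sigma^2 = \frac{1}{H} \sum_{i=1}^{H} (x_i - \mu)^2,
\qquad
y_i = \gamma_i \, \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i
```

Here $\epsilon$ corresponds to the epsilon argument, $\gamma$ is applied only when scale is TRUE, and $\beta$ is added only when center is TRUE.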