Dot-product attention layer, a.k.a. Luong-style attention.

layer_attention(
  inputs,
  use_scale = FALSE,
  causal = FALSE,
  batch_size = NULL,
  dtype = NULL,
  name = NULL,
  trainable = NULL,
  weights = NULL
)

## Arguments

| Argument | Description |
|---|---|
| inputs | A list of inputs: the first should be the query tensor, the second the value tensor. |
| use_scale | If TRUE, will create a scalar variable to scale the attention scores. |
| causal | Boolean. Set to TRUE for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. |
| batch_size | Fixed batch size for the layer. |
| dtype | The data type expected by the input, as a string (float32, float64, int32...). |
| name | An optional name string for the layer. Should be unique in a model (do not reuse the same name twice). It will be autogenerated if it isn't provided. |
| trainable | Whether the layer weights will be updated during training. |
| weights | Initial weights for the layer. |
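
To make the effect of use_scale and causal concrete, here is a minimal base R sketch of the arithmetic behind dot-product attention for a single example, without batching. It is not the layer's actual implementation. The helper name dot_product_attention and its scale argument are hypothetical; the sketch assumes the value tensor also serves as the key, and it uses a fixed number for the scale where the layer would learn a scalar variable.

```r
# Hypothetical helper, not part of the keras package: dot-product
# attention for one example, written with base R matrices only.
# query: matrix [query_positions, dim]; value: matrix [value_positions, dim].
dot_product_attention <- function(query, value, key = value,
                                  scale = 1, causal = FALSE) {
  # scores[i, j] = scale * (dot product of query row i and key row j).
  # `scale` is a fixed stand-in for the scalar created by use_scale.
  scores <- scale * (query %*% t(key))

  if (causal) {
    # Mask entries with j > i so position i cannot attend to later positions.
    scores[upper.tri(scores)] <- -Inf
  }

  # Row-wise softmax, shifted by each row's maximum for numerical stability.
  shifted <- scores - apply(scores, 1, max)
  weights <- exp(shifted)
  weights <- weights / rowSums(weights)

  # Each output row is a weighted average of the rows of `value`.
  list(output = weights %*% value, weights = weights)
}

set.seed(1)
query <- matrix(rnorm(3 * 4), nrow = 3)  # 3 query positions, dimension 4
value <- matrix(rnorm(3 * 4), nrow = 3)  # 3 value positions, dimension 4
res <- dot_product_attention(query, value, causal = TRUE)
```

With causal = TRUE, the weights above the diagonal are zero, so the first output row is exactly the first row of value: position 1 can only attend to itself.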

