Dot-product attention layer, a.k.a. Luong-style attention.

layer_attention(
  inputs,
  use_scale = FALSE,
  causal = FALSE,
  batch_size = NULL,
  dtype = NULL,
  name = NULL,
  trainable = NULL,
  weights = NULL
)

Arguments

inputs

A list of inputs: the first should be the query tensor, the second the value tensor.

use_scale

Logical. If TRUE, creates a scalar variable to scale the attention scores.

causal

Logical. Set to TRUE for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents information from flowing from the future towards the past.
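The scaling and causal-mask behaviour described for use_scale and causal can be illustrated with a minimal base-R sketch for a single, unbatched example. This is not the layer's implementation: the function name dot_product_attention and its scale argument are hypothetical, and scale is a fixed number standing in for the scalar variable that the layer would learn when use_scale is TRUE. The value matrix is also used as the key.

```r
# Plain-R sketch of dot-product attention for one unbatched example.
# query: [Tq, dim] matrix; value: [Tv, dim] matrix (also used as the key).
# `scale` is a hypothetical stand-in for the learned scalar of use_scale.
dot_product_attention <- function(query, value, scale = 1, causal = FALSE) {
  # Scores: one row per query position, one column per value position.
  scores <- scale * (query %*% t(value))            # [Tq, Tv]

  if (causal) {
    # Entry (i, j) with j > i is a "future" position; setting it to -Inf
    # gives it zero weight after the softmax.
    future <- col(scores) > row(scores)
    scores[future] <- -Inf
  }

  # Row-wise softmax, subtracting each row's maximum for numerical stability.
  scores <- scores - apply(scores, 1, max)
  weights <- exp(scores)
  weights <- weights / rowSums(weights)

  list(output = weights %*% value, weights = weights)
}

set.seed(1)
x <- matrix(rnorm(4 * 3), nrow = 4)                  # 4 positions, dim 3
res <- dot_product_attention(x, x, causal = TRUE)    # self-attention
```

With causal = TRUE the weight matrix is lower triangular, so the first position attends only to itself and its output row equals the first row of x.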

batch_size

Fixed batch size for the layer.

dtype

The data type expected by the input, as a string (for example float32, float64 or int32).

name

An optional name string for the layer. Should be unique in a model (do not reuse the same name twice). It will be autogenerated if it isn't provided.

trainable

Whether the layer weights will be updated during training.

weights

Initial weights for the layer.

See also