R/layer-attention.R
layer_additive_attention.Rd
Additive attention layer, a.k.a. Bahdanau-style attention
layer_additive_attention(
  object,
  use_scale = TRUE,
  ...,
  causal = FALSE,
  dropout = 0
)
object  What to compose the new Layer instance with. Typically a
Sequential model or a Tensor (e.g., as returned by layer_input()). The
return value depends on object. If object is: missing or NULL, the Layer
instance is returned; a Sequential model, then the model with an
additional layer is returned; a Tensor, the output tensor from calling
the layer on object is returned.

use_scale  If TRUE, will create a variable to scale the attention scores.

...  standard layer arguments.

causal  Boolean. Set to TRUE for decoder self-attention. Adds a mask such
that position i cannot attend to positions j > i. This prevents the flow
of information from the future towards the past.

dropout  Float between 0 and 1. Fraction of the units to drop for the
attention scores.
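
A minimal usage sketch (the input shapes and dropout value here are
illustrative, not taken from the documentation above; assumes the keras
R package with a TensorFlow backend). The query and value tensors are
passed to the layer as a list; when no separate key is given, value is
also used as the key:

library(keras)

# query: [batch_size, Tq = 8, dim = 16]; value: [batch_size, Tv = 10, dim = 16]
query_input <- layer_input(shape = c(8, 16))
value_input <- layer_input(shape = c(10, 16))

# Compose the attention layer with a list of (query, value) tensors.
attended <- layer_additive_attention(
  list(query_input, value_input),
  use_scale = TRUE,
  dropout = 0.1
)

model <- keras_model(
  inputs = list(query_input, value_input),
  outputs = attended
)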
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor
of shape [batch_size, Tv, dim], and a key tensor of shape
[batch_size, Tv, dim]. The calculation follows the steps:

1. Reshape query and key into shapes [batch_size, Tq, 1, dim] and
   [batch_size, 1, Tv, dim] respectively.

2. Calculate scores with shape [batch_size, Tq, Tv] as a non-linear sum
   over the feature axis:
   scores = tf$reduce_sum(tf$tanh(query + key), axis = -1L).

3. Use scores to calculate a distribution with shape [batch_size, Tq, Tv]:
   distribution = tf$nn$softmax(scores).

4. Use distribution to create a linear combination of value with shape
   [batch_size, Tq, dim]: return tf$matmul(distribution, value).
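
These steps can be reproduced directly with TensorFlow ops from R. The
sketch below is illustrative rather than the layer's actual
implementation: it omits the learned scale variable that
use_scale = TRUE applies to the tanh term, as well as masking and
dropout:

library(tensorflow)

batch_size <- 2L; Tq <- 3L; Tv <- 4L; dim <- 5L

query <- tf$random$normal(shape(batch_size, Tq, dim))
value <- tf$random$normal(shape(batch_size, Tv, dim))
key   <- value  # the value tensor doubles as the key here

# Step 1: reshape query and key so they broadcast against each other.
q <- tf$expand_dims(query, axis = 2L)  # [batch_size, Tq, 1, dim]
k <- tf$expand_dims(key, axis = 1L)    # [batch_size, 1, Tv, dim]

# Step 2: non-linear sum over the feature axis -> [batch_size, Tq, Tv].
scores <- tf$reduce_sum(tf$tanh(q + k), axis = -1L)

# Step 3: softmax over the last (Tv) axis gives the attention weights;
# each row of the distribution sums to 1.
distribution <- tf$nn$softmax(scores)

# Step 4: linear combination of value -> [batch_size, Tq, dim].
output <- tf$matmul(distribution, value)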