mwptoolkit.module.Layer.transformer_layer

class mwptoolkit.module.Layer.transformer_layer.EPTTransformerLayer(hidden_dim=None, num_decoder_heads=None, layernorm_eps=None, intermediate_dim=None)[source]

Bases: Module

Class for a Transformer encoder/decoder layer (follows the paper 'Attention Is All You Need').

Initialize the EPTTransformerLayer.

Parameters
  • hidden_dim (int) – dimension of the hidden states.

  • num_decoder_heads (int) – number of attention heads.

  • layernorm_eps (float) – epsilon used by the layer-normalization sublayers.

  • intermediate_dim (int) – dimension of the intermediate (position-wise feed-forward) layer.

forward(target, target_ignorance_mask=None, target_attention_mask=None, memory=None, memory_ignorance_mask=None)[source]

Forward computation of the Transformer encoder/decoder layer.

Parameters
  • target (torch.Tensor) – FloatTensor containing the sequence of target vectors. Shape [batch_size, target_length, hidden_size].

  • target_ignorance_mask (torch.Tensor) – BoolTensor marking target tokens that should be ignored. Shape [batch_size, target_length].

  • target_attention_mask (torch.Tensor) – BoolTensor giving the target-to-target attention mask. Shape [target_length, target_length].

  • memory (torch.Tensor) – FloatTensor containing the sequence of source vectors. Shape [batch_size, sequence_length, hidden_size]. This can be None when the layer is used as an encoder layer.

  • memory_ignorance_mask (torch.Tensor) – BoolTensor marking source tokens that should be ignored. Shape [batch_size, sequence_length].

Returns

Decoder hidden states for each target token, shape [batch_size, target_length, hidden_size].

Return type

torch.FloatTensor

training: bool
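
Example (a minimal, hypothetical usage sketch based only on the signatures documented above; the tensor sizes are made up, and the convention that True marks a masked/ignored position is an assumption following common PyTorch practice, not a guarantee of this implementation):

    import torch
    from mwptoolkit.module.Layer.transformer_layer import EPTTransformerLayer

    # Hypothetical sizes; real values come from the model configuration.
    batch_size, target_length, sequence_length, hidden_dim = 4, 16, 32, 128

    layer = EPTTransformerLayer(
        hidden_dim=hidden_dim,
        num_decoder_heads=8,
        layernorm_eps=1e-12,
        intermediate_dim=4 * hidden_dim,
    )

    target = torch.randn(batch_size, target_length, hidden_dim)
    memory = torch.randn(batch_size, sequence_length, hidden_dim)

    # Causal mask: True above the diagonal blocks attention to future positions
    # (assumed convention: True = masked).
    target_attention_mask = torch.triu(
        torch.ones(target_length, target_length), diagonal=1
    ).bool()
    # No padding in this toy batch, so nothing is ignored.
    target_ignorance_mask = torch.zeros(batch_size, target_length, dtype=torch.bool)
    memory_ignorance_mask = torch.zeros(batch_size, sequence_length, dtype=torch.bool)

    output = layer(
        target,
        target_ignorance_mask=target_ignorance_mask,
        target_attention_mask=target_attention_mask,
        memory=memory,
        memory_ignorance_mask=memory_ignorance_mask,
    )
    # output: [batch_size, target_length, hidden_size]
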
class mwptoolkit.module.Layer.transformer_layer.GAEncoderLayer(size, self_attn, feed_forward, dropout)[source]

Bases: Module

Group attentional encoder layer: the encoder is made up of a self-attention sublayer and a feed-forward sublayer.

Parameters
  • size (int) – hidden size of the layer.

  • self_attn (torch.nn.Module) – self-attention module (e.g., group attention).

  • feed_forward (torch.nn.Module) – position-wise feed-forward module.

  • dropout (float) – dropout ratio of the sublayer connections.

forward(x, mask)[source]

Follows Figure 1 (left) of 'Attention Is All You Need' for the connections: self-attention followed by the feed-forward network, each wrapped in a residual sublayer connection.

training: bool
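
Example (a hypothetical sketch of how the layer is wired, assuming the standard annotated-Transformer encoder-layer structure; DummySelfAttention is a made-up stand-in that only mimics a (query, key, value, mask) call signature):

    import torch
    import torch.nn as nn
    from mwptoolkit.module.Layer.transformer_layer import (
        GAEncoderLayer,
        PositionwiseFeedForward,
    )

    class DummySelfAttention(nn.Module):
        """Illustrative stand-in for the group/multi-head attention module."""
        def forward(self, query, key, value, mask=None):
            return value  # identity "attention", for shape checking only

    d_model = 128
    layer = GAEncoderLayer(
        size=d_model,
        self_attn=DummySelfAttention(),
        feed_forward=PositionwiseFeedForward(d_model, d_ff=512, dropout=0.1),
        dropout=0.1,
    )

    x = torch.randn(4, 32, d_model)                # [batch, seq_len, d_model]
    mask = torch.ones(4, 1, 32, dtype=torch.bool)  # hypothetical mask, unused by the dummy attention
    out = layer(x, mask)                           # same shape as x
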
class mwptoolkit.module.Layer.transformer_layer.LayerNorm(features, eps=1e-06)[source]

Bases: Module

Construct a layer normalization module (see Ba et al., 2016, 'Layer Normalization').

Parameters
  • features (int) – size of the last dimension to be normalized.

  • eps (float) – small constant added to the denominator for numerical stability.

forward(x)[source]

Normalize x over its last dimension to zero mean and unit standard deviation, then apply the learned element-wise scale and shift. The output has the same shape as the input.

training: bool
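
Example (a minimal sketch; the shapes are arbitrary, and the manual computation is shown only as the standard layer-norm recipe this module is expected to follow, before its learned scale and shift):

    import torch
    from mwptoolkit.module.Layer.transformer_layer import LayerNorm

    norm = LayerNorm(features=128, eps=1e-6)
    x = torch.randn(4, 32, 128)
    y = norm(x)  # same shape as x, normalized over the last dimension

    # Standard layer-norm recipe (before the learned scale/shift):
    mean = x.mean(-1, keepdim=True)
    std = x.std(-1, keepdim=True)
    normalized = (x - mean) / (std + 1e-6)
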
class mwptoolkit.module.Layer.transformer_layer.PositionwiseFeedForward(d_model, d_ff, dropout=0.1)[source]

Bases: Module

Implements the position-wise feed-forward network (FFN) equation from 'Attention Is All You Need': FFN(x) = max(0, xW1 + b1)W2 + b2.

Parameters
  • d_model (int) – input and output dimension.

  • d_ff (int) – dimension of the inner (hidden) layer.

  • dropout (float) – dropout ratio.

forward(x)[source]

Apply the position-wise feed-forward network to x: two linear transformations with a ReLU activation and dropout in between. The output has the same shape as the input.

training: bool
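
Example (a minimal usage sketch with made-up sizes):

    import torch
    from mwptoolkit.module.Layer.transformer_layer import PositionwiseFeedForward

    ffn = PositionwiseFeedForward(d_model=128, d_ff=512, dropout=0.1)
    x = torch.randn(4, 32, 128)   # [batch, seq_len, d_model]
    y = ffn(x)                    # [4, 32, 128]; applied independently at each position
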
class mwptoolkit.module.Layer.transformer_layer.SublayerConnection(size, dropout)[source]

Bases: Module

A residual connection followed by a layer norm. Note that, for code simplicity, the norm is applied first rather than last (i.e., a pre-norm residual block).

Parameters
  • size (int) – hidden size (used by the layer norm).

  • dropout (float) – dropout ratio applied to the sublayer output.

forward(x, sublayer)[source]

Apply a residual connection around any sublayer whose output has the same size as its input.

training: bool
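
Example (a sketch of the pre-norm residual wiring, assuming the x + dropout(sublayer(norm(x))) pattern described above; sizes are made up):

    import torch
    from mwptoolkit.module.Layer.transformer_layer import (
        PositionwiseFeedForward,
        SublayerConnection,
    )

    d_model = 128
    residual = SublayerConnection(size=d_model, dropout=0.1)
    ffn = PositionwiseFeedForward(d_model, d_ff=512, dropout=0.1)

    x = torch.randn(4, 32, d_model)
    y = residual(x, ffn)   # x + dropout(ffn(norm(x))), same shape as x
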
class mwptoolkit.module.Layer.transformer_layer.TransformerLayer(embedding_size, ffn_size, num_heads, attn_dropout_ratio=0.0, attn_weight_dropout_ratio=0.0, ffn_dropout_ratio=0.0, with_external=False)[source]

Bases: Module

Transformer layer, including a multi-head self-attention sublayer, an external multi-head attention sublayer (only for the conditional decoder), and a point-wise feed-forward sublayer.

Parameters
  • self_padding_mask (torch.BoolTensor) – padding mask for the multi-head self-attention sublayer.

  • self_attn_mask (torch.BoolTensor) – attention mask for the multi-head self-attention sublayer.

  • external_states (torch.Tensor) – external context for the decoder, e.g., hidden states from the encoder.

  • external_padding_mask (torch.BoolTensor) – padding mask for the external states.

Returns

the output of the point-wise feed-forward sublayer, which is the output of the transformer layer

Return type

feedforward_output (torch.Tensor)

Initialize the TransformerLayer with the given embedding size, feed-forward size, number of attention heads, dropout ratios, and the with_external flag that enables the external (encoder-decoder) attention sublayer.

forward(x, kv=None, self_padding_mask=None, self_attn_mask=None, external_states=None, external_padding_mask=None)[source]

Apply the multi-head self-attention sublayer to x (with kv, if given, providing the key/value states), then the external attention sublayer when external_states is given, and finally the point-wise feed-forward sublayer. See the parameters and return value documented above.

gelu(x)[source]

Gaussian Error Linear Unit (GELU) activation.

reset_parameters()[source]

Re-initialize the learnable parameters of the layer.
training: bool
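
Example (a hypothetical sketch of using the layer as a conditional decoder layer; the sizes are made up, and batch-first tensors with a True = padding mask convention are assumptions rather than guarantees of this implementation):

    import torch
    from mwptoolkit.module.Layer.transformer_layer import TransformerLayer

    layer = TransformerLayer(
        embedding_size=128,
        ffn_size=512,
        num_heads=8,
        attn_dropout_ratio=0.1,
        attn_weight_dropout_ratio=0.1,
        ffn_dropout_ratio=0.1,
        with_external=True,   # enable the external (encoder-decoder) attention sublayer
    )

    x = torch.randn(4, 16, 128)                 # decoder states: [batch, tgt_len, embedding_size]
    external_states = torch.randn(4, 32, 128)   # e.g., encoder hidden states
    self_padding_mask = torch.zeros(4, 16, dtype=torch.bool)      # no padding in this toy batch
    external_padding_mask = torch.zeros(4, 32, dtype=torch.bool)

    feedforward_output = layer(
        x,
        self_padding_mask=self_padding_mask,
        external_states=external_states,
        external_padding_mask=external_padding_mask,
    )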