mwptoolkit.module.Layer.transformer_layer¶
- class mwptoolkit.module.Layer.transformer_layer.EPTTransformerLayer(hidden_dim=None, num_decoder_heads=None, layernorm_eps=None, intermediate_dim=None)[source]¶
Bases: Module
Class for a Transformer encoder/decoder layer (follows the paper ‘Attention Is All You Need’).
Initialize the Transformer encoder/decoder layer.
- Parameters
hidden_dim (int) – Dimension of the hidden states.
num_decoder_heads (int) – Number of attention heads in this layer.
layernorm_eps (float) – Epsilon value used by layer normalization.
intermediate_dim (int) – Dimension of the intermediate (position-wise feed-forward) layer.
- forward(target, target_ignorance_mask=None, target_attention_mask=None, memory=None, memory_ignorance_mask=None)[source]¶
Forward-computation of Transformer Encoder/Decoder layers
- Parameters
target (torch.Tensor) – FloatTensor containing the sequence of target vectors. Shape [batch_size, target_length, hidden_size].
target_ignorance_mask (torch.Tensor) – BoolTensor marking target tokens that should be ignored. Shape [batch_size, target_length].
target_attention_mask (torch.Tensor) – BoolTensor containing the target-to-target attention mask. Shape [target_length, target_length].
memory (torch.Tensor) – FloatTensor containing the sequence of source vectors. Shape [batch_size, sequence_length, hidden_size]. This can be None when the layer is used as an encoder layer.
memory_ignorance_mask (torch.Tensor) – BoolTensor marking source tokens that should be ignored. Shape [batch_size, sequence_length].
- Returns
Decoder hidden states for each target token, shape [batch_size, target_length, hidden_size].
- Return type
torch.FloatTensor
- training: bool¶
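A minimal usage sketch (the sizes below are illustrative assumptions, not library defaults; the call follows the forward signature documented above):

    import torch
    from mwptoolkit.module.Layer.transformer_layer import EPTTransformerLayer

    # Illustrative sizes, not library defaults.
    layer = EPTTransformerLayer(hidden_dim=128, num_decoder_heads=4,
                                layernorm_eps=1e-12, intermediate_dim=512)
    target = torch.randn(2, 7, 128)    # [batch_size, target_length, hidden_size]
    memory = torch.randn(2, 11, 128)   # [batch_size, sequence_length, hidden_size]
    decoded = layer(target, memory=memory)   # decoder-style use
    encoded = layer(target)                  # encoder-style use (memory=None)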
- class mwptoolkit.module.Layer.transformer_layer.GAEncoderLayer(size, self_attn, feed_forward, dropout)[source]¶
Bases: Module
Group-attentional encoder layer; the encoder is made up of a self-attention sublayer and a feed-forward sublayer.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
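A rough sketch of how a layer with this signature is typically composed (the pattern from ‘The Annotated Transformer’); the internal wiring shown here is an assumption for illustration, not a copy of the library code:

    import torch.nn as nn
    from mwptoolkit.module.Layer.transformer_layer import SublayerConnection

    class EncoderLayerSketch(nn.Module):
        """Self-attention followed by feed-forward, each wrapped in a residual sublayer."""
        def __init__(self, size, self_attn, feed_forward, dropout):
            super().__init__()
            self.self_attn = self_attn
            self.feed_forward = feed_forward
            self.sublayer = nn.ModuleList([SublayerConnection(size, dropout) for _ in range(2)])
            self.size = size

        def forward(self, x, mask=None):
            # Sublayer 1: (group) self-attention with a residual connection.
            x = self.sublayer[0](x, lambda t: self.self_attn(t, t, t, mask))
            # Sublayer 2: position-wise feed-forward with a residual connection.
            return self.sublayer[1](x, self.feed_forward)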
- class mwptoolkit.module.Layer.transformer_layer.LayerNorm(features, eps=1e-06)[source]¶
Bases: Module
Construct a layernorm module (See citation for details).
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
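The computation described here is standard last-dimension layer normalization with a learned gain and bias; a minimal sketch (parameter names are assumptions):

    import torch
    import torch.nn as nn

    class LayerNormSketch(nn.Module):
        """Normalize the last dimension, then rescale with a learned gain and bias."""
        def __init__(self, features, eps=1e-6):
            super().__init__()
            self.gain = nn.Parameter(torch.ones(features))
            self.bias = nn.Parameter(torch.zeros(features))
            self.eps = eps

        def forward(self, x):
            mean = x.mean(-1, keepdim=True)
            std = x.std(-1, keepdim=True)
            return self.gain * (x - mean) / (std + self.eps) + self.bias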
- class mwptoolkit.module.Layer.transformer_layer.PositionwiseFeedForward(d_model, d_ff, dropout=0.1)[source]¶
Bases: Module
Implements the position-wise feed-forward network (FFN) equation.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
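The FFN equation referred to above is FFN(x) = max(0, x·W1 + b1)·W2 + b2, applied to every position independently; a minimal sketch (the dropout placement is an assumption):

    import torch.nn as nn

    class PositionwiseFeedForwardSketch(nn.Module):
        """FFN(x) = max(0, x W1 + b1) W2 + b2."""
        def __init__(self, d_model, d_ff, dropout=0.1):
            super().__init__()
            self.w_1 = nn.Linear(d_model, d_ff)   # expand to the inner dimension
            self.w_2 = nn.Linear(d_ff, d_model)   # project back to the model dimension
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):
            return self.w_2(self.dropout(self.w_1(x).relu()))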
- class mwptoolkit.module.Layer.transformer_layer.SublayerConnection(size, dropout)[source]¶
Bases: Module
A residual connection followed by layer normalization. Note that, for code simplicity, the norm is applied first rather than last.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool¶
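Because the norm is applied first, the connection computes x + dropout(sublayer(norm(x))); a minimal sketch:

    import torch.nn as nn
    from mwptoolkit.module.Layer.transformer_layer import LayerNorm

    class SublayerConnectionSketch(nn.Module):
        """Pre-norm residual connection: x + dropout(sublayer(norm(x)))."""
        def __init__(self, size, dropout):
            super().__init__()
            self.norm = LayerNorm(size)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x, sublayer):
            return x + self.dropout(sublayer(self.norm(x)))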
- class mwptoolkit.module.Layer.transformer_layer.TransformerLayer(embedding_size, ffn_size, num_heads, attn_dropout_ratio=0.0, attn_weight_dropout_ratio=0.0, ffn_dropout_ratio=0.0, with_external=False)[source]¶
Bases: Module
Transformer layer, including a multi-head self-attention sublayer, an external multi-head attention sublayer (only for the conditional decoder), and a point-wise feed-forward sublayer.
- Parameters
self_padding_mask (torch.bool) – the padding mask for the multi-head self-attention sublayer.
self_attn_mask (torch.bool) – the attention mask for the multi-head self-attention sublayer.
external_states (torch.Tensor) – the external context for the decoder, e.g., hidden states from the encoder.
external_padding_mask (torch.bool) – the padding mask for the external states.
- Returns
the output of the point-wise feed-forward sublayer, which is also the output of the transformer layer
- Return type
feedforward_output (torch.Tensor)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x, kv=None, self_padding_mask=None, self_attn_mask=None, external_states=None, external_padding_mask=None)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool¶
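A minimal usage sketch (sizes are illustrative assumptions; per the Returns section above, forward is assumed to return the feed-forward output tensor):

    import torch
    from mwptoolkit.module.Layer.transformer_layer import TransformerLayer

    # Encoder-style layer: self-attention plus feed-forward. Sizes are illustrative.
    enc_layer = TransformerLayer(embedding_size=128, ffn_size=512, num_heads=4)
    x = torch.randn(2, 9, 128)   # [batch_size, seq_length, embedding_size]
    enc_out = enc_layer(x)

    # Decoder-style layer: with_external=True adds attention over encoder states.
    dec_layer = TransformerLayer(embedding_size=128, ffn_size=512, num_heads=4,
                                 with_external=True)
    tgt = torch.randn(2, 5, 128)
    dec_out = dec_layer(tgt, external_states=enc_out)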