mwptoolkit.module.Decoder.ept_decoder
- class mwptoolkit.module.Decoder.ept_decoder.AveragePooling(dim: int = -1, keepdim: bool = False)
Bases: Module
Layer class for computing mean of a sequence
- Parameters
dim (int) – Dimension to be averaged. -1 by default.
keepdim (bool) – Whether to keep the averaged dimension in the output. False by default.
- extra_repr()
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(tensor: Tensor)
Apply average pooling over a sequence.
- Parameters
tensor (torch.Tensor) – FloatTensor to be averaged.
- Returns
Averaged result.
- Return type
torch.FloatTensor
- training: bool
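The layer's behavior can be illustrated with a minimal sketch, assuming it reduces with `torch.Tensor.mean` over `dim`. `AveragePoolingSketch` is a hypothetical stand-in, not this module's source. The `extra_repr` override shows how the `dim`/`keepdim` settings appear when the module is printed.

```python
import torch
import torch.nn as nn


class AveragePoolingSketch(nn.Module):
    """Hypothetical re-creation of AveragePooling: mean over one dimension."""

    def __init__(self, dim: int = -1, keepdim: bool = False):
        super().__init__()
        self.dim = dim
        self.keepdim = keepdim

    def forward(self, tensor: torch.Tensor) -> torch.Tensor:
        # Average over `dim`; keepdim=True leaves a size-1 dimension in its place.
        return tensor.mean(dim=self.dim, keepdim=self.keepdim)

    def extra_repr(self) -> str:
        # Printed inside the module repr, e.g. "AveragePoolingSketch(dim=1, keepdim=False)".
        return "dim={}, keepdim={}".format(self.dim, self.keepdim)


pool = AveragePoolingSketch(dim=1)
x = torch.arange(6.0).view(2, 3)  # rows [0, 1, 2] and [3, 4, 5]
row_means = pool(x)               # one mean per row
```
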
- class mwptoolkit.module.Decoder.ept_decoder.DecoderModel(config)
Bases: Module
Base model for equation generation/classification (abstract class)
Initialize an Equation Builder instance.
- Parameters
config (ModelConfig) – Configuration of this model
- _build_target_dict(**kwargs) → Dict[str, Tensor]
Build dictionary of target matrices.
- Return type
Dict[str, torch.Tensor]
- Returns
Dictionary of target values
- _forward_single(**kwargs) → Dict[str, Tensor]
Forward computation of a single beam
- Return type
Dict[str, torch.Tensor]
- Returns
Dictionary of computed values
- _init_weights(module: Module)
Initialize weights
- Parameters
module (nn.Module) – Module to be initialized.
- forward(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None, beam: int = 1, max_len: int = 128, function_arities: Optional[Dict[int, int]] = None)
Forward computation of the decoder model
- Returns
- Dictionary of tensors.
In the training phase, the values are accuracy or loss tensors; otherwise, they are tensors representing the predicted output distribution.
- Return type
Dict[str, torch.Tensor]
- init_factor()
- Returns
Standard deviation of normal distribution that will be used for initializing weights.
- Return type
float
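One plausible way a standard deviation like the one `init_factor()` returns can drive `_init_weights` is sketched below. It assumes the common scheme of normal initialization for `nn.Linear`/`nn.Embedding` weights, zero biases, and identity initialization for `nn.LayerNorm`; `init_weights_sketch` and the factor 0.02 are hypothetical, not taken from this module.

```python
import torch
import torch.nn as nn


def init_weights_sketch(module: nn.Module, factor: float) -> None:
    """Hypothetical initializer driven by a std such as init_factor() returns."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        # Weights drawn from N(0, factor^2).
        nn.init.normal_(module.weight, mean=0.0, std=factor)
        if isinstance(module, nn.Linear) and module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.LayerNorm):
        # Start LayerNorm as an identity affine transform.
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(256, 256), nn.LayerNorm(256))
# nn.Module.apply visits every submodule recursively; 0.02 is a hypothetical factor.
model.apply(lambda m: init_weights_sketch(m, factor=0.02))
```
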
- property is_expression_type: bool
True if this model requires an Expression-type sequence.
- Type
bool
- property required_field: str
Name of the required field type to process.
- Type
str
- training: bool
- class mwptoolkit.module.Decoder.ept_decoder.ExpressionDecoderModel(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)
Bases: DecoderModel
Decoding model that generates expression sequences (abstract class)
Initialize an Equation Builder instance.
- Parameters
config (ModelConfig) – Configuration of this model
- _build_decoder_context(embedding: Tensor, embedding_pad: Optional[Tensor] = None, text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None)
Compute the decoder’s hidden state vectors
- Parameters
embedding (torch.Tensor) – FloatTensor containing input vectors. Shape [batch_size, equation_length, hidden_size].
embedding_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the decoding sequence. Shape [batch_size, equation_length]
text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].
text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]
- Returns
A FloatTensor of shape [batch_size, equation_length, hidden_size], which contains decoder’s hidden states.
- Return type
torch.Tensor
- _build_decoder_input(ids: Tensor, nums: Tensor)
Compute input of the decoder
- Parameters
ids (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size]
nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].
- Returns
A FloatTensor representing input vector. Shape [batch_size, equation_length, hidden_size].
- Return type
torch.Tensor
- _build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor) → Tensor
Build the operand embedding a_ij described in the paper.
- Parameters
ids (torch.Tensor) – LongTensor containing index-type information of operands. (This corresponds to a_ij in the paper)
mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. (i.e. PE(.) in the paper)
nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. (i.e. e_{a_ij} in the paper)
- Return type
torch.Tensor
- Returns
A FloatTensor representing the operand embedding vector a_ij in Equations 3, 4, and 5 of the paper
- _forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)
Forward computation of a single beam
- Parameters
text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].
text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]
text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].
text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]
equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].
- Returns
- Dictionary with the following entries:
‘operator’: Log probability of next operators. FloatTensor with shape [batch_size, equation_length, operator_size].
‘_out’: Decoder’s hidden states. FloatTensor with shape [batch_size, equation_length, hidden_size].
‘_not_usable’: Positions whose output values cannot be used as operands. BoolTensor with shape [batch_size, equation_length].
- Return type
Dict[str, torch.Tensor]
Transformer layer
- function_arities
Embedding layers
- operand_norm
Linear Transformation
- operand_source_embedding
Scalar parameters
- operand_source_factor
Layer Normalizations
Output layer
- training: bool
- class mwptoolkit.module.Decoder.ept_decoder.ExpressionPointerTransformer(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)
Bases: ExpressionDecoderModel
The EPT (Expression-Pointer Transformer) model
Initialize an Equation Builder instance.
- Parameters
config (ModelConfig) – Configuration of this model
- _build_attention_keys(num: Tensor, mem: Tensor, num_pad: Optional[Tensor] = None, mem_pad: Optional[Tensor] = None)
Generate attention keys by concatenating the constant, number, and prior-output vectors.
- Parameters
num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].
mem (torch.Tensor) – FloatTensor containing decoder’s hidden states corresponding to prior expression outputs. Shape [batch_size, equation_length, hidden_size].
num_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]
mem_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the target expression sequence. Shape [batch_size, equation_length]
- Returns
- Triple of tensors
[0] Keys (A_ij in the paper). Shape [batch_size, C+num_size+equation_length, hidden_size], where C = size of constant vocabulary.
[1] Mask for positions that should be ignored in keys. Shape [batch_size, C+num_size+equation_length]
[2] Forward attention mask to ignore future tokens in the expression sequence. Shape [equation_length, C+num_size+equation_length]
- Return type
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
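The key concatenation can be sketched as follows. Assumptions: the constant embeddings arrive as a `[C, hidden_size]` matrix shared across the batch, constants are never masked, and the forward mask hides only prior-output keys using the same upper-triangular rule as `mask_forward` (the exact diagonal offset the model uses is not stated here, so it is a parameter). `build_attention_keys_sketch` is hypothetical.

```python
from typing import Optional, Tuple

import torch


def build_attention_keys_sketch(const: torch.Tensor, num: torch.Tensor, mem: torch.Tensor,
                                num_pad: Optional[torch.Tensor] = None,
                                mem_pad: Optional[torch.Tensor] = None,
                                diagonal: int = 1) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Hypothetical sketch; `const` is an assumed [C, hidden_size] constant-embedding matrix."""
    batch_size, num_size, _ = num.shape
    equation_length = mem.shape[1]
    const_size = const.shape[0]
    device = num.device

    # [0] Keys: constants (broadcast over the batch), then numbers, then prior outputs.
    keys = torch.cat([const.unsqueeze(0).expand(batch_size, -1, -1), num, mem], dim=1)

    # [1] Ignorance mask: constants always visible; numbers/outputs follow their PAD masks.
    if num_pad is None:
        num_pad = torch.zeros(batch_size, num_size, dtype=torch.bool, device=device)
    if mem_pad is None:
        mem_pad = torch.zeros(batch_size, equation_length, dtype=torch.bool, device=device)
    const_pad = torch.zeros(batch_size, const_size, dtype=torch.bool, device=device)
    key_ignorance_mask = torch.cat([const_pad, num_pad, mem_pad], dim=1)

    # [2] Forward mask: only prior-output keys can be "future"; (t, j) is True if j >= t + diagonal.
    future = torch.ones(equation_length, equation_length, device=device).triu(diagonal).bool()
    visible = torch.zeros(equation_length, const_size + num_size, dtype=torch.bool, device=device)
    attention_mask = torch.cat([visible, future], dim=1)
    return keys, key_ignorance_mask, attention_mask


torch.manual_seed(0)
const, num, mem = torch.randn(4, 8), torch.randn(2, 3, 8), torch.randn(2, 5, 8)
keys, ignore, attn = build_attention_keys_sketch(const, num, mem)  # C=4, num_size=3, T=5
```
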
- _build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor)
Build operand embedding.
- Parameters
ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length, 1+2*arity_size].
mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. Shape [batch_size, equation_length, hidden_size].
nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].
- Returns
A FloatTensor representing operand embedding vector. Shape [batch_size, equation_length, arity_size, hidden_size]
- Return type
torch.Tensor
- _build_target_dict(equation, num_pad=None)
Build dictionary of target matrices.
- Returns
- Dictionary of target values
‘operator’: Index of next operators. LongTensor with shape [batch_size, equation_length].
‘operand_J’: Index of next J-th operands. LongTensor with shape [batch_size, equation_length].
- Return type
Dict[str, torch.Tensor]
- _forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)
Forward computation of a single beam
- Parameters
text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].
text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]
text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].
text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]
equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].
- Returns
- Dictionary with the following entries:
‘operator’: Log probability of next operators. FloatTensor with shape [batch_size, equation_length, operator_size].
‘operand_J’: Log probability of next J-th operands. FloatTensor with shape [batch_size, equation_length, operand_size].
- Return type
Dict[str, torch.Tensor]
- constant_word_embedding
Output layer
- operand_out
Initialize weights
- property required_field: str
Name of the required field type to process.
- Type
str
- training: bool
- class mwptoolkit.module.Decoder.ept_decoder.ExpressionTransformer(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)
Bases: ExpressionDecoderModel
Vanilla Transformer + Expression (the second ablated model)
Initialize an Equation Builder instance.
- Parameters
config (ModelConfig) – Configuration of this model
- _build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor)
Build operand embedding.
- Parameters
ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length, 1+2*arity_size].
mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. Shape [batch_size, equation_length, hidden_size], where hidden_size = dimension of hidden state
nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].
- Returns
A FloatTensor representing operand embedding vector. Shape [batch_size, equation_length, arity_size, hidden_size]
- Return type
torch.Tensor
- _build_target_dict(equation, num_pad=None)
Build dictionary of target matrices.
- Returns
- Dictionary of target values
‘operator’: Index of next operators. LongTensor with shape [batch_size, equation_length].
‘operand_J’: Index of next J-th operands. LongTensor with shape [batch_size, equation_length].
- Return type
Dict[str, torch.Tensor]
- _forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)
Forward computation of a single beam
- Parameters
text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].
text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]
text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].
text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]
equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].
- Returns
- Dictionary with the following entries:
‘operator’: Log probability of next operators. FloatTensor with shape [batch_size, equation_length, operator_size], where operator_size = size of operator vocabulary.
‘operand_J’: Log probability of next J-th operands. FloatTensor with shape [batch_size, equation_length, operand_size].
- Return type
Dict[str, torch.Tensor]
- operand_out
Initialize weights
- operand_word_embedding
Output layer
- property required_field: str
Name of the required field type to process.
- Type
str
- training: bool
- class mwptoolkit.module.Decoder.ept_decoder.LogSoftmax(dim: Optional[int] = None)
Bases: LogSoftmax
LogSoftmax layer that can handle infinity values.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- dim: Optional[int]
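One way to make a log-softmax tolerate infinities is sketched below. It assumes the problem case is a row whose entries are all infinite (typically all `-inf` after masking), for which plain `torch.log_softmax` returns NaN; such rows are zero-filled first so they become a finite uniform distribution. `InfSafeLogSoftmax` is a hypothetical name, and this module's actual handling may differ.

```python
import torch
import torch.nn as nn


class InfSafeLogSoftmax(nn.LogSoftmax):
    """Hypothetical sketch: LogSoftmax that avoids NaN on all-infinite rows."""

    def forward(self, tensor: torch.Tensor) -> torch.Tensor:
        dim = -1 if self.dim is None else self.dim
        # Rows made entirely of infinities would produce NaN in a plain log-softmax.
        all_inf = torch.isinf(tensor).all(dim=dim, keepdim=True)
        # Zero-fill those rows so they yield a finite, uniform distribution.
        tensor = tensor.masked_fill(all_inf, 0.0)
        return torch.log_softmax(tensor, dim=dim)


neg_inf = float("-inf")
scores = torch.tensor([[neg_inf, neg_inf], [0.0, neg_inf]])
log_probs = InfSafeLogSoftmax(dim=-1)(scores)
```

Rows that contain at least one finite value are passed through unchanged, so masked (`-inf`) entries still receive zero probability.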
- class mwptoolkit.module.Decoder.ept_decoder.OpDecoderModel(config)
Bases: DecoderModel
Decoding model that generates Op (Operator/Operand) sequences (abstract class)
Initialize an Equation Builder instance.
- Parameters
config (ModelConfig) – Configuration of this model
- _build_decoder_context(embedding: Tensor, embedding_pad: Optional[Tensor] = None, text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None)
Compute decoder’s hidden state vectors.
- Parameters
embedding (torch.Tensor) – FloatTensor containing input vectors. Shape [batch_size, decoding_sequence, input_embedding_size].
embedding_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the decoding sequence. Shape [batch_size, decoding_sequence]
text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, input_embedding_size].
text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]
- Returns
A FloatTensor of shape [batch_size, decoding_sequence, hidden_size], which contains decoder’s hidden states.
- Return type
torch.Tensor
- _build_decoder_input(ids: Tensor, nums: Tensor)
Compute input of the decoder.
- Parameters
ids (torch.Tensor) – LongTensor containing op tokens. Shape: [batch_size, equation_length]
nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].
- Returns
A FloatTensor representing input vector. Shape [batch_size, equation_length, hidden_size].
- Return type
torch.Tensor
- _build_word_embed(ids: Tensor, nums: Tensor)
Build the Op embedding
- Parameters
ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length].
nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].
- Returns
A FloatTensor representing op embedding vector. Shape [batch_size, equation_length, hidden_size]
- Return type
torch.Tensor
- _forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)
Forward computation of a single beam
- Parameters
text (torch.Tensor) – FloatTensor containing encoder’s hidden states e_i. Shape [batch_size, input_sequence_length, input_embedding_size].
text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]
text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, input_embedding_size].
text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]
equation (torch.Tensor) – LongTensor containing op tokens. Shape: [batch_size, equation_length].
- Returns
- Dictionary with the following entries:
‘_out’: Decoder’s hidden states. FloatTensor with shape [batch_size, equation_length, hidden_size].
- Return type
Dict[str, torch.Tensor]
- pos_factor
Decoding layer
- training: bool
- class mwptoolkit.module.Decoder.ept_decoder.Squeeze(dim: int = -1)
Bases: Module
Layer class for squeezing a dimension
- Parameters
dim (int) – Dimension to be squeezed, -1 by default.
- extra_repr()
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(tensor: Tensor)
Squeeze the configured dimension of the input tensor.
- Parameters
tensor (torch.Tensor) – FloatTensor to be squeezed.
- Returns
Squeezed result.
- Return type
torch.FloatTensor
- training: bool
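A minimal sketch of the layer, assuming it wraps `torch.Tensor.squeeze(dim)`. As an illustrative assumption, a typical use is after an `nn.Linear(hidden_size, 1)` scorer, turning `[batch_size, seq_len, 1]` into `[batch_size, seq_len]`. `SqueezeSketch` is hypothetical.

```python
import torch
import torch.nn as nn


class SqueezeSketch(nn.Module):
    """Hypothetical re-creation of Squeeze: removes one size-1 dimension."""

    def __init__(self, dim: int = -1):
        super().__init__()
        self.dim = dim

    def forward(self, tensor: torch.Tensor) -> torch.Tensor:
        # torch.squeeze leaves the tensor unchanged if `dim` does not have size 1.
        return tensor.squeeze(self.dim)

    def extra_repr(self) -> str:
        return "dim={}".format(self.dim)


# Score each position with a single logit, then drop the trailing size-1 dimension.
scorer = nn.Sequential(nn.Linear(8, 1), SqueezeSketch(dim=-1))
scores = scorer(torch.randn(2, 5, 8))
```
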
- class mwptoolkit.module.Decoder.ept_decoder.VanillaOpTransformer(config)
Bases: OpDecoderModel
The vanilla Transformer model
Initialize an Equation Builder instance.
- Parameters
config (ModelConfig) – Configuration of this model
- _build_target_dict(equation, num_pad=None)
Build dictionary of target matrices.
- Returns
- Dictionary of target values
‘op’: Index of next op tokens. LongTensor with shape [batch_size, equation_length].
- Return type
Dict[str, torch.Tensor]
- _build_word_embed(ids: Tensor, nums: Tensor)
Build the Op embedding
- Parameters
ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length].
nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].
- Returns
A FloatTensor representing op embedding vector. Shape [batch_size, equation_length, hidden_size].
- Return type
torch.Tensor
- _forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)
Forward computation of a single beam
- Parameters
text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, input_embedding_size].
text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]
text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, input_embedding_size].
text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]
equation (torch.Tensor) – LongTensor containing op tokens. Shape: [batch_size, equation_length].
- Returns
- Dictionary with the following entries:
‘op’: Log probability of next op tokens. FloatTensor with shape [batch_size, equation_length, operator_size].
- Return type
Dict[str, torch.Tensor]
- property required_field: str
Name of the required field type to process.
- Type
str
- softmax
Initialize weights
- training: bool
- mwptoolkit.module.Decoder.ept_decoder.apply_across_dim(function, dim=1, shared_keys=None, **tensors)
Apply a function repeatedly to each tensor slice along the given dimension. For example, given a tensor of shape [batch_size, X, input_sequence_length] and dim = 1, the results of the following calls are concatenated along dim=1:
- function([:, 0, :])
- function([:, 1, :])
- …
- function([:, X-1, :])
- Parameters
function (function) – Function to apply.
dim (int) – Dimension through which we’ll apply function. (1 by default)
shared_keys (set) – Set of keys representing tensors to be shared. (None by default)
tensors (torch.Tensor) – Keyword arguments of tensors to compute. Each tensor must have more than dim dimensions.
- Returns
Dictionary of tensors, whose keys are corresponding to the output of the function.
- Return type
Dict[str, torch.Tensor]
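The slice-and-concatenate loop described above can be sketched as follows. It assumes `function` returns a dictionary of tensors and that tensors named in `shared_keys` are passed to every call without slicing. `apply_across_dim_sketch` is a hypothetical re-creation, not this module's source.

```python
from typing import Callable, Dict, List, Optional, Set

import torch


def apply_across_dim_sketch(function: Callable[..., Dict[str, torch.Tensor]], dim: int = 1,
                            shared_keys: Optional[Set[str]] = None,
                            **tensors: torch.Tensor) -> Dict[str, torch.Tensor]:
    shared_keys = shared_keys or set()
    shared = {k: v for k, v in tensors.items() if k in shared_keys}
    sliced = {k: v for k, v in tensors.items() if k not in shared_keys}

    # Every sliced tensor must agree on the size of `dim`.
    sizes = {v.shape[dim] for v in sliced.values()}
    assert len(sizes) == 1, "tensors disagree on the size of dim"
    size = sizes.pop()

    outputs: Dict[str, List[torch.Tensor]] = {}
    for i in range(size):
        # select() drops `dim`; shared tensors are passed through unchanged.
        kwargs = {k: v.select(dim, i) for k, v in sliced.items()}
        kwargs.update(shared)
        for key, value in function(**kwargs).items():
            # Re-insert `dim` so the per-slice results can be concatenated along it.
            outputs.setdefault(key, []).append(value.unsqueeze(dim))

    return {key: torch.cat(values, dim=dim) for key, values in outputs.items()}


x = torch.arange(24.0).view(2, 3, 4)  # [batch_size, X, input_sequence_length]
bias = torch.ones(4)                  # shared across all slices
result = apply_across_dim_sketch(lambda x, bias: {"y": x + bias},
                                 dim=1, shared_keys={"bias"}, x=x, bias=bias)
```
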
- mwptoolkit.module.Decoder.ept_decoder.apply_module_dict(modules: ModuleDict, encoded: Tensor, **kwargs)
Predict the next entry using the given modules and equation.
- Parameters
modules (nn.ModuleDict) – Dictionary of modules to be applied. Modules will be applied with ascending order of keys. We expect three types of modules: nn.Linear, nn.LayerNorm and MultiheadAttention.
encoded (torch.Tensor) – Float Tensor that represents encoded vectors. Shape [batch_size, equation_length, hidden_size].
key_value (torch.Tensor) – Float Tensor that represents key and value vectors when computing attention. Shape [batch_size, key_size, hidden_size].
key_ignorance_mask (torch.Tensor) – Bool Tensor whose True values at (b, k) make attention layer ignore k-th key on b-th item in the batch. Shape [batch_size, key_size].
attention_mask (torch.BoolTensor) – Bool Tensor whose True values at (t, k) make attention layer ignore k-th key when computing t-th query. Shape [equation_length, key_size].
- Returns
Float Tensor holding the scores computed from the given information. Shape [batch_size, equation_length, ?], where the last dimension is determined by the final module.
- Return type
torch.Tensor
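A sketch of the dispatch loop, assuming attention layers take the running output as query and `key_value` as key and value, while the other modules act on the last dimension. The module's own `MultiheadAttention` class is not documented here, so the sketch substitutes PyTorch's `torch.nn.MultiheadAttention` (with `batch_first=True`), whose `key_padding_mask` and boolean `attn_mask` use the same "True means ignore" convention. `apply_module_dict_sketch` is hypothetical.

```python
from typing import Optional

import torch
import torch.nn as nn


def apply_module_dict_sketch(modules: nn.ModuleDict, encoded: torch.Tensor,
                             key_value: Optional[torch.Tensor] = None,
                             key_ignorance_mask: Optional[torch.Tensor] = None,
                             attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Hypothetical sketch; torch.nn.MultiheadAttention stands in for the module's own class."""
    output = encoded
    for key in sorted(modules.keys()):  # ascending (string) order of keys
        layer = modules[key]
        if isinstance(layer, nn.MultiheadAttention):
            # Query: running output. Key/value: key_value. True in a mask means "ignore".
            output, _ = layer(output, key_value, key_value,
                              key_padding_mask=key_ignorance_mask,
                              attn_mask=attention_mask, need_weights=False)
        else:
            # nn.Linear / nn.LayerNorm operate on the last (hidden) dimension.
            output = layer(output)
    return output


torch.manual_seed(0)
modules = nn.ModuleDict({
    "0_attn": nn.MultiheadAttention(8, num_heads=2, batch_first=True),
    "1_norm": nn.LayerNorm(8),
    "2_out": nn.Linear(8, 3),
})
ignore = torch.zeros(2, 5, dtype=torch.bool)
ignore[:, -1] = True  # hide the last key for every batch item
scores = apply_module_dict_sketch(modules, torch.randn(2, 4, 8), key_value=torch.randn(2, 5, 8),
                                  key_ignorance_mask=ignore,
                                  attention_mask=torch.zeros(4, 5, dtype=torch.bool))
```

Because the sketch sorts keys as strings, a key such as `"10"` sorts before `"2"`; prefixed or zero-padded names keep the intended order.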
- mwptoolkit.module.Decoder.ept_decoder.get_embedding_without_pad(embedding: Union[Embedding, Tensor], tokens: Tensor, ignore_index=-1)
Get embedding vectors for the given token tensor, with ignored indices zero-filled.
- Parameters
embedding (nn.Embedding or torch.Tensor) – An embedding instance or an embedding weight matrix.
tokens (torch.Tensor) – A Long Tensor to build embedding vectors.
ignore_index (int) – Index to be ignored. -1 by default.
- Returns
Embedding vector of given token tensor.
- Return type
torch.Tensor
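A minimal sketch, assuming ignored positions are first swapped for a valid index (0) so the lookup cannot go out of range, then zero-filled afterwards. A plain tensor argument is treated as a `[vocab_size, hidden_size]` weight matrix and passed to `torch.nn.functional.embedding`. `get_embedding_without_pad_sketch` is hypothetical.

```python
from typing import Union

import torch
import torch.nn as nn
import torch.nn.functional as F


def get_embedding_without_pad_sketch(embedding: Union[nn.Embedding, torch.Tensor],
                                     tokens: torch.Tensor, ignore_index: int = -1) -> torch.Tensor:
    ignored = tokens == ignore_index
    # Replace ignored ids with a valid index so the lookup never goes out of range.
    safe_tokens = tokens.masked_fill(ignored, 0)
    if isinstance(embedding, nn.Embedding):
        vectors = embedding(safe_tokens)
    else:
        # Treat a raw tensor as a [vocab_size, hidden_size] weight matrix.
        vectors = F.embedding(safe_tokens, embedding)
    # Zero-fill the vectors at ignored positions.
    return vectors.masked_fill(ignored.unsqueeze(-1), 0.0)


torch.manual_seed(0)
emb = nn.Embedding(5, 3)
tokens = torch.tensor([[1, -1, 2]])  # -1 marks a position to ignore
vectors = get_embedding_without_pad_sketch(emb, tokens)
```
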
- mwptoolkit.module.Decoder.ept_decoder.mask_forward(sz: int, diagonal: int = 1)
Generate a mask that ignores future words. Each (i, j)-entry will be True if j >= i + diagonal.
- Parameters
sz (int) – Length of the sequence.
diagonal (int) – Offset of the diagonal from which entries are masked. 1 by default.
- Returns
Mask tensor with shape [sz, sz].
- Return type
torch.Tensor
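The rule "(i, j) is True if j >= i + diagonal" is exactly the set of entries `torch.triu` keeps, so an equivalent mask can be sketched in one line. `mask_forward_sketch` is a hypothetical name; the module's own implementation may differ.

```python
import torch


def mask_forward_sketch(sz: int, diagonal: int = 1) -> torch.Tensor:
    # triu keeps entries with j - i >= diagonal; those are the masked "future" positions.
    return torch.ones(sz, sz).triu(diagonal).bool()


mask = mask_forward_sketch(3)
mask_with_self = mask_forward_sketch(3, diagonal=0)
```

With the default `diagonal=1`, each position may attend to itself and earlier positions; `diagonal=0` additionally hides the position itself.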