mwptoolkit.module.Decoder.ept_decoder

class mwptoolkit.module.Decoder.ept_decoder.AveragePooling(dim: int = -1, keepdim: bool = False)[source]

Bases: Module

Layer class for computing the mean of a sequence

Parameters
  • dim (int) – Dimension to be averaged. -1 by default.

  • keepdim (bool) – True if you want to keep averaged dimensions. False by default.

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(tensor: Tensor)[source]

Do average pooling over a sequence

Parameters

tensor (torch.Tensor) – FloatTensor to be averaged.

Returns

Averaged result.

Return type

torch.FloatTensor

training: bool
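
A minimal usage sketch of AveragePooling (shapes are illustrative; per the description above, the layer is assumed to average the input over the given dimension):

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import AveragePooling
>>> pool = AveragePooling(dim=1, keepdim=False)
>>> hidden = torch.randn(2, 5, 8)   # e.g. [batch_size, sequence_length, hidden_size]
>>> pool(hidden).shape
torch.Size([2, 8])
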
class mwptoolkit.module.Decoder.ept_decoder.DecoderModel(config)[source]

Bases: Module

Base model for equation generation/classification (Abstract class)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_target_dict(**kwargs) Dict[str, Tensor][source]

Build dictionary of target matrices.

Return type

Dict[str, torch.Tensor]

Returns

Dictionary of target values

_forward_single(**kwargs) Dict[str, Tensor][source]

Forward computation of a single beam

Return type

Dict[str, torch.Tensor]

Returns

Dictionary of computed values

_init_weights(module: Module)[source]

Initialize weights

Parameters

module (nn.Module) – Module to be initialized.

forward(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None, beam: int = 1, max_len: int = 128, function_arities: Optional[Dict[int, int]] = None)[source]

Forward computation of decoder model

Returns

Dictionary of tensors.

If the model is in the training phase, the values are accuracy or loss tensors; otherwise, they are tensors representing the predicted output distribution.

Return type

Dict[str, torch.Tensor]

init_factor()[source]
Returns

Standard deviation of normal distribution that will be used for initializing weights.

Return type

float

property is_expression_type: bool

Returns

True if this model requires an Expression-type sequence.

Return type

bool

property required_field: str

Returns

Name of the required field type to process.

Return type

str

training: bool
class mwptoolkit.module.Decoder.ept_decoder.ExpressionDecoderModel(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)[source]

Bases: DecoderModel

Decoding model that generates expression sequences (Abstract class)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_decoder_context(embedding: Tensor, embedding_pad: Optional[Tensor] = None, text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None)[source]

Compute decoder’s hidden state vectors

Parameters
  • embedding (torch.Tensor) – FloatTensor containing input vectors. Shape [batch_size, equation_length, hidden_size].

  • embedding_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the decoding sequence. Shape [batch_size, equation_length]

  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

Returns

A FloatTensor of shape [batch_size, equation_length, hidden_size], which contains decoder’s hidden states.

Return type

torch.Tensor

_build_decoder_input(ids: Tensor, nums: Tensor)[source]

Compute input of the decoder

Parameters
  • ids (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size]

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing input vector. Shape [batch_size, equation_length, hidden_size].

Return type

torch.Tensor

_build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor) Tensor[source]

Build operand embedding a_ij in the paper.

Parameters
  • ids (torch.Tensor) – LongTensor containing index-type information of operands. (This corresponds to a_ij in the paper)

  • mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. (i.e. PE(.) in the paper)

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. (i.e. e_{a_ij} in the paper)

Return type

torch.Tensor

Returns

A FloatTensor representing operand embedding vector a_ij in Equation 3, 4, 5

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘operator’: Log probability of the next operators. FloatTensor with shape [batch_size, equation_length, operator_size].

  • ‘_out’: Decoder’s hidden states. FloatTensor with shape [batch_size, equation_length, hidden_size].

  • ‘_not_usable’: Indicates positions whose output values cannot be used as operands. BoolTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

embed_to_hidden

Linear transformation from the embedding space to the hidden space

function_arities

Dictionary mapping operator indices to their arities

operand_norm

Layer normalization applied to operand embeddings

operand_source_embedding

Embedding layer for operand sources

operand_source_factor

Scalar factor applied to operand source embeddings

shared_decoder_layer

Shared Transformer decoder layer

training: bool
class mwptoolkit.module.Decoder.ept_decoder.ExpressionPointerTransformer(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)[source]

Bases: ExpressionDecoderModel

The EPT model

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_attention_keys(num: Tensor, mem: Tensor, num_pad: Optional[Tensor] = None, mem_pad: Optional[Tensor] = None)[source]

Generate Attention Keys by concatenating all items.

Parameters
  • num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

  • mem (torch.Tensor) – FloatTensor containing decoder’s hidden states corresponding to prior expression outputs. Shape [batch_size, equation_length, hidden_size].

  • num_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]

  • mem_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the target expression sequence. Shape [batch_size, equation_length]

Returns

Triple of Tensors
  • [0] Keys (A_ij in the paper). Shape [batch_size, C+num_size+equation_length, hidden_size], where C is the size of the constant vocabulary.

  • [1] Mask for positions that should be ignored in keys. Shape [batch_size, C+num_size+equation_length]

  • [2] Forward Attention Mask to ignore future tokens in the expression sequence. Shape [equation_length, C+num_size+equation_length]

Return type

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

_build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor)[source]

Build operand embedding.

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length, 1+2*arity_size].

  • mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. Shape [batch_size, equation_length, hidden_size].

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing operand embedding vector. Shape [batch_size, equation_length, arity_size, hidden_size]

Return type

torch.Tensor

_build_target_dict(equation, num_pad=None)[source]

Build dictionary of target matrices.

Returns

Dictionary of target values

  • ‘operator’: Index of the next operators. LongTensor with shape [batch_size, equation_length].

  • ‘operand_J’: Index of the next J-th operands. LongTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

  • text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘operator’: Log probability of the next operators. FloatTensor with shape [batch_size, equation_length, operator_size].

  • ‘operand_J’: Log probability of the next J-th operands. FloatTensor with shape [batch_size, equation_length, operand_size].

Return type

Dict[str, torch.Tensor]

constant_word_embedding

Embedding layer for constant symbols

operand_out

Output layers for operand prediction

property required_field: str

Returns

Name of the required field type to process.

Return type

str

training: bool
class mwptoolkit.module.Decoder.ept_decoder.ExpressionTransformer(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)[source]

Bases: ExpressionDecoderModel

Vanilla Transformer + Expression (The second ablated model)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor)[source]

Build operand embedding.

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length, 1+2*arity_size].

  • mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. Shape [batch_size, equation_length, hidden_size], where hidden_size = dimension of hidden state

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing operand embedding vector. Shape [batch_size, equation_length, arity_size, hidden_size]

Return type

torch.Tensor

_build_target_dict(equation, num_pad=None)[source]

Build dictionary of target matrices.

Returns

Dictionary of target values

  • ‘operator’: Index of the next operators. LongTensor with shape [batch_size, equation_length].

  • ‘operand_J’: Index of the next J-th operands. LongTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

  • text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘operator’: Log probability of the next operators. FloatTensor with shape [batch_size, equation_length, operator_size], where operator_size is the size of the operator vocabulary.

  • ‘operand_J’: Log probability of the next J-th operands. FloatTensor with shape [batch_size, equation_length, operand_size].

Return type

Dict[str, torch.Tensor]

operand_out

Output layers for operand prediction

operand_word_embedding

Embedding layer for operand tokens

property required_field: str

Returns

Name of the required field type to process.

Return type

str

training: bool
class mwptoolkit.module.Decoder.ept_decoder.LogSoftmax(dim: Optional[int] = None)[source]

Bases: LogSoftmax

LogSoftmax layer that can handle infinity values.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

dim: Optional[int]
forward(tensor: Tensor)[source]

Compute log(softmax(tensor))

Parameters

tensor (torch.Tensor) – FloatTensor whose log-softmax value will be computed

Returns

LogSoftmax result.

Return type

torch.FloatTensor
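
A small usage sketch. The class is documented to tolerate infinity values (e.g. masked-out scores set to -inf); the exact fallback behaviour for fully masked rows is not asserted here, only that ordinary masked inputs produce no NaN:

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import LogSoftmax
>>> scores = torch.tensor([[1.0, float('-inf'), 2.0]])   # -inf marks a masked-out position
>>> log_probs = LogSoftmax(dim=-1)(scores)
>>> log_probs.shape
torch.Size([1, 3])
>>> bool(torch.isnan(log_probs).any())
False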

class mwptoolkit.module.Decoder.ept_decoder.OpDecoderModel(config)[source]

Bases: DecoderModel

Decoding model that generates Op(Operator/Operand) sequences (Abstract class)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_decoder_context(embedding: Tensor, embedding_pad: Optional[Tensor] = None, text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None)[source]

Compute decoder’s hidden state vectors.

Parameters
  • embedding (torch.Tensor) – FloatTensor containing input vectors. Shape [batch_size, decoding_sequence, input_embedding_size].

  • embedding_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the decoding sequence. Shape [batch_size, decoding_sequence]

  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, input_embedding_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

Returns

A FloatTensor of shape [batch_size, decoding_sequence, hidden_size], which contains decoder’s hidden states.

Return type

torch.Tensor

_build_decoder_input(ids: Tensor, nums: Tensor)[source]

Compute input of the decoder.

Parameters
  • ids (torch.Tensor) – LongTensor containing op tokens. Shape: [batch_size, equation_length]

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing input vector. Shape [batch_size, equation_length, hidden_size].

Return type

torch.Tensor

_build_word_embed(ids: Tensor, nums: Tensor)[source]

Build Op embedding

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length].

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing op embedding vector. Shape [batch_size, equation_length, hidden_size]

Return type

torch.Tensor

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states e_i. Shape [batch_size, input_sequence_length, input_embedding_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, input_embedding_size].

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘_out’: Decoder’s hidden states. FloatTensor with shape [batch_size, equation_length, hidden_size].

Return type

Dict[str, torch.Tensor]

pos_factor

Scalar factor applied to the positional encoding

training: bool
class mwptoolkit.module.Decoder.ept_decoder.Squeeze(dim: int = -1)[source]

Bases: Module

Layer class for squeezing a dimension

Parameters

dim (int) – Dimension to be squeezed. -1 by default.

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(tensor: Tensor)[source]

Do squeezing

Parameters

tensor (torch.Tensor) – FloatTensor to be squeezed.

Returns

Squeezed result.

Return type

torch.FloatTensor

training: bool
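
A minimal usage sketch of Squeeze (shapes are illustrative), e.g. removing the trailing singleton dimension produced by a width-1 linear head:

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import Squeeze
>>> scores = torch.randn(2, 7, 1)   # e.g. output of a nn.Linear(hidden_size, 1) head
>>> Squeeze(dim=-1)(scores).shape
torch.Size([2, 7])
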
class mwptoolkit.module.Decoder.ept_decoder.VanillaOpTransformer(config)[source]

Bases: OpDecoderModel

The vanilla Transformer model

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_target_dict(equation, num_pad=None)[source]

Build dictionary of target matrices.

Returns

Dictionary of target values

’op’: Index of next op tokens. LongTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

_build_word_embed(ids: Tensor, nums: Tensor)[source]

Build Op embedding

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length].

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing op embedding vector. Shape [batch_size, equation_length, hidden_size].

Return type

torch.Tensor

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, input_embedding_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, input_embedding_size].

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length].

Returns

Dictionary of the following entries:

  • ‘op’: Log probability of the next op tokens. FloatTensor with shape [batch_size, equation_length, operator_size].

Return type

Dict[str, torch.Tensor]

property required_field: str

Returns

Name of the required field type to process.

Return type

str

softmax

Softmax layer applied to the output scores

training: bool
mwptoolkit.module.Decoder.ept_decoder.apply_across_dim(function, dim=1, shared_keys=None, **tensors)[source]

Apply a function repeatedly to each tensor slice along the given dimension. For example, given a tensor of shape [batch_size, X, input_sequence_length] and dim=1, the results of function([:, 0, :]), function([:, 1, :]), …, function([:, X-1, :]) are concatenated along dim=1.

Parameters
  • function (function) – Function to apply.

  • dim (int) – Dimension through which we’ll apply function. (1 by default)

  • shared_keys (set) – Set of keys representing tensors to be shared. (None by default)

  • tensors (torch.Tensor) – Keyword arguments of tensors to compute over. Each tensor must have at least dim + 1 dimensions.

Returns

Dictionary of tensors, whose keys are corresponding to the output of the function.

Return type

Dict[str, torch.Tensor]
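
A hedged sketch of how this helper might be used, assuming (per the description above) that the wrapped function receives keyword slices of each tensor and returns a Dict[str, Tensor]; the function name per_beam and the tensor shapes are illustrative only:

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import apply_across_dim
>>> def per_beam(x=None):
...     # hypothetical per-slice function: receives one slice of each keyword tensor
...     return {'sum': x.sum(dim=-1, keepdim=True)}
>>> x = torch.randn(2, 3, 5)   # e.g. [batch_size, beam, input_sequence_length]
>>> out = apply_across_dim(per_beam, dim=1, x=x)
>>> sorted(out.keys())
['sum']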

mwptoolkit.module.Decoder.ept_decoder.apply_module_dict(modules: ModuleDict, encoded: Tensor, **kwargs)[source]

Predict next entry using given module and equation.

Parameters
  • modules (nn.ModuleDict) – Dictionary of modules to be applied. Modules are applied in ascending order of their keys. Three types of modules are expected: nn.Linear, nn.LayerNorm, and MultiheadAttention.

  • encoded (torch.Tensor) – Float Tensor that represents encoded vectors. Shape [batch_size, equation_length, hidden_size].

  • key_value (torch.Tensor) – Float Tensor that represents key and value vectors when computing attention. Shape [batch_size, key_size, hidden_size].

  • key_ignorance_mask (torch.Tensor) – Bool Tensor whose True values at (b, k) make attention layer ignore k-th key on b-th item in the batch. Shape [batch_size, key_size].

  • attention_mask (torch.BoolTensor) – Bool Tensor whose True values at (t, k) make attention layer ignore k-th key when computing t-th query. Shape [equation_length, key_size].

Returns

FloatTensor of scores computed from the given information. Shape will be [batch_size, equation_length, ?]

Return type

torch.Tensor
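
A hedged sketch using only nn.Linear and nn.LayerNorm entries; the key names ‘0_norm’ and ‘1_out’ are hypothetical and chosen only so that ascending key order gives the intended pipeline. Attention-related keyword arguments are omitted because no MultiheadAttention module is present:

>>> import torch
>>> from torch import nn
>>> from mwptoolkit.module.Decoder.ept_decoder import apply_module_dict
>>> head = nn.ModuleDict({'0_norm': nn.LayerNorm(16), '1_out': nn.Linear(16, 4)})
>>> encoded = torch.randn(2, 6, 16)   # [batch_size, equation_length, hidden_size]
>>> apply_module_dict(head, encoded=encoded).shape
torch.Size([2, 6, 4])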

mwptoolkit.module.Decoder.ept_decoder.get_embedding_without_pad(embedding: Union[Embedding, Tensor], tokens: Tensor, ignore_index=-1)[source]

Get embedding vectors for the given token tensor, where the ignored indices are zero-filled.

Parameters
  • embedding (nn.Embedding) – An embedding instance

  • tokens (torch.Tensor) – A Long Tensor to build embedding vectors.

  • ignore_index (int) – Index to be ignored. -1 (PAD_ID) by default.

Returns

Embedding vector of given token tensor.

Return type

torch.Tensor
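
A minimal usage sketch, assuming the default ignore_index of -1 marks padding positions:

>>> import torch
>>> from torch import nn
>>> from mwptoolkit.module.Decoder.ept_decoder import get_embedding_without_pad
>>> embedding = nn.Embedding(10, 4)
>>> tokens = torch.tensor([[1, 3, -1]])   # -1 marks ignored (PAD) positions
>>> vectors = get_embedding_without_pad(embedding, tokens)
>>> vectors.shape
torch.Size([1, 3, 4])
>>> bool((vectors[0, 2] == 0).all())   # the ignored position is zero-filled
True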

mwptoolkit.module.Decoder.ept_decoder.mask_forward(sz: int, diagonal: int = 1)[source]

Generate a mask that ignores future words. Each (i, j)-entry will be True if j >= i + diagonal

Parameters
  • sz (int) – Length of the sequence.

  • diagonal (int) – Amount of shift for diagonal entries.

Returns

Mask tensor with shape [sz, sz].

Return type

torch.Tensor
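
A small sketch of the documented behaviour with the default diagonal=1: position i may attend to itself and to earlier positions, while entries with j >= i + 1 are masked (True):

>>> from mwptoolkit.module.Decoder.ept_decoder import mask_forward
>>> m = mask_forward(4)
>>> m.shape
torch.Size([4, 4])
>>> bool(m[2, 3]), bool(m[2, 2]), bool(m[2, 1])   # only the future position is masked
(True, False, False)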