mwptoolkit.module.Decoder.ept_decoder

class mwptoolkit.module.Decoder.ept_decoder.AveragePooling(dim: int = -1, keepdim: bool = False)[source]

Bases: Module

Layer class for computing the mean of a sequence

Parameters
  • dim (int) – Dimension to be averaged. -1 by default.

  • keepdim (bool) – True if you want to keep averaged dimensions. False by default.

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(tensor: Tensor)[source]

Do average pooling over a sequence

Parameters

tensor (torch.Tensor) – FloatTensor to be averaged.

Returns

Averaged result.

Return type

torch.FloatTensor

training: bool
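
A minimal usage sketch of AveragePooling (shapes are illustrative; per the description above, the layer is assumed to average the input over the given dimension):

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import AveragePooling
>>> pool = AveragePooling(dim=1, keepdim=False)
>>> hidden = torch.randn(2, 5, 8)   # e.g. [batch_size, sequence_length, hidden_size]
>>> pool(hidden).shape
torch.Size([2, 8])
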
class mwptoolkit.module.Decoder.ept_decoder.DecoderModel(config)[source]

Bases: Module

Base model for equation generation/classification (Abstract class)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_target_dict(**kwargs) Dict[str, Tensor][source]

Build dictionary of target matrices.

Return type

Dict[str, torch.Tensor]

Returns

Dictionary of target values

_forward_single(**kwargs) Dict[str, Tensor][source]

Forward computation of a single beam

Return type

Dict[str, torch.Tensor]

Returns

Dictionary of computed values

_init_weights(module: Module)[source]

Initialize weights

Parameters

module (nn.Module) – Module to be initialized.

forward(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None, beam: int = 1, max_len: int = 128, function_arities: Optional[Dict[int, int]] = None)[source]

Forward computation of decoder model

Returns

Dictionary of tensors.

If the model is in the training phase, the values are accuracy or loss tensors; otherwise, they are tensors representing the predicted output distribution.

Return type

Dict[str, torch.Tensor]

init_factor()[source]
Returns

Standard deviation of normal distribution that will be used for initializing weights.

Return type

float

property is_expression_type: bool

Returns

True if this model requires an Expression-type sequence.

Return type

bool

property required_field: str

Returns

Name of the required field type to process.

Return type

str

training: bool
class mwptoolkit.module.Decoder.ept_decoder.ExpressionDecoderModel(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)[source]

Bases: DecoderModel

Decoding model that generates expression sequences (Abstract class)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_decoder_context(embedding: Tensor, embedding_pad: Optional[Tensor] = None, text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None)[source]

Compute decoder’s hidden state vectors

Parameters
  • embedding (torch.Tensor) – FloatTensor containing input vectors. Shape [batch_size, equation_length, hidden_size].

  • embedding_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the decoding sequence. Shape [batch_size, equation_length]

  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

Returns

A FloatTensor of shape [batch_size, equation_length, hidden_size], which contains decoder’s hidden states.

Return type

torch.Tensor

_build_decoder_input(ids: Tensor, nums: Tensor)[source]

Compute input of the decoder

Parameters
  • ids (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size]

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing input vector. Shape [batch_size, equation_length, hidden_size].

Return type

torch.Tensor

_build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor) Tensor[source]

Build operand embedding a_ij in the paper.

Parameters
  • ids (torch.Tensor) – LongTensor containing index-type information of operands. (This corresponds to a_ij in the paper)

  • mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. (i.e. PE(.) in the paper)

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. (i.e. e_{a_ij} in the paper)

Return type

torch.Tensor

Returns

A FloatTensor representing operand embedding vector a_ij in Equation 3, 4, 5

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘operator’: Log probability of the next operators. FloatTensor with shape [batch_size, equation_length, operator_size].

  • ‘_out’: Decoder’s hidden states. FloatTensor with shape [batch_size, equation_length, hidden_size].

  • ‘_not_usable’: Indicates positions whose output values cannot be used as operands. BoolTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

embed_to_hidden

Linear transformation from the embedding space to the hidden space

function_arities

Dictionary mapping operator indices to their arities

operand_norm

Layer normalization applied to operand embeddings

operand_source_embedding

Embedding layer for operand sources

operand_source_factor

Scalar factor applied to operand source embeddings

shared_decoder_layer

Shared Transformer decoder layer

training: bool
class mwptoolkit.module.Decoder.ept_decoder.ExpressionPointerTransformer(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)[source]

Bases: ExpressionDecoderModel

The EPT model

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_attention_keys(num: Tensor, mem: Tensor, num_pad: Optional[Tensor] = None, mem_pad: Optional[Tensor] = None)[source]

Generate Attention Keys by concatenating all items.

Parameters
  • num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

  • mem (torch.Tensor) – FloatTensor containing decoder’s hidden states corresponding to prior expression outputs. Shape [batch_size, equation_length, hidden_size].

  • num_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]

  • mem_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the target expression sequence. Shape [batch_size, equation_length]

Returns

Triple of Tensors
  • [0] Keys (A_ij in the paper). Shape [batch_size, C+num_size+equation_length, hidden_size], where C is the size of the constant vocabulary.

  • [1] Mask for positions that should be ignored in keys. Shape [batch_size, C+num_size+equation_length]

  • [2] Forward Attention Mask to ignore future tokens in the expression sequence. Shape [equation_length, C+num_size+equation_length]

Return type

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

_build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor)[source]

Build operand embedding.

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length, 1+2*arity_size].

  • mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. Shape [batch_size, equation_length, hidden_size].

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing operand embedding vector. Shape [batch_size, equation_length, arity_size, hidden_size]

Return type

torch.Tensor

_build_target_dict(equation, num_pad=None)[source]

Build dictionary of target matrices.

Returns

Dictionary of target values

  • ‘operator’: Index of the next operators. LongTensor with shape [batch_size, equation_length].

  • ‘operand_J’: Index of the next J-th operands. LongTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

  • text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘operator’: Log probability of the next operators. FloatTensor with shape [batch_size, equation_length, operator_size].

  • ‘operand_J’: Log probability of the next J-th operands. FloatTensor with shape [batch_size, equation_length, operand_size].

Return type

Dict[str, torch.Tensor]

constant_word_embedding

Embedding layer for constant symbols

operand_out

Output layers for operand prediction

property required_field: str

Returns

Name of the required field type to process.

Return type

str

training: bool
class mwptoolkit.module.Decoder.ept_decoder.ExpressionTransformer(config, out_opsym2idx, out_idx2opsym, out_consym2idx, out_idx2consym)[source]

Bases: ExpressionDecoderModel

Vanilla Transformer + Expression (The second ablated model)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_operand_embed(ids: Tensor, mem_pos: Tensor, nums: Tensor)[source]

Build operand embedding.

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length, 1+2*arity_size].

  • mem_pos (torch.Tensor) – FloatTensor containing positional encoding used so far. Shape [batch_size, equation_length, hidden_size], where hidden_size = dimension of hidden state

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing operand embedding vector. Shape [batch_size, equation_length, arity_size, hidden_size]

Return type

torch.Tensor

_build_target_dict(equation, num_pad=None)[source]

Build dictionary of target matrices.

Returns

Dictionary of target values

  • ‘operator’: Index of the next operators. LongTensor with shape [batch_size, equation_length].

  • ‘operand_J’: Index of the next J-th operands. LongTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, hidden_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

  • text_numpad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the number sequence. Shape [batch_size, num_size]

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘operator’: Log probability of the next operators. FloatTensor with shape [batch_size, equation_length, operator_size], where operator_size is the size of the operator vocabulary.

  • ‘operand_J’: Log probability of the next J-th operands. FloatTensor with shape [batch_size, equation_length, operand_size].

Return type

Dict[str, torch.Tensor]

operand_out

Output layers for operand prediction

operand_word_embedding

Embedding layer for operand tokens

property required_field: str

Returns

Name of the required field type to process.

Return type

str

training: bool
class mwptoolkit.module.Decoder.ept_decoder.LogSoftmax(dim: Optional[int] = None)[source]

Bases: LogSoftmax

LogSoftmax layer that can handle infinity values.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

dim: Optional[int]
forward(tensor: Tensor)[source]

Compute log(softmax(tensor))

Parameters

tensor (torch.Tensor) – FloatTensor whose log-softmax value will be computed

Returns

LogSoftmax result.

Return type

torch.FloatTensor
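
A small usage sketch. The class is documented to tolerate infinity values (e.g. masked-out scores set to -inf); the exact fallback behaviour for fully masked rows is not asserted here, only that ordinary masked inputs produce no NaN:

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import LogSoftmax
>>> scores = torch.tensor([[1.0, float('-inf'), 2.0]])   # -inf marks a masked-out position
>>> log_probs = LogSoftmax(dim=-1)(scores)
>>> log_probs.shape
torch.Size([1, 3])
>>> bool(torch.isnan(log_probs).any())
False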

class mwptoolkit.module.Decoder.ept_decoder.OpDecoderModel(config)[source]

Bases: DecoderModel

Decoding model that generates Op(Operator/Operand) sequences (Abstract class)

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_decoder_context(embedding: Tensor, embedding_pad: Optional[Tensor] = None, text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None)[source]

Compute decoder’s hidden state vectors.

Parameters
  • embedding (torch.Tensor) – FloatTensor containing input vectors. Shape [batch_size, decoding_sequence, input_embedding_size].

  • embedding_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the decoding sequence. Shape [batch_size, decoding_sequence]

  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, input_embedding_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

Returns

A FloatTensor of shape [batch_size, decoding_sequence, hidden_size], which contains decoder’s hidden states.

Return type

torch.Tensor

_build_decoder_input(ids: Tensor, nums: Tensor)[source]

Compute input of the decoder.

Parameters
  • ids (torch.Tensor) – LongTensor containing op tokens. Shape: [batch_size, equation_length]

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing input vector. Shape [batch_size, equation_length, hidden_size].

Return type

torch.Tensor

_build_word_embed(ids: Tensor, nums: Tensor)[source]

Build Op embedding

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length].

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing op embedding vector. Shape [batch_size, equation_length, hidden_size]

Return type

torch.Tensor

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states e_i. Shape [batch_size, input_sequence_length, input_embedding_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, input_embedding_size].

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length, 1+2*arity_size].

Returns

Dictionary of the following entries:

  • ‘_out’: Decoder’s hidden states. FloatTensor with shape [batch_size, equation_length, hidden_size].

Return type

Dict[str, torch.Tensor]

pos_factor

Scalar factor applied to the positional encoding

training: bool
class mwptoolkit.module.Decoder.ept_decoder.Squeeze(dim: int = -1)[source]

Bases: Module

Layer class for squeezing a dimension

Parameters

dim (int) – Dimension to be squeezed. -1 by default.

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(tensor: Tensor)[source]

Do squeezing

Parameters

tensor (torch.Tensor) – FloatTensor to be squeezed.

Returns

Squeezed result.

Return type

torch.FloatTensor

training: bool
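
A minimal usage sketch of Squeeze (shapes are illustrative), e.g. removing the trailing singleton dimension produced by a width-1 linear head:

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import Squeeze
>>> scores = torch.randn(2, 7, 1)   # e.g. output of a nn.Linear(hidden_size, 1) head
>>> Squeeze(dim=-1)(scores).shape
torch.Size([2, 7])
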
class mwptoolkit.module.Decoder.ept_decoder.VanillaOpTransformer(config)[source]

Bases: OpDecoderModel

The vanilla Transformer model

Initialize an Equation Builder instance

Parameters

config (ModelConfig) – Configuration of this model

_build_target_dict(equation, num_pad=None)[source]

Build dictionary of target matrices.

Returns

Dictionary of target values

’op’: Index of next op tokens. LongTensor with shape [batch_size, equation_length].

Return type

Dict[str, torch.Tensor]

_build_word_embed(ids: Tensor, nums: Tensor)[source]

Build Op embedding

Parameters
  • ids (torch.Tensor) – LongTensor containing source-content information of operands. Shape [batch_size, equation_length].

  • nums (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape [batch_size, num_size, hidden_size].

Returns

A FloatTensor representing op embedding vector. Shape [batch_size, equation_length, hidden_size].

Return type

torch.Tensor

_forward_single(text: Optional[Tensor] = None, text_pad: Optional[Tensor] = None, text_num: Optional[Tensor] = None, text_numpad: Optional[Tensor] = None, equation: Optional[Tensor] = None)[source]

Forward computation of a single beam

Parameters
  • text (torch.Tensor) – FloatTensor containing encoder’s hidden states. Shape [batch_size, input_sequence_length, input_embedding_size].

  • text_pad (torch.Tensor) – BoolTensor, whose values are True if corresponding position is PAD in the input sequence. Shape [batch_size, input_sequence_length]

  • text_num (torch.Tensor) – FloatTensor containing encoder’s hidden states corresponding to numbers in the text. Shape: [batch_size, num_size, input_embedding_size].

  • equation (torch.Tensor) – LongTensor containing index-type information of an operator and its operands. Shape: [batch_size, equation_length].

Returns

Dictionary of the following entries:

  • ‘op’: Log probability of the next op tokens. FloatTensor with shape [batch_size, equation_length, operator_size].

Return type

Dict[str, torch.Tensor]

property required_field: str

Returns

Name of the required field type to process.

Return type

str

softmax

Softmax layer applied to the output scores

training: bool
mwptoolkit.module.Decoder.ept_decoder.apply_across_dim(function, dim=1, shared_keys=None, **tensors)[source]

Apply a function repeatedly to each tensor slice along the given dimension. For example, given a tensor of shape [batch_size, X, input_sequence_length] and dim=1, the results of function([:, 0, :]), function([:, 1, :]), …, function([:, X-1, :]) are concatenated along dim=1.

Parameters
  • function (function) – Function to apply.

  • dim (int) – Dimension through which we’ll apply function. (1 by default)

  • shared_keys (set) – Set of keys representing tensors to be shared. (None by default)

  • tensors (torch.Tensor) – Keyword arguments of tensors to compute over. Each tensor must have at least dim + 1 dimensions.

Returns

Dictionary of tensors, whose keys are corresponding to the output of the function.

Return type

Dict[str, torch.Tensor]
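
A hedged sketch of how this helper might be used, assuming (per the description above) that the wrapped function receives keyword slices of each tensor and returns a Dict[str, Tensor]; the function name per_beam and the tensor shapes are illustrative only:

>>> import torch
>>> from mwptoolkit.module.Decoder.ept_decoder import apply_across_dim
>>> def per_beam(x=None):
...     # hypothetical per-slice function: receives one slice of each keyword tensor
...     return {'sum': x.sum(dim=-1, keepdim=True)}
>>> x = torch.randn(2, 3, 5)   # e.g. [batch_size, beam, input_sequence_length]
>>> out = apply_across_dim(per_beam, dim=1, x=x)
>>> sorted(out.keys())
['sum']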

mwptoolkit.module.Decoder.ept_decoder.apply_module_dict(modules: ModuleDict, encoded: Tensor, **kwargs)[source]

Predict next entry using given module and equation.

Parameters
  • modules (nn.ModuleDict) – Dictionary of modules to be applied. Modules are applied in ascending order of their keys. Three types of modules are expected: nn.Linear, nn.LayerNorm, and MultiheadAttention.

  • encoded (torch.Tensor) – Float Tensor that represents encoded vectors. Shape [batch_size, equation_length, hidden_size].

  • key_value (torch.Tensor) – Float Tensor that represents key and value vectors when computing attention. Shape [batch_size, key_size, hidden_size].

  • key_ignorance_mask (torch.Tensor) – Bool Tensor whose True values at (b, k) make attention layer ignore k-th key on b-th item in the batch. Shape [batch_size, key_size].

  • attention_mask (torch.BoolTensor) – Bool Tensor whose True values at (t, k) make attention layer ignore k-th key when computing t-th query. Shape [equation_length, key_size].

Returns

FloatTensor of scores computed from the given information. Shape will be [batch_size, equation_length, ?]

Return type

torch.Tensor
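
A hedged sketch using only nn.Linear and nn.LayerNorm entries; the key names ‘0_norm’ and ‘1_out’ are hypothetical and chosen only so that ascending key order gives the intended pipeline. Attention-related keyword arguments are omitted because no MultiheadAttention module is present:

>>> import torch
>>> from torch import nn
>>> from mwptoolkit.module.Decoder.ept_decoder import apply_module_dict
>>> head = nn.ModuleDict({'0_norm': nn.LayerNorm(16), '1_out': nn.Linear(16, 4)})
>>> encoded = torch.randn(2, 6, 16)   # [batch_size, equation_length, hidden_size]
>>> apply_module_dict(head, encoded=encoded).shape
torch.Size([2, 6, 4])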

mwptoolkit.module.Decoder.ept_decoder.get_embedding_without_pad(embedding: Union[Embedding, Tensor], tokens: Tensor, ignore_index=-1)[source]

Get embedding vectors for the given token tensor, where the ignored indices are zero-filled.

Parameters
  • embedding (nn.Embedding) – An embedding instance

  • tokens (torch.Tensor) – A Long Tensor to build embedding vectors.

  • ignore_index (int) – Index to be ignored. -1 (PAD_ID) by default.

Returns

Embedding vector of given token tensor.

Return type

torch.Tensor
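
A minimal usage sketch, assuming the default ignore_index of -1 marks padding positions:

>>> import torch
>>> from torch import nn
>>> from mwptoolkit.module.Decoder.ept_decoder import get_embedding_without_pad
>>> embedding = nn.Embedding(10, 4)
>>> tokens = torch.tensor([[1, 3, -1]])   # -1 marks ignored (PAD) positions
>>> vectors = get_embedding_without_pad(embedding, tokens)
>>> vectors.shape
torch.Size([1, 3, 4])
>>> bool((vectors[0, 2] == 0).all())   # the ignored position is zero-filled
True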

mwptoolkit.module.Decoder.ept_decoder.mask_forward(sz: int, diagonal: int = 1)[source]

Generate a mask that ignores future words. Each (i, j)-entry will be True if j >= i + diagonal

Parameters
  • sz (int) – Length of the sequence.

  • diagonal (int) – Amount of shift for diagonal entries.

Returns

Mask tensor with shape [sz, sz].

Return type

torch.Tensor
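
A small sketch of the documented behaviour with the default diagonal=1: position i may attend to itself and to earlier positions, while entries with j >= i + 1 are masked (True):

>>> from mwptoolkit.module.Decoder.ept_decoder import mask_forward
>>> m = mask_forward(4)
>>> m.shape
torch.Size([4, 4])
>>> bool(m[2, 3]), bool(m[2, 2]), bool(m[2, 1])   # only the future position is masked
(True, False, False)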