mwptoolkit.utils.preprocess_tool.number_transfer

mwptoolkit.utils.preprocess_tool.number_transfer.get_num_pos(input_seq, mask_type, pattern)
mwptoolkit.utils.preprocess_tool.number_transfer.num_transfer_alg514(data, mask_type, equ_split_symbol=';', vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.num_transfer_draw(data, mask_type, equ_split_symbol=';', vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.num_transfer_hmwp(data, mask_type, equ_split_symbol=';', vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.num_transfer_multi(data, mask_type, equ_split_symbol=';', vocab_level='word', word_lower=False)
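The multi-equation transfer functions above take equ_split_symbol, which the number_transfer docstring describes as the symbol that separates equations in a multiple-equation dataset and is replaced with SpecialTokens.BRG. The following is a minimal sketch of that substitution, not the library's implementation. It assumes whitespace-tokenized equations and uses a placeholder string "<BRG>" in place of SpecialTokens.BRG; join_equations is a hypothetical helper.

```python
from typing import List

# Hypothetical placeholder for SpecialTokens.BRG; the library's actual
# token string may differ.
BRG_TOKEN = "<BRG>"


def join_equations(equation: str, equ_split_symbol: str = ";") -> List[str]:
    """Split a multi-equation string and join the parts with a bridge token."""
    # Drop empty pieces, e.g. from a trailing split symbol.
    parts = [p.strip() for p in equation.split(equ_split_symbol) if p.strip()]
    tokens: List[str] = []
    for i, part in enumerate(parts):
        if i > 0:
            # The split symbol between two equations becomes the bridge token.
            tokens.append(BRG_TOKEN)
        # Naive whitespace tokenization of each single equation.
        tokens.extend(part.split())
    return tokens


print(join_equations("x + y = 10 ; x - y = 2"))
# ['x', '+', 'y', '=', '10', '<BRG>', 'x', '-', 'y', '=', '2']
```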
mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer(datas, dataset_name, task_type, mask_type, min_generate_keep, linear_dataset, equ_split_symbol=';', vocab_level='word', word_lower=False) → Tuple[list, list, int, list]

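Number transfer: replace the numbers in each problem of a dataset with mask symbols, and collect the generate numbers and the copy number over the whole dataset.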

Parameters
  • datas (list) – dataset.

  • dataset_name (str) – dataset name.

  • task_type (str) – [single_equation | multi_equation], task type.

  • mask_type – type of mask symbol used to replace the numbers in the problem text.

  • min_generate_keep (int) – generate numbers whose count is greater than this value are kept in the output symbols.

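  • linear_dataset (bool) – whether the dataset is linear; passed to the single-equation transfer functions as their linear argument.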

  • equ_split_symbol (str) – equation split symbol. In a multiple-equation dataset, this symbol separates the equations and is replaced with the special token SpecialTokens.BRG.

  • vocab_level (str) – vocabulary level; defaults to 'word'.

  • word_lower (bool) – whether to convert words to lower case.

Returns

A tuple (processed datas, generate number list, copy number, unk symbol list) of types (list, list, int, list).

mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer_ape200k(data, mask_type, linear, vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer_asdiv_a(data, mask_type, linear, vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer_math23k(data, mask_type, linear, vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer_mawps(data, mask_type, linear, vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer_mawps_single(data, mask_type, linear, vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer_single(data, mask_type, linear, vocab_level='word', word_lower=False)
mwptoolkit.utils.preprocess_tool.number_transfer.number_transfer_svamp(data, mask_type, linear, vocab_level='word', word_lower=False)
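The per-dataset number_transfer_* functions above each process a single problem. The core idea of number transfer for one problem can be sketched as follows; this is a simplified illustration, not how these functions are implemented. It assumes whitespace-tokenized text, a fixed NUM_i mask scheme (the real functions choose the mask symbol through mask_type), and a regular expression covering integers, decimals, and percentages. mask_numbers and NUM_PATTERN are hypothetical names.

```python
import re
from typing import Dict, List

# Hypothetical pattern: decimals (tried first), then integers, each with an
# optional percent sign, e.g. 3.5, 12, 40%.
NUM_PATTERN = re.compile(r"\d+\.\d+%?|\d+%?")


def mask_numbers(question: str, equation: str) -> Dict[str, List[str]]:
    """Replace numbers in a question with NUM_i tokens and rewrite the equation."""
    num_list: List[str] = []
    masked_question: List[str] = []
    for token in question.split():
        if NUM_PATTERN.fullmatch(token):
            # The i-th number in the question becomes NUM_i.
            masked_question.append(f"NUM_{len(num_list)}")
            num_list.append(token)
        else:
            masked_question.append(token)
    masked_equation: List[str] = []
    for token in equation.split():
        # Only numbers that occur exactly once in the question map to a
        # unique mask token; anything else stays a literal.
        if num_list.count(token) == 1:
            masked_equation.append(f"NUM_{num_list.index(token)}")
        else:
            masked_equation.append(token)
    return {"question": masked_question, "equation": masked_equation, "number list": num_list}


print(mask_numbers("Tom has 3 apples and buys 5 more apples", "x = 3 + 5")["equation"])
# ['x', '=', 'NUM_0', '+', 'NUM_1']
```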
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_ape200k(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_asdiv_a(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_hmwp(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_math23k(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_mawps(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_mawps_single(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_multi(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_single(st, nums_fraction, nums)
mwptoolkit.utils.preprocess_tool.number_transfer.seg_and_tag_svamp(st, nums_fraction, nums)
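The seg_and_tag_* functions share the signature (st, nums_fraction, nums). The sketch below follows the common recursive segment-and-tag pattern used in math word problem preprocessing, which that signature suggests; the library's dataset-specific versions may differ. It assumes st is an equation string, nums_fraction lists fraction strings such as "(1/3)" that should be matched before other numbers, and nums lists the numbers found in the question. It also assumes a NUM_i tag scheme and drops whitespace; the helper _tag and NUM_PATTERN are hypothetical.

```python
import re
from typing import List

# Hypothetical pattern: decimals first, then integers, optional percent sign.
NUM_PATTERN = re.compile(r"\d+\.\d+%?|\d+%?")


def _tag(num: str, nums: List[str]) -> str:
    """Tag a number that occurs exactly once in the question; keep others literal."""
    return f"NUM_{nums.index(num)}" if nums.count(num) == 1 else num


def seg_and_tag(st: str, nums_fraction: List[str], nums: List[str]) -> List[str]:
    """Split an equation string into tokens, tagging known numbers as NUM_i."""
    res: List[str] = []
    # Fractions are matched first so that "(1/3)" stays a single token.
    for frac in nums_fraction:
        if frac in st:
            start = st.find(frac)
            end = start + len(frac)
            if start > 0:
                res += seg_and_tag(st[:start], nums_fraction, nums)
            res.append(_tag(frac, nums))
            if end < len(st):
                res += seg_and_tag(st[end:], nums_fraction, nums)
            return res
    match = NUM_PATTERN.search(st)
    if match:
        start, end = match.start(), match.end()
        # Recurse on the text before and after the first number found.
        if start > 0:
            res += seg_and_tag(st[:start], nums_fraction, nums)
        res.append(_tag(st[start:end], nums))
        if end < len(st):
            res += seg_and_tag(st[end:], nums_fraction, nums)
        return res
    # No number left: each remaining non-space character is its own token.
    return [ch for ch in st if not ch.isspace()]


print(seg_and_tag("x=(1/3)*12+4", ["(1/3)"], ["(1/3)", "12", "4"]))
# ['x', '=', 'NUM_0', '*', 'NUM_1', '+', 'NUM_2']
```

Matching fractions before the generic number pattern matters: otherwise "(1/3)" would be split into "(", "1", "/", "3", ")".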