Dataset download links

Note: the perturbed data is created from these datasets using the provided code.

To download the datasets, follow the links below:

1) PAWS-Wiki: https://github.com/google-research-datasets/paws
   Open the link, scroll to the PAWS_Wiki section, and download "PAWS-Wiki Labeled (Final)".

2) QQP: https://huggingface.co/datasets/glue/viewer/qqp/train
   Visit the Hugging Face link and download the QQP paraphrasing train dataset.

3) MRPC: https://huggingface.co/datasets/glue/viewer/mrpc/train
   Visit the link and download the train version of the Microsoft Research Paraphrase Corpus (MRPC).

Alternative: use the datasets provided in the zip file. Unzip the data file and use the data directly.

Perturbed Data Generation:

We used the above datasets to create sentence perturbations for hypothesis testing. We took the first column (i.e. sentence1 or question1) as the original sentence and produced a perturbation of each such sentence using the WordNet toolkit. The code is provided in the zip file; see scr/word_replacer.py
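
The contents of scr/word_replacer.py are not reproduced in this file, so the following is only a minimal sketch of what a WordNet-based word replacement could look like, not the actual script. It assumes NLTK is the WordNet interface (this file only says "WordNet toolkit") and that one word per sentence is swapped for a synonym. The names wordnet_synonyms and perturb_sentence are hypothetical.

```python
import random
from typing import Callable, List, Optional


def wordnet_synonyms(word: str) -> List[str]:
    """Return single-word WordNet synonyms of `word`.

    Assumption: NLTK is installed and the 'wordnet' corpus has been
    fetched beforehand with nltk.download('wordnet').
    """
    from nltk.corpus import wordnet  # lazy import: only needed for real lookups

    found: List[str] = []
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name()
            # Skip multi-word lemmas (joined with '_') and the word itself.
            if "_" in name or name.lower() == word.lower():
                continue
            if name not in found:
                found.append(name)
    return found


def perturb_sentence(
    sentence: str,
    synonyms: Callable[[str], List[str]] = wordnet_synonyms,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace one randomly chosen word that has a synonym.

    Tokens are split on whitespace only, so a word with attached
    punctuation (e.g. "fox.") is looked up as-is. The sentence is
    returned unchanged when no token has a synonym.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    tokens = sentence.split()

    # Map token position -> list of candidate replacements.
    options = {}
    for i, tok in enumerate(tokens):
        candidates = synonyms(tok)
        if candidates:
            options[i] = candidates

    if not options:
        return sentence

    i = rng.choice(sorted(options))
    tokens[i] = rng.choice(options[i])
    return " ".join(tokens)
```

Under these assumptions, the function would be applied to each value in the first column (sentence1 or question1) of the downloaded files. Passing a custom `synonyms` callable makes it possible to try the logic without the WordNet corpus installed.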