Dataset download links.
Note: the perturbed data was created from these datasets
using the provided code.

To download the datasets, follow the links below:

1) PAWS-Wiki: https://github.com/google-research-datasets/paws
	Open the link and scroll to the PAWS-Wiki section. Download
	"PAWS-Wiki Labeled (Final)".

2) QQP: https://huggingface.co/datasets/glue/viewer/qqp/train
	Visit the Hugging Face link and download the QQP paraphrase
	train dataset.

3) MRPC: https://huggingface.co/datasets/glue/viewer/mrpc/train
	To download the Microsoft Research Paraphrase Corpus (MRPC)
	dataset, visit the link and download the train version.
Alternative:
You can use the datasets provided in the zip file. Unzip the data file
and use the data directly.
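
Once a file is downloaded or unzipped, the original sentences can be pulled
out of it. The following is a minimal sketch only: it assumes the file is
tab-separated with a header row, and the function name
read_original_sentences is hypothetical. The column is selected by name
(sentence1 or question1) because some files put an id column first.

```python
import csv
import io

# Column names come from the dataset descriptions; the tab-separated layout
# with a header row is an assumption about the downloaded files.
ORIGINAL_COLUMNS = ("sentence1", "question1")


def read_original_sentences(handle):
    """Return the sentence1/question1 values from an open TSV file object."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    fields = reader.fieldnames or []
    column = next((name for name in ORIGINAL_COLUMNS if name in fields), None)
    if column is None:
        raise ValueError(f"no sentence1/question1 column in {fields}")
    return [row[column] for row in reader]


# Tiny in-memory stand-in for a downloaded file (made-up rows).
sample = io.StringIO(
    "id\tsentence1\tsentence2\tlabel\n"
    "1\tThe cat sat on the mat.\tOn the mat sat the cat.\t1\n"
    "2\tHe went home early.\tShe stayed late.\t0\n"
)
sentences = read_original_sentences(sample)
print(sentences)
```

For a real file, pass an object opened with
open(path, newline="", encoding="utf-8") in place of the in-memory sample.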

Perturbed Data Generation:
We used the above datasets to create sentence perturbations for hypothesis
testing. We took the first column (i.e. sentence1 or question1) as the original
sentence and produced a perturbed version of each sentence using the
WordNet toolkit. The code is provided in the zip file.
See: scr/word_replacer.py
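
The general idea of a word-replacement perturbation can be sketched as
follows. This is an illustration under stated assumptions, not the contents
of word_replacer.py: a small hand-written table (TOY_SYNONYMS) stands in for
WordNet lookups, and the rule of replacing only the first word that has a
synonym is an assumption made for the example. The function name perturb is
hypothetical.

```python
import random

# Hypothetical stand-in for WordNet lookups. The real script presumably
# queries WordNet for candidate replacements instead of a fixed table.
TOY_SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}


def perturb(sentence, synonyms=TOY_SYNONYMS, seed=0):
    """Replace the first word that has a known synonym; return the result."""
    rng = random.Random(seed)  # seeded so repeated runs give the same choice
    words = sentence.split()
    for i, word in enumerate(words):
        core = word.strip(".,!?;:")  # ignore surrounding punctuation
        options = synonyms.get(core.lower())
        if options:
            # Swap only the word itself, keeping any attached punctuation.
            words[i] = word.replace(core, rng.choice(options))
            break
    return " ".join(words)


original = "The quick fox is happy."
perturbed = perturb(original)
print(original, "->", perturbed)
```

A sentence with no word in the table is returned unchanged, so the original
and perturbed columns stay aligned row by row.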