---
license: apache-2.0
---
# Implementation of the ACL 2024 Findings paper "Improving Grammatical Error Correction via Contextual Data Augmentation"

Code: [GitHub repository](https://github.com/wyxstriker/CDA4GEC)

# Model Weights
We release the model weights from each training stage.
The model is trained with the Fairseq framework; the checkpoints and their download locations are listed below.

|Name|Training Data|Download Link|
|:--:|--|--|
|Stage1|Pre-training on [C4 synthetic data](https://github.com/google-research-datasets/C4_200M-synthetic-dataset-for-grammatical-error-correction) (200M scale)|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/stage1_checkpoint_best.pt|
|Stage2+|Fine-tuning on the augmented Lang-8, NUCLE, FCE and W&I+L datasets|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/stage2_checkpoint_best.pt|
|Stage3+|Continued fine-tuning on the augmented W&I+L dataset|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/stage3_checkpoint_best.pt|

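The links in the table open the repository's file browser. For scripted downloads, Hugging Face also serves each file at a direct URL of the form `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. The following minimal sketch uses only the standard library to build those URLs. It assumes that the checkpoints sit at the repository root under the file names listed in the table and that the `main` revision is wanted. `checkpoint_url` and `download_checkpoint` are hypothetical helpers and are not part of the released code.

```python
import urllib.request
from pathlib import Path

# Repository ID and checkpoint file names, taken from the table above.
REPO_ID = "DecoderImmortal/CDA4GEC"
CHECKPOINTS = {
    "stage1": "stage1_checkpoint_best.pt",   # pre-training on C4 synthetic data
    "stage2+": "stage2_checkpoint_best.pt",  # fine-tuning on augmented Lang-8/NUCLE/FCE/W&I+L
    "stage3+": "stage3_checkpoint_best.pt",  # continued fine-tuning on augmented W&I+L
}


def checkpoint_url(stage: str, revision: str = "main") -> str:
    """Return the direct-download ("resolve") URL for one stage's checkpoint.

    Hypothetical helper: assumes the .pt files sit at the repository root.
    """
    if stage not in CHECKPOINTS:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(CHECKPOINTS)}")
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{CHECKPOINTS[stage]}"


def download_checkpoint(stage: str, out_dir: str = ".") -> Path:
    """Download one checkpoint into out_dir and return its local path.

    Hypothetical helper; needs network access, and the file may be large.
    """
    url = checkpoint_url(stage)  # validates `stage` before touching the filesystem
    target = Path(out_dir) / CHECKPOINTS[stage]
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, target)
    return target
```

For example, `download_checkpoint("stage3+", "checkpoints")` would fetch the final-stage weights. If `huggingface_hub` is installed, `hf_hub_download(repo_id="DecoderImmortal/CDA4GEC", filename="stage3_checkpoint_best.pt")` does the same and adds local caching. The files are Fairseq checkpoints, so inference will presumably also need the matching dictionary and preprocessing setup from the GitHub repository.
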
# Synthetic Data
> We release only the synthetic pseudo-data. To obtain the original annotated data, please follow each dataset's official application process.

|Data|Amount|Source|Path|
|:--:|:--:|:--:|:--:|
|Stage2+|2M|Lang-8 & NUCLE & FCE & W&I+L|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/pseudo/stage2|
|Stage3+|200K|W&I+L|[CDA4GEC](https://huggingface.co/DecoderImmortal/CDA4GEC)/tree/main/pseudo/stage3|

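Fairseq's sequence-to-sequence tooling works on line-aligned text. Line *i* of a source (erroneous) file pairs with line *i* of a target (corrected) file, and `fairseq-preprocess` expects such files to be named `<prefix>.<lang>`. This README does not document the file layout under `pseudo/stage2` and `pseudo/stage3`. The sketch below therefore assumes a hypothetical pair of UTF-8 files with one sentence per line. `load_parallel` and the `train.src`/`train.tgt` names are illustrative only, so check the real file names in the repository first.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def load_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files.

    Hypothetical format: one sentence per line, UTF-8, where line i of
    src_path is the erroneous version of line i of tgt_path.
    Raises ValueError as soon as one file runs out before the other.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, exposing misalignment.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"source/target line counts differ at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")


# Example (hypothetical file names):
# pairs = list(load_parallel("pseudo/stage3/train.src", "pseudo/stage3/train.tgt"))
```

`zip_longest` is used instead of `zip` because `zip` silently drops the extra lines of the longer file, which would hide an alignment error. If the amounts in the table count sentence pairs, `len(pairs)` is also a quick sanity check against them.
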
# Citation
If you find this work useful for your research, please cite our paper:

```
@inproceedings{wang-etal-2024-improving-grammatical,
    title = "Improving Grammatical Error Correction via Contextual Data Augmentation",
    author = "Wang, Yixuan and
      Wang, Baoxin and
      Liu, Yijun and
      Zhu, Qingfu and
      Wu, Dayong and
      Che, Wanxiang",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.647",
    pages = "10898--10910",
}
```