Update README.md
README.md
CHANGED
@@ -4,14 +4,12 @@ language:
 arXiv: 2403.01308
 library_name: transformers
 pipeline_tag: text2text-generation
-inference:
-  parameters:
-    max_new_tokens: 128
 widget:
-- text:
-    Ben buraya bazı <MASK> istiyorum.
+- text: Ben buraya bazı <MASK> istiyorum.
   example_title: Masked Language Modeling
 license: cc-by-nc-sa-4.0
+datasets:
+- vngrs-ai/vngrs-web-corpus
 ---
 # VBART Model Card

@@ -29,23 +27,7 @@ This repository contains pre-trained TensorFlow and Safetensors weights of VBART
 - **License:** CC BY-NC-SA 4.0
 - **Finetuned from:** VBART-Large
 - **Paper:** [arXiv](https://arxiv.org/abs/2403.01308)
-## How to Get Started with the Model
-```python
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-
-tokenizer = AutoTokenizer.from_pretrained("vngrs-ai/VBART-Medium-Base",
-                                          model_input_names=['input_ids', 'attention_mask'])
-# Uncomment the device_map kwarg and delete the closing bracket to use model for inference on GPU
-model = AutoModelForSeq2SeqLM.from_pretrained("vngrs-ai/VBART-Medium-Base")#, device_map="auto")
-
-# Input text
-input_text = "Ben buraya bazı <MASK> istiyorum."
-
-token_input = tokenizer(input_text, return_tensors="pt")#.to('cuda')
-outputs = model.generate(**token_input)
-print(tokenizer.decode(outputs[0]))
-```

 ## Training Details

 ### Training Data
@@ -64,7 +46,7 @@ Pre-trained for a total of 63B tokens.
 #### Hyperparameters
 ##### Pretraining
 - **Training regime:** fp16 mixed precision
-- **Training objective**:
+- **Training objective**: Span masking (using mask lengths sampled from Poisson distribution λ = 3.5, masking 30% of tokens)
 - **Optimizer**: Adam optimizer (β1 = 0.9, β2 = 0.98, ε = 1e-6)
 - **Scheduler**: Custom scheduler from the original Transformers paper (20,000 warm-up steps)
 - **Dropout**: 0.1
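The "custom scheduler from the original Transformers paper" with 20,000 warm-up steps presumably refers to the inverse-square-root schedule from "Attention Is All You Need". A minimal sketch follows; the `d_model` default and the printed values are illustrative assumptions, not figures taken from this card.

```python
def transformer_lr(step: int, d_model: int = 1024, warmup_steps: int = 20_000) -> float:
    """Inverse-square-root schedule: lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5).

    The learning rate grows linearly for `warmup_steps` steps, then decays
    proportionally to the inverse square root of the step number.
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# With these placeholder values the rate peaks at step 20,000 (about 2.2e-4) and decays afterwards.
print(transformer_lr(1), transformer_lr(20_000), transformer_lr(100_000))
```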