---
tags:
- model
- checkpoints
- translation
- latin
- english
- mt5
- mistral
- multilingual
- NLP
language:
- en
- la
license: "cc-by-4.0"
models:
- mistralai/Mistral-7B-Instruct-v0.3
- google/mt5-small
model_type: "mt5-small"
training_epochs: "6 (initial pipeline), 30 (final pipeline with optimizations), 100 (fine-tuning on 4750 summaries)"
task_categories:
- translation
- summarization
- multilingual-nlp
task_ids:
- en-la-translation
- la-en-translation
- text-generation
pretty_name: "mT5-LatinSummarizerModel"
storage:
- git-lfs
- huggingface-models
size_categories:
- 5GB<n<10GB
---
# **mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP**

[GitHub Repository](https://github.com/AxelDlv00/LatinSummarizer)
[Hugging Face Model](https://huggingface.co/LatinNLP/LatinSummarizerModel)
[Hugging Face Dataset](https://huggingface.co/datasets/LatinNLP/LatinSummarizerDataset)
## **Overview**
This repository contains the **trained checkpoints and tokenizer files** for the `mT5-LatinSummarizerModel`, which was fine-tuned to improve **Latin summarization and translation**. It is designed to:
- Translate between **English and Latin**.
- Summarize Latin texts effectively.
- Leverage extractive and abstractive summarization techniques.
- Utilize **curriculum learning** for improved training.
## **Installation & Usage**
To download and set up the models (mT5-small and Mistral-7B-Instruct), run:
```bash
bash install_large_models.sh
```
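
Once the checkpoints are downloaded, inference can look roughly like the following minimal sketch with the `transformers` library. The checkpoint path and the prompt format are assumptions for illustration, not the project's confirmed interface; check the training scripts in the GitHub repository for the exact scheme.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical local path: point this at one of the downloaded
# checkpoint directories, e.g. under final_pipeline/.
checkpoint_dir = "final_pipeline/no_stanza"

tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir)

# Assumed prompt format for an English-to-Latin request; the actual
# fine-tuning may have used different task tags.
text = "translate English to Latin: The senate and the people of Rome."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```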
## **Project Structure**
```
.
├── final_pipeline (trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small high-quality summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline (trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md
```
## **Training Methodology**
We fine-tuned **mT5-small** in three phases (the per-epoch subsampling of phase 2 is sketched below):
1. **Initial Training Pipeline (6 epochs)**: Used the full dataset without optimizations.
2. **Final Training Pipeline (30 light epochs)**: Used **10% of the training data per epoch** for efficiency.
3. **Fine-Tuning (100 epochs)**: Focused on the **4750 high-quality summaries** for final optimization.
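
As a rough illustration of what a "light epoch" could look like with the Hugging Face `datasets` API, assuming the companion dataset loads as a standard `Dataset` (the dataset split name and the reshuffling scheme are assumptions, not the project's exact training loop):

```python
from datasets import load_dataset

# Assumed split name; the published dataset layout may differ.
train_ds = load_dataset("LatinNLP/LatinSummarizerDataset", split="train")

fraction = 0.10  # 10% of the training data per light epoch
subset_size = int(len(train_ds) * fraction)

for epoch in range(30):
    # Reshuffle with an epoch-dependent seed so each light epoch
    # sees a different 10% slice of the corpus.
    subset = train_ds.shuffle(seed=epoch).select(range(subset_size))
    # ... run one training pass over `subset` here ...
```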
#### **Training Configurations:**
- **Hardware:** 16GB VRAM GPU (lab machines via SSH).
- **Batch Size:** Adaptive due to GPU memory constraints.
- **Gradient Accumulation:** Enabled for larger effective batch sizes.
- **LoRA-based fine-tuning:** LoRA rank 8, scaling factor 32 (see the sketch after this list).
- **Dynamic Sequence Length Adjustment:** Sequence length increased progressively during training.
- **Learning Rate:** `5 × 10^-4` with warm-up steps.
- **Checkpointing:** Frequent saves to mitigate power outages.
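
A minimal sketch of those LoRA hyperparameters expressed with the `peft` library follows; the target modules and dropout value are assumptions for illustration, and the project's exact adapter placement may differ.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# Rank 8 and scaling factor (lora_alpha) 32, as stated above.
# target_modules is an assumption: "q" and "v" are the attention
# projection names in the (m)T5 architecture.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    task_type="SEQ_2_SEQ_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```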
## **Evaluation & Results**
We evaluated the model using **ROUGE, BERTScore, and BLEU/chrF scores**.

| Metric | Before Fine-Tuning | After Fine-Tuning |
|--------|--------------------|-------------------|
| ROUGE-1 | 0.1675 | 0.2541 |
| ROUGE-2 | 0.0427 | 0.0773 |
| ROUGE-L | 0.1459 | 0.2139 |
| BERTScore-F1 | 0.6573 | 0.7140 |
- **chrF Score (en→la):** 33.60 with Stanza tags, versus a BLEU of 18.03 without Stanza.
- **Summarization Density:** Maintained at ~6%.
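
For reference, these metrics can be computed with the Hugging Face `evaluate` library as in the sketch below; the prediction/reference strings are placeholders, and this is not necessarily the exact evaluation script used for the numbers above.

```python
import evaluate

# Placeholder outputs; in practice these come from model.generate(...).
predictions = ["Senatus populusque Romanus ..."]
references = [["Senatus populusque Romanus ..."]]  # one reference list per prediction

rouge = evaluate.load("rouge")
chrf = evaluate.load("chrf")
bertscore = evaluate.load("bertscore")

print(rouge.compute(predictions=predictions, references=references))
print(chrf.compute(predictions=predictions, references=references))
# BERTScore needs a backbone model; Latin is not a standard language code,
# so a multilingual checkpoint is a reasonable choice here.
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="bert-base-multilingual-cased"))
```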
### **Observations:**
- Pre-training on **extractive summaries** was crucial.
- The model still exhibits some **excessive extraction**, indicating room for further improvement.
## **License**
This model is released under the **CC-BY-4.0** license.
## **Citation**
```bibtex
@misc{LatinSummarizerModel,
  author = {Axel Delaval and Elsa Lubek},
  title  = {Latin-English Summarization Model (mT5)},
  year   = {2025},
  url    = {https://huggingface.co/LatinNLP/LatinSummarizerModel}
}
```