Update README.md
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-new_version: aisingapore/
+new_version: aisingapore/Gemma-SEA-LION-v3-9B
 license: mit
 language:
 - en
@@ -14,7 +14,7 @@ language:
 - km
 - lo
 ---
-# SEA-LION
+# SEA-LION-v1-7B

 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 The size of the models range from 3 billion to 7 billion parameters.
@@ -30,11 +30,11 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
 specifically trained to understand the SEA regional context.

-SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.
+SEA-LION-v1-7B is built on the robust MPT architecture and has a vocabulary size of 256K.

 For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.

-The training data for SEA-LION encompasses 980B tokens.
+The training data for SEA-LION-v1-7B encompasses 980B tokens.

 - **Developed by:** Products Pillar, AI Singapore
 - **Funded by:** Singapore NRF
@@ -44,7 +44,7 @@ The training data for SEA-LION encompasses 980B tokens.

 ### Performance Benchmarks

-SEA-LION has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):
+SEA-LION-v1-7B has an average performance on general tasks in English (as measured by Hugging Face's LLM Leaderboard):

 | Model       |  ARC  | HellaSwag | MMLU  | TruthfulQA | Average |
 |-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
@@ -54,7 +54,7 @@ SEA-LION has an average performance on general tasks in English (as measured by

 ### Data

-SEA-LION was trained on 980B tokens of the following data:
+SEA-LION-v1-7B was trained on 980B tokens of the following data:

 | Data Source                | Unique Tokens | Multiplier | Total Tokens | Percentage |
 |----------------------------|:-------------:|:----------:|:------------:|:----------:|
@@ -80,10 +80,10 @@ SEA-LION was trained on 980B tokens of the following data:

 ### Infrastructure

-SEA-LION was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
+SEA-LION-v1-7B was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
 on the following hardware:

-| Training Details     | SEA-LION       |
+| Training Details     | SEA-LION-v1-7B |
 |----------------------|:------------:|
 | AWS EC2 p4d.24xlarge | 32 instances |
 | Nvidia A100 40GB GPU | 256          |
@@ -92,7 +92,7 @@ on the following hardware:

 ### Configuration

-| HyperParameter    | SEA-LION           |
+| HyperParameter    | SEA-LION-v1-7B     |
 |-------------------|:------------------:|
 | Precision         | bfloat16           |
 | Optimizer         | decoupled_adamw    |
@@ -106,9 +106,9 @@ on the following hardware:

 ### Model Architecture and Objective

-SEA-LION is a decoder model using the MPT architecture.
+SEA-LION-v1-7B is a decoder model using the MPT architecture.

-| Parameter       | SEA-LION       |
+| Parameter       | SEA-LION-v1-7B |
 |-----------------|:-----------:|
 | Layers          | 32          |
 | d_model         | 4096        |
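For reference, the card in this diff describes an MPT-based decoder paired with the custom SEABPETokenizer. Below is a minimal sketch of loading such a checkpoint with Hugging Face `transformers`; the repo id `aisingapore/SEA-LION-v1-7B` and the need for `trust_remote_code=True` are assumptions based on the card, not details confirmed by the diff itself.

```python
# Minimal sketch: load the MPT-based decoder and its custom tokenizer described in the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aisingapore/SEA-LION-v1-7B"  # assumed repo id, not taken from the diff

# MPT-style models with custom tokenizers typically ship their modeling/tokenizer code
# inside the repo, so trust_remote_code=True is usually required.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Southeast Asia is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```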