Update README.md
README.md (changed)

`@@ -4,9 +4,51 @@ language:`

- zh
base_model:
- hfl/chinese-lert-base
tags:
- punctuation-restoration
---
<div align="center">
<h1>FireRedChat-punc</h1>
</div>
<div align="center">
<a href="https://fireredteam.github.io/demos/firered_chat/">Demo</a> •
<a href="https://arxiv.org/pdf/2509.06502">FireRedChat Paper</a> •
<a href="https://huggingface.co/FireRedTeam">Hugging Face</a>
</div>

## Description

FireRedChat-punc is a fine-tuned `hfl/chinese-lert-base` model for punctuation restoration, designed primarily to post-process transcripts from [FireRedASR](https://github.com/FireRedTeam/FireRedASR).

The model restores four punctuation marks: `,`, `。`, `?`, and `!`. It handles both Chinese and English text and improves the readability of ASR transcripts.
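Punctuation restoration on top of a BERT-style encoder such as `hfl/chinese-lert-base` is often framed as token classification: for each character, the model predicts which mark, if any, should follow it. Assuming FireRedChat-punc follows a scheme like this (an assumption, not something stated here), the sketch below shows only the decoding step, turning per-character labels into punctuated text. The function `apply_punctuation` and its label format are hypothetical and are not part of RedPost.

```python
from typing import List, Optional

# The four marks listed above. A label of None means "no mark after this character".
# This label format is a hypothetical illustration, not RedPost's actual output.
PUNC_MARKS = {",", "。", "?", "!"}


def apply_punctuation(chars: List[str], labels: List[Optional[str]]) -> str:
    """Append each predicted mark after its character (illustrative decoder)."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    pieces = []
    for ch, label in zip(chars, labels):
        pieces.append(ch)
        if label is not None:
            if label not in PUNC_MARKS:
                raise ValueError(f"unknown punctuation label: {label!r}")
            pieces.append(label)
    return "".join(pieces)


# Hypothetical per-character predictions for an unpunctuated transcript
chars = list("你好今天天气很好")
labels = [None, ",", None, None, None, None, None, "。"]
print(apply_punctuation(chars, labels))  # 你好,今天天气很好。
```

English input would need word-level labels and space-aware joining, which this character-level sketch leaves out.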

## Roadmap

- [x] 2025/09
    - [x] Release the fine-tuned punctuation restoration model.

## Usage

The RedPost source code is available on [GitHub](https://github.com/FireRedTeam/FireRedChat/tree/main/fireredasr-server/server/redpost).

Below is an example of how to use the FireRedChat-punc model for punctuation restoration with RedPost:

```python
import os
import re

from redpost import RedPost, RedPostConfig

# Local directory containing the downloaded FireRedChat-punc model
punc_model_dir = os.path.join("FireRedChat-punc")
post_config = RedPostConfig(
    use_gpu=True,
    sentence_max_length=30
)
post_model = RedPost.from_pretrained(punc_model_dir, post_config)

# Unpunctuated input text, e.g. a raw ASR transcript (illustrative example)
text = "今天天气很好我们去公园散步吧"
batch_post_results = post_model.process([text], ["text"])
# Concatenate the punctuated output of each result
text = "".join([r["punc_text"] for r in batch_post_results])
# Remove unknown-token placeholders from the output
text = re.sub(r"<unk>|<UNK>|\[unk\]|\[UNK\]", "", text)
print(text)
```
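
The final two steps of the usage example, joining each result's `punc_text` and stripping `<unk>`-style placeholders, can be pulled into a small helper that is testable without loading the model. `merge_punc_results` is a hypothetical name, and the sketch assumes, as the example does, that each RedPost result is a dict with a `punc_text` key.

```python
import re
from typing import Dict, Iterable

# Same placeholder pattern as the usage example, written as a raw string
UNK_PATTERN = re.compile(r"<unk>|<UNK>|\[unk\]|\[UNK\]")


def merge_punc_results(results: Iterable[Dict[str, str]]) -> str:
    """Concatenate each result's punctuated text and drop <unk>-style tokens.

    Assumes every result is a dict with a "punc_text" key, as in the RedPost
    example; this helper is illustrative and not part of RedPost itself.
    """
    text = "".join(r["punc_text"] for r in results)
    return UNK_PATTERN.sub("", text)


# Hand-written results shaped like the ones the usage example consumes
fake_results = [{"punc_text": "你好,<unk>世界。"}, {"punc_text": "[UNK]再见!"}]
print(merge_punc_results(fake_results))  # 你好,世界。再见!
```

Writing the pattern as a raw string avoids the invalid-escape-sequence warning that newer Python versions emit for `"\["` inside a regular string literal.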

## License

The model and source code are licensed under the Apache-2.0 license.

### Acknowledgment

- Base model: `hfl/chinese-lert-base` (license: Apache-2.0)
- Designed for integration with [FireRedASR](https://github.com/FireRedTeam/FireRedASR).