Update README.md
README.md
CHANGED
@@ -7,4 +7,32 @@ sdk: static
pinned: false
---

# Neural Bioinformatics Research Group - ProkBERT Models

Welcome to the official Hugging Face organization of the Neural Bioinformatics Research Group. Our main goal is to provide genomic language models for microbiome applications.

## Models

We provide access to a collection of pretrained and fine-tuned models from the ProkBERT family. These models are built on Local Context Aware (LCA) tokenization, which is tailored to DNA sequences and balances context size against performance.

ProkBERT models are designed for microbiome-related tasks, such as prokaryotic promoter identification and phage detection. Despite their compact size (20.6M to 26.6M parameters), they are intended to be both effective and efficient on these tasks.
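
To make the idea of k-mer tokenization with a shift more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the ProkBERT tokenizer: the function name `kmer_tokens` is hypothetical, and we assume a simple sliding window of width `k` that advances `shift` nucleotides per step, with no special tokens or vocabulary lookup.

```python
def kmer_tokens(sequence: str, k: int = 6, shift: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mers.

    Hypothetical helper for illustration; the real LCA tokenizer may
    differ (special tokens, vocabulary ids, handling of sequence ends).
    """
    if k < 1 or shift < 1:
        raise ValueError("k and shift must be positive integers")
    sequence = sequence.upper()
    # Step through the sequence `shift` nucleotides at a time and stop
    # at the last position where a full k-mer still fits.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, shift)]


example = "ATGCATGCAT"
print(kmer_tokens(example, k=6, shift=1))
# ['ATGCAT', 'TGCATG', 'GCATGC', 'CATGCA', 'ATGCAT']
print(kmer_tokens(example, k=6, shift=2))
# ['ATGCAT', 'GCATGC', 'ATGCAT']
```

With `shift=1`, neighbouring tokens overlap by five nucleotides. With `shift=2`, the same sequence needs fewer tokens, so a fixed token budget can cover a longer stretch of DNA.
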

## Model Overview

| Model       | Parameters | Tokenizer      | Layers | Attention Heads | Max. Context Size | Training Data  |
|-------------|------------|----------------|--------|-----------------|-------------------|----------------|
| `mini`      | 20.6M      | 6-mer, shift=1 | 6      | 6               | 1027 nt           | 206.65 billion |
| `mini-c`    | 24.9M      | 1-mer          | 6      | 6               | 1022 nt           | 206.65 billion |
| `mini-long` | 26.6M      | 6-mer, shift=2 | 6      | 6               | 4096 nt           | 206.65 billion |

_Overview of model parameters across the three configurations._
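
The Tokenizer and Max. Context Size columns can be related with a little arithmetic. The sketch below assumes a plain sliding window, where a sequence of `L` nucleotides yields `(L - k) // shift + 1` k-mers, and it ignores any special tokens. The helper name `tokens_needed` is hypothetical, and treating the 1-mer tokenizer as `shift=1` is our assumption.

```python
def tokens_needed(length_nt: int, k: int, shift: int) -> int:
    """Number of k-mer tokens a sliding window produces for `length_nt` nucleotides.

    Illustrative helper only; it does not count special tokens.
    """
    if length_nt < k:
        return 0
    return (length_nt - k) // shift + 1


# (model, k, shift, max context in nt) taken from the table above;
# shift=1 for the 1-mer tokenizer is an assumption.
configs = [
    ("mini", 6, 1, 1027),
    ("mini-c", 1, 1, 1022),
    ("mini-long", 6, 2, 4096),
]

for name, k, shift, max_nt in configs:
    print(f"{name}: {tokens_needed(max_nt, k, shift)} tokens for {max_nt} nt")
# mini: 1022 tokens for 1027 nt
# mini-c: 1022 tokens for 1022 nt
# mini-long: 2046 tokens for 4096 nt
```

Under these assumptions, `mini` and `mini-c` cover their maximum context with the same number of tokens, while `mini-long` uses about twice as many tokens to cover roughly four times as many nucleotides.
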

## Resources

- [Read our paper](https://www.frontiersin.org/articles/10.3389/fmicb.2023.1331233/full)
- [Learn more about the model](https://github.com/nbrg-ppcu/prokbert)
- [Get started with code on GitHub](https://github.com/nbrg-ppcu/prokbert)

---

For more information or questions, please visit our [GitHub repository](https://github.com/nbrg-ppcu/prokbert) or contact us at [obalasz@gmail.com](mailto:obalasz@gmail.com).