Urdu-BERT Pretraining (PyTorch)

I implemented a BERT model from scratch in PyTorch and pretrained it on Urdu data. It uses the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives, just like the original BERT.
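
For context, the sketch below shows how MLM masking and NSP pair construction are typically wired up for BERT-style pretraining; the special-token ids and helper names are illustrative placeholders, not the repo's actual API.

```python
import random
import torch

# Assumed special-token ids; the real ids come from the trained tokenizer.
PAD_ID, CLS_ID, SEP_ID, MASK_ID = 0, 2, 3, 4

def make_nsp_pair(sentences, i):
    """Return (sentence_a, sentence_b, is_next): 50% true next sentence, 50% random."""
    if random.random() < 0.5 and i + 1 < len(sentences):
        return sentences[i], sentences[i + 1], 1
    return sentences[i], random.choice(sentences), 0

def mask_tokens(token_ids, vocab_size, mask_prob=0.15):
    """BERT-style MLM masking: 80% [MASK], 10% random token, 10% unchanged."""
    ids = token_ids.clone()
    labels = torch.full_like(ids, -100)   # -100 positions are ignored by CrossEntropyLoss
    for pos in range(ids.size(0)):
        if ids[pos].item() in (PAD_ID, CLS_ID, SEP_ID):
            continue
        if random.random() < mask_prob:
            labels[pos] = ids[pos]
            r = random.random()
            if r < 0.8:
                ids[pos] = MASK_ID
            elif r < 0.9:
                ids[pos] = random.randint(5, vocab_size - 1)
            # else: keep the original token as the input
    return ids, labels
```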


✅ Features

  • Trained on Urdu-1M-news-text
  • Custom WordPiece tokenizer (training sketch after this list)
  • Multi-head attention & transformer encoder blocks
  • NSP and MLM heads
  • Uses PyTorch and HuggingFace Tokenizers
  • Training tracked with Weights & Biases (WandB)
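
A minimal sketch of how the custom WordPiece tokenizer can be trained with the HuggingFace Tokenizers library; the corpus file name, vocabulary size, and output path are placeholders.

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers

# WordPiece model with BERT-style special tokens.
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.NFKC()              # normalize Urdu Unicode forms
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.WordPieceTrainer(
    vocab_size=30_000,                                 # placeholder vocabulary size
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train(files=["urdu_news.txt"], trainer=trainer)  # placeholder corpus file
tokenizer.save("urdu-wordpiece.json")
```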

βš™οΈ Training Setup

| Setting | Value |
| --- | --- |
| Epochs | 20 |
| Batch Size | 64 |
| Sequence Length | 64 tokens |
| Embedding Size | 128 |
| Encoder Layers | 2 |
| Attention Heads | 2 |
| Max LR | 2.5e-5 |
| Warmup Steps | 1000 |
| Optimizer | Adam |
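
The Max LR and Warmup Steps above feed a custom warm-up schedule (see Notes). Below is a minimal sketch of one common formulation, linear warm-up followed by linear decay, attached to Adam via LambdaLR; `total_steps` is a placeholder and the repo's actual schedule may differ.

```python
import torch

model = torch.nn.Linear(128, 128)         # stand-in for the BERT model
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-5)

warmup_steps, total_steps = 1000, 50_000  # total_steps is a placeholder

def lr_lambda(step):
    # Scale factor on the max LR: ramp up linearly, then decay linearly to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# In the training loop, call optimizer.step() and then scheduler.step() each batch.
```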

📊 Training Loss (WandB)

Below is an example loss curve from training:

[WandB loss graph]

The graph shows the MLM loss, NSP loss, and total loss decreasing over time.

📌 Notes

  • NSP uses randomly paired sentences as negative samples
  • Positional embeddings are created with sinusoidal (sin/cos) encodings (see the sketch after this list)
  • Custom learning rate scheduler with warm-up
  • Model is built completely from scratch (no pretrained weights)
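
The sin/cos positional embeddings mentioned in the notes follow the standard transformer formulation; here is a minimal sketch using the sequence length and embedding size from the training setup.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int = 64, d_model: int = 128) -> torch.Tensor:
    """Fixed positional encoding: sin on even dimensions, cos on odd dimensions."""
    position = torch.arange(seq_len).unsqueeze(1)                 # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe                                                     # added to the token embeddings

pos_emb = sinusoidal_positional_encoding()                        # shape: (64, 128)
```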
