@@ -1,58 +1,130 @@
 ---
-language: vi
 tags:
-- spam-detection
 - vietnamese
-- bartpho
-license: apache-2.0
 datasets:
-- visolex/ViSpamReviews
 metrics:
 - accuracy
-- f1
 model-index:
 - name: bartpho-spam-binary
   results:
   - task:
       type: text-classification
-      name: Spam Detection (Binary)
     dataset:
       name: ViSpamReviews
-      type: custom
     metrics:
-    - name: Accuracy
-      type: accuracy
-      value: <INSERT_ACCURACY>
-    - name: F1 Score
-      type: f1
-      value: <INSERT_F1_SCORE>
-base_model:
-- vinai/bartpho-syllable
-pipeline_tag: text-classification
 ---
-# BARTPho-Spam-Binary
-Fine-tuned from [`vinai/bartpho-syllable`](https://huggingface.co/vinai/bartpho-syllable) on **ViSpamReviews** (binary).
-* **Task**: Binary classification
-* **Dataset**: [ViSpamReviews](https://huggingface.co/datasets/visolex/ViSpamReviews)
-* **Hyperparameters**
-  * Batch size: 32
-  * LR: 3e-5
-  * Epochs: 100
-  * Max seq len: 256
 ## Usage
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
-tokenizer = AutoTokenizer.from_pretrained("visolex/bartpho-spam-binary")
-model = AutoModelForSequenceClassification.from_pretrained("visolex/bartpho-spam-binary")
-text = "Review này không có thật."
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
-pred = model(**inputs).logits.argmax(dim=-1).item()
-print("Spam" if pred==1 else "Non-spam")
 ```

 ---
+license: apache-2.0
+base_model: vinai/bartpho-syllable
 tags:
 - vietnamese
+- spam-detection
+- text-classification
+- e-commerce
 datasets:
+- visolex/ViSpamReviews
 metrics:
 - accuracy
+- macro-f1
+- macro-precision
+- macro-recall
 model-index:
 - name: bartpho-spam-binary
   results:
   - task:
       type: text-classification
+      name: Spam Review Detection
     dataset:
       name: ViSpamReviews
+      type: visolex/ViSpamReviews
     metrics:
+      - type: accuracy
+        value: N/A
+      - type: macro-f1
+        value: N/A
 ---
+# bartpho-spam-binary: Spam Review Detection for Vietnamese Text
+This model is a fine-tuned version of [vinai/bartpho-syllable](https://huggingface.co/vinai/bartpho-syllable), trained on the **ViSpamReviews** dataset to detect spam in Vietnamese e-commerce product reviews.
+## Model Details
+* **Base Model**: `vinai/bartpho-syllable`
+* **Description**: BARTpho (syllable-level variant), a pre-trained BART model for Vietnamese
+* **Dataset**: ViSpamReviews (Vietnamese Spam Review Dataset)
+* **Fine-tuning Framework**: Hugging Face Transformers
+* **Task**: Spam Review Detection (binary)
+* **Number of Classes**: 2
+### Hyperparameters
+* Max sequence length: `256`
+* Learning rate: `5e-5`
+* Batch size: `32`
+* Epochs: `100`
+* Early stopping patience: `5`
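+The card does not say which validation metric drives early stopping. As a hedged illustration, assuming a higher-is-better score such as validation macro-F1, a patience of `5` means training ends once five consecutive epochs fail to beat the best score so far, so a run can finish well before the `100`-epoch cap. The helper `stop_epoch` and the scores below are hypothetical and are not the authors' training code; in Hugging Face Transformers the same rule is typically configured with `EarlyStoppingCallback(early_stopping_patience=5)`.
+```python
+# Hypothetical sketch of patience-based early stopping (not the authors' training code).
+# Assumption: a higher validation score (e.g. macro-F1) is better.
+from typing import Optional, Sequence
+
+
+def stop_epoch(val_scores: Sequence[float], patience: int = 5, max_epochs: int = 100) -> Optional[int]:
+    """Return the 1-based epoch at which training stops, or None if patience is never exhausted."""
+    best = float("-inf")
+    epochs_without_improvement = 0
+    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
+        if score > best:
+            # New best score: reset the patience counter.
+            best = score
+            epochs_without_improvement = 0
+        else:
+            epochs_without_improvement += 1
+            if epochs_without_improvement >= patience:
+                return epoch
+    return None
+
+
+# Made-up validation scores: the best value is reached at epoch 3,
+# then five epochs pass without improvement, so training stops at epoch 8.
+scores = [0.80, 0.84, 0.86, 0.85, 0.86, 0.83, 0.85, 0.84, 0.90]
+print(stop_epoch(scores, patience=5))
+```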
+## Dataset
+The model was trained on the **ViSpamReviews** dataset, which contains 19,860 Vietnamese e-commerce review samples split as follows:
+* **Train set**: 14,299 samples (72%)
+* **Validation set**: 1,590 samples (8%)
+* **Test set**: 3,971 samples (20%)
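+A quick arithmetic check in plain Python (no dependencies) confirms that the three split sizes listed above sum to the stated total and reproduce the quoted percentages after rounding:
+```python
+# Check that the reported ViSpamReviews split sizes are consistent with the stated total.
+splits = {"train": 14_299, "validation": 1_590, "test": 3_971}
+
+total = sum(splits.values())
+assert total == 19_860, total
+
+for name, size in splits.items():
+    # Share of the full dataset, formatted as a whole percentage.
+    print(f"{name}: {size} samples ({size / total:.0%})")
+```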
+### Labels
+* **Non-spam** (0): Genuine product reviews
+* **Spam** (1): Fake or promotional reviews
+## Results
+The model was evaluated on the test set with the following metrics:
+* Results: <INSERT_METRICS>
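+The metadata lists accuracy plus macro-averaged F1, precision and recall. The card does not include the evaluation script, so the following is only a sketch of how these metrics are conventionally computed for the two labels (0 = Non-spam, 1 = Spam): per-class scores are averaged with equal weight, and a score whose denominator is zero counts as 0.0. The function `macro_metrics` and the toy labels are hypothetical; `sklearn.metrics.precision_recall_fscore_support(..., average="macro")` computes the same macro-averaged quantities.
+```python
+# Hypothetical sketch: accuracy and macro-averaged precision/recall/F1 for binary labels.
+from typing import Dict, Sequence
+
+
+def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int] = (0, 1)) -> Dict[str, float]:
+    precisions, recalls, f1s = [], [], []
+    for c in labels:
+        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
+        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
+        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
+        # Convention: a score with a zero denominator counts as 0.0.
+        precision = tp / (tp + fp) if tp + fp else 0.0
+        recall = tp / (tp + fn) if tp + fn else 0.0
+        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
+        precisions.append(precision)
+        recalls.append(recall)
+        f1s.append(f1)
+    n = len(labels)
+    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
+    # Macro averaging: each class contributes equally, regardless of its size.
+    return {
+        "accuracy": accuracy,
+        "macro-precision": sum(precisions) / n,
+        "macro-recall": sum(recalls) / n,
+        "macro-f1": sum(f1s) / n,
+    }
+
+
+# Toy example (made-up labels, not model results): 0 = Non-spam, 1 = Spam.
+y_true = [0, 0, 0, 0, 1, 1]
+y_pred = [0, 0, 0, 1, 1, 0]
+print(macro_metrics(y_true, y_pred))
+```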
 ## Usage
+You can use this model for spam review detection in Vietnamese text. Below is an example:
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+# Load model and tokenizer
+model_name = "visolex/bartpho-spam-binary"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+# Example review text
+text = "Sản phẩm này rất tốt, shop giao hàng nhanh!"
+# Tokenize
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
+# Predict
+with torch.no_grad():
+    outputs = model(**inputs)
+    predicted_class = outputs.logits.argmax(dim=-1).item()
+    probabilities = torch.softmax(outputs.logits, dim=-1)
+# Map to label
+label_map = {0: "Non-spam", 1: "Spam"}
+predicted_label = label_map[predicted_class]
+confidence = probabilities[0][predicted_class].item()
+print(f"Text: {text}")
+print(f"Predicted: {predicted_label} (confidence: {confidence:.2%})")
 ```
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{<model_key>_spam_detection,
+  title={<description>},
+  author={ViSoLex Team},
+  year={2025},
+  howpublished={\url{https://huggingface.co/visolex/bartpho-spam-binary}}
+}
+```
+## License
+This model is released under the Apache-2.0 license.
+## Acknowledgments
+* Base model: [vinai/bartpho-syllable](https://huggingface.co/vinai/bartpho-syllable)
+* Dataset: ViSpamReviews (Vietnamese Spam Review Dataset)
+* ViSoLex Toolkit for Vietnamese NLP