CySent-SmolLM3-3B

CySent-SmolLM3-3B is a fine-tuned version of HuggingFaceTB/SmolLM3-3B, adapted for cybersecurity instruction-following tasks. It was trained on a 20,000-sample subset of the Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset and is intended to serve as a knowledgeable assistant across a wide range of cybersecurity topics. It achieves the following results on the evaluation set:

  • Loss: 0.757
  • Mean Token Accuracy: 0.796
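
Mean token accuracy is the fraction of next-token predictions (argmax over the vocabulary) that match the reference tokens, averaged over non-ignored positions. A minimal sketch of how such a metric is typically computed, assuming the usual -100 ignore index for padded or masked positions (the exact masking used during this model's evaluation is an assumption):

import torch

def mean_token_accuracy(logits, labels, ignore_index=-100):
    # logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    preds = logits[:, :-1, :].argmax(dim=-1)  # prediction for token t+1 made at position t
    targets = labels[:, 1:]
    mask = targets != ignore_index            # skip padding / masked prompt tokens
    return (preds[mask] == targets[mask]).float().mean().item()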

Intended uses

This model is designed to assist with a variety of natural language cybersecurity tasks, including:

  • Answering technical questions about security concepts.
  • Explaining vulnerabilities, attack vectors, and defense mechanisms.
  • Generating simple security-related scripts or commands (e.g., for network analysis or pentesting).
  • Summarizing security logs, reports, or articles.
  • Assisting in educational settings for cybersecurity students and professionals.

It is intended as a co-pilot or assistant and not as a standalone, automated security tool.

Limitations

  • Not for Real-Time Threat Detection: This model is not designed for or capable of real-time intrusion detection or automated threat response.
  • Potential for Hallucination: Like all language models, it may generate incorrect, outdated, or completely fabricated information. Always verify critical information from authoritative sources.
  • Inherited Biases: The model may inherit biases and limitations from its base model (SmolLM3-3B) and the fine-tuning dataset.
  • Knowledge Cutoff: The model's knowledge is limited to the data it was trained on and may not be aware of the very latest vulnerabilities or security trends.
  • Misuse Potential: The model could potentially be used to generate malicious code or instructions for harmful purposes. Please use it responsibly and ethically.

How to use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RamzyBakir/CySent-SmolLM3-3B"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a prompt
prompt = "### Instruction:\nExplain what a SQL injection attack is and provide a simple example of a vulnerable code snippet.\n\n### Response:\n"

# Generate a response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=250, do_sample=True, temperature=0.7, top_p=0.9)

# Decode and print the result
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
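
For interactive use, the same call can stream tokens to stdout as they are generated. A minimal sketch using transformers' TextStreamer, reusing the inputs and sampling settings from the example above:

from transformers import TextStreamer

# Print tokens as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=250,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    streamer=streamer,
)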

Training procedure

Training hyperparameters

The model was fine-tuned using Low-Rank Adaptation (LoRA) with the following configuration (a code sketch of the corresponding training setup follows the two lists):

SFTConfig:

  • max_length: 2048
  • per_device_train_batch_size: 8
  • gradient_accumulation_steps: 2
  • learning_rate: 1e-4
  • num_train_epochs: 3
  • warmup_ratio: 0.1
  • weight_decay: 0.01
  • optim: adamw_torch
  • bf16: True
  • eval_strategy: steps
  • eval_steps: 200
  • save_steps: 200
  • metric_for_best_model: eval_loss

LoraConfig:

  • r: 16
  • lora_alpha: 32
  • lora_dropout: 0.05
  • task_type: CAUSAL_LM
  • target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"]
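
With a per-device batch size of 8 and 2 gradient accumulation steps, the effective batch size is 16 sequences per optimizer step. The following is a minimal sketch of how this configuration maps onto trl's SFTTrainer and peft's LoraConfig; the subset selection, evaluation split, seed, and output directory are assumptions, not taken from the original training script:

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# A 20,000-sample subset; the exact selection and eval split are assumptions
dataset = load_dataset(
    "Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset",
    split="train[:20000]",
)
split = dataset.train_test_split(test_size=0.05, seed=42)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = SFTConfig(
    output_dir="cysent-smollm3-3b",  # assumed output directory
    max_length=2048,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=1e-4,
    num_train_epochs=3,
    warmup_ratio=0.1,
    weight_decay=0.01,
    optim="adamw_torch",
    bf16=True,
    eval_strategy="steps",
    eval_steps=200,
    save_steps=200,
    metric_for_best_model="eval_loss",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B",
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    peft_config=peft_config,
)
trainer.train()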

Training results

The model was trained for 3200 steps on a single H200 GPU. The training and validation metrics progressed as follows:

| Step | Training Loss | Validation Loss | Entropy  | Num Tokens | Mean Token Accuracy |
|-----:|--------------:|----------------:|---------:|-----------:|--------------------:|
|  200 |      1.111500 |        1.045437 | 1.002200 |  2,182,437 |            0.740981 |
|  400 |      0.975900 |        0.944684 | 0.917857 |  4,368,626 |            0.759094 |
|  800 |      0.863500 |        0.860705 | 0.862549 |  8,721,104 |            0.775031 |
| 1200 |      0.834900 |        0.816342 | 0.849365 | 13,096,717 |            0.784405 |
| 1600 |      0.792200 |        0.794083 | 0.802182 | 17,452,772 |            0.788403 |
| 2000 |      0.777900 |        0.779576 | 0.790627 | 21,807,624 |            0.791107 |
| 2400 |      0.749800 |        0.771720 | 0.761689 | 26,151,814 |            0.792799 |
| 2800 |      0.747800 |        0.762957 | 0.761588 | 30,504,962 |            0.794528 |
| 3200 |      0.735800 |        0.757395 | 0.757575 | 34,860,059 |            0.795802 |

The model achieved its best performance at the final step, with a validation loss of 0.757 and a mean token accuracy of 0.796.

