Publish PDT adapters + arXiv model card

Files changed (8)

.gitattributes +2 -33
README.md +61 -3
SHA256SUMS +6 -0
agreement_thresholds.json +176 -0
pdt_adapters.safetensors +3 -0
train_manifest.json +19 -0
train_run_stages.json +44 -0
training_report.json +111 -0

.gitattributes CHANGED

@@ -1,35 +1,4 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.jsonl filter=lfs diff=lfs merge=lfs -text
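
After this change only four patterns remain LFS-tracked. The sketch below checks which of the files in this commit those patterns would catch. It is an approximation, not Git's own matcher: it assumes slash-free patterns match on the file's basename and uses `fnmatch` for the globbing. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# LFS patterns present in .gitattributes after this commit.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "*.jsonl"]

# Files listed under "Files changed" for this commit.
COMMIT_FILES = [
    ".gitattributes",
    "README.md",
    "SHA256SUMS",
    "agreement_thresholds.json",
    "pdt_adapters.safetensors",
    "train_manifest.json",
    "train_run_stages.json",
    "training_report.json",
]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate gitattributes matching: compare slash-free globs to the basename."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in patterns)


lfs_files = [f for f in COMMIT_FILES if is_lfs_tracked(f)]
print(lfs_files)
```

Under these assumptions only `pdt_adapters.safetensors` goes through LFS; the JSON artifacts are stored as ordinary Git blobs, while any `.jsonl` dataset added later would be LFS-tracked.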

README.md CHANGED

@@ -1,3 +1,61 @@
----
-license: mit
----

+---
+language:
+  - en
+license: mit
+tags:
+  - parallel-decoding
+  - speculative-decoding
+  - transformers
+  - research
+  - arxiv
+base_model: openai/gpt-oss-20b
+library_name: transformers
+pipeline_tag: text-generation
+paper:
+  title: "Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning"
+  url: https://arxiv.org/abs/2512.10054
+---
+# Parallel Decoder Transformer (PDT) adapters for GPT-OSS-20B
+This repository contains **PDT adapter/head weights** trained against the GPT-OSS-20B trunk, plus minimal training artifacts.
+**Paper:** [Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning](https://arxiv.org/abs/2512.10054)
+## Abstract (arXiv)
+Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" methods like Skeleton-of-Thought attempt to parallelize generation via external orchestration, they suffer from coherence drift due to the lack of cross-stream communication. In this work, we introduce the Parallel Decoder Transformer (PDT), a parameter-efficient architecture that embeds coordination primitives directly into the inference process of a frozen pre-trained model. Instead of retraining the base model, PDT injects lightweight Speculative Note Conditioning (SNC) adapters that allow parallel decoding streams to synchronize via a shared, dynamic latent space. We formulate coordination as a speculative consensus problem, where sibling streams broadcast semantic "notes" to a global bus, gated by a learned verification head. We validate our approach on a 50,000-step curriculum using a frozen 20B-parameter backbone. Our results demonstrate that PDT achieves effective self-correction, reaching 77.8% precision in coverage prediction and recovering approximate serial semantics without modifying the trunk weights. This establishes PDT as a scalable, efficient alternative to full model fine-tuning for structured parallel generation.
+## How to use
+1. Install the reference implementation (runtime + scripts):
+   - `https://github.com/ljrweb-self/parallel-decoder-transformer`
+2. Download the base trunk model (`openai/gpt-oss-20b`) via Hugging Face (or provide a local path).
+3. Download the adapter checkpoint from this repo and point `configs/gpt_oss_transfer_production.yaml` (or CLI flags) at it.
+## Citation
+```bibtex
+@misc{robbins2025pdt,
+  title={Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning},
+  author={Robbins, Logan},
+  year={2025},
+  eprint={2512.10054},
+  archivePrefix={arXiv},
+  primaryClass={cs.AI},
+  url={https://arxiv.org/abs/2512.10054}
+}
+```
+## What’s included
+- `pdt_adapters.*`: trainable adapter/head weights (no trunk weights unless you intentionally uploaded them)
+- `training_report.json`, `train_run_stages.json`, `train_manifest.json`, `agreement_thresholds.json`
+## License
+- **This repo (adapters + artifacts)**: MIT.
+- **Base model**: `openai/gpt-oss-20b` is licensed under Apache-2.0 on Hugging Face (also see its `USAGE_POLICY` there).
+- **Reference implementation**: MIT at `https://github.com/ljrweb-self/parallel-decoder-transformer`.

SHA256SUMS ADDED

	@@ -0,0 +1,6 @@

+ffb380b0e6eff7c258dd0d86e95fb1f1c55d5c3d7d359ef8dd611869c2ec697a  README.md
+1480ecbcfb0f893fb503d596457cbca4ba0a953b428f1b9e3c77aea36b509655  agreement_thresholds.json
+ffffed033e7c8abdca80e258cb261e70b784bc1efc3b3fe3cdd49bc44c9ccb75  pdt_adapters.safetensors
+ac5a69de8b61c55db51bd693544f4dec7598b86c4a50b7f2a0e4ff4dc9ce1366  train_manifest.json
+c5e3ad48156d856426d85ebf30697f2e6e5da33d15f11b378dc28ee042446d40  train_run_stages.json
+5220c01bd48ceaf072a7ba0262964ecb6c61d8f589ffa28e4b0442c3eccd02f1  training_report.json
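
The manifest above follows the usual `sha256sum` layout (hex digest, whitespace, file name). A minimal sketch of verifying a local download against it, using only the standard library, could look like this. The function names are illustrative, and the sketch assumes the files sit next to `SHA256SUMS` in the same directory.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text):
    """Map file name -> expected hex digest from sha256sum-style lines."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading '*'; drop it if present.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so the 1.7 GB adapter file is never fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(sums_path):
    """Return {file name: True/False}; a missing file counts as a failure."""
    sums_path = Path(sums_path)
    root = sums_path.parent
    results = {}
    for name, digest in parse_sha256sums(sums_path.read_text()).items():
        target = root / name
        results[name] = target.is_file() and sha256_of(target) == digest
    return results
```

Calling `verify("SHA256SUMS")` from the download directory returns one boolean per listed file.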

agreement_thresholds.json ADDED

	@@ -0,0 +1,176 @@

+{
+  "agreement_threshold": 0.15,
+  "roc_points": [
+    {
+      "threshold": 0.05,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.1,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.15,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.2,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.25,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.3,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.35,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.4,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.45,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.5,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.55,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.6,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.65,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.7,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.75,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.8,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.85,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.9,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    },
+    {
+      "threshold": 0.95,
+      "precision": 0.0,
+      "recall": 0.0,
+      "tp": 0.0,
+      "fp": 0.0,
+      "fn": 0.0,
+      "tn": 0.0
+    }
+  ]
+}
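
Every `roc_points` entry in this file has zero `tp`, `fp`, `fn` and `tn`, so the sweep as committed records no evaluated samples and cannot by itself justify the `agreement_threshold` of 0.15. The sketch below rebuilds the same structure in memory (19 thresholds from 0.05 to 0.95) and flags that condition. The helper names are hypothetical; to check the real artifact, load it with `json.load` and pass the result to `summarize`.

```python
def is_degenerate(roc_points):
    """True when no point records any sample in its confusion matrix."""
    return all(
        p["tp"] + p["fp"] + p["fn"] + p["tn"] == 0 for p in roc_points
    )


def summarize(doc):
    """Collect the facts a reviewer would want before trusting the threshold."""
    points = doc["roc_points"]
    return {
        "agreement_threshold": doc["agreement_threshold"],
        "num_points": len(points),
        "min_threshold": min(p["threshold"] for p in points),
        "max_threshold": max(p["threshold"] for p in points),
        "degenerate": is_degenerate(points),
    }


# In-memory copy of the committed file: thresholds 0.05, 0.10, ..., 0.95, all counts zero.
doc = {
    "agreement_threshold": 0.15,
    "roc_points": [
        {
            "threshold": round(0.05 * i, 2),
            "precision": 0.0,
            "recall": 0.0,
            "tp": 0.0,
            "fp": 0.0,
            "fn": 0.0,
            "tn": 0.0,
        }
        for i in range(1, 20)
    ],
}

summary = summarize(doc)
print(summary)
```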

pdt_adapters.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ffffed033e7c8abdca80e258cb261e70b784bc1efc3b3fe3cdd49bc44c9ccb75
+size 1716905574
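
The three lines above are a Git LFS pointer, not the weights themselves: they record the object's SHA-256 and its size in bytes. The `oid` matches the `pdt_adapters.safetensors` entry in `SHA256SUMS`. A small sketch of reading such a pointer follows; the `parse_lfs_pointer` helper is illustrative and handles only the three keys shown here.

```python
def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ffffed033e7c8abdca80e258cb261e70b784bc1efc3b3fe3cdd49bc44c9ccb75
size 1716905574
"""

info = parse_lfs_pointer(POINTER)
size_gib = info["size"] / 2**30
print(f"{info['algo']} {info['digest'][:12]}... {size_gib:.2f} GiB")
```

The size works out to roughly 1.6 GiB, which is consistent with adapter and head weights only rather than a copy of the 20B-parameter trunk.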

train_manifest.json ADDED

	@@ -0,0 +1,19 @@

+{
+  "agreement_threshold": 0.15,
+  "agreement_thresholds_file": "agreement_thresholds.json",
+  "best_eval_loss": 21.752999266554905,
+  "config_path": "/home/ubuntu/nstream-transformer/configs/gpt_oss_transfer_production.yaml",
+  "coverage_threshold": 0.4,
+  "dataset": "data/processed/pdt_10k_gpt41/kd_train.jsonl",
+  "eval_dataset": "data/processed/pdt_10k_gpt41/kd_validation.jsonl",
+  "git_dirty": true,
+  "git_sha": "d25d7dac8a57d6bed782e7251b657339341e33e0",
+  "global_step": 50000,
+  "notes_schema_version": "2.0",
+  "plan_hash_buckets": 65536,
+  "plan_hash_salt": "parallel-decoder-v1",
+  "plan_vocab_size": 65536,
+  "stages_file": "train_run_stages.json",
+  "wandb_run_name": "gpt-oss-8xH100-50000steps",
+  "wandb_run_url": "https://wandb.ai/ljrweb-self/parallel-decoder-transformer/runs/fmuea63a"
+}

train_run_stages.json ADDED

	@@ -0,0 +1,44 @@

+[
+  {
+    "stage_index": 2,
+    "stage_name": "notes_bus_enable_extended",
+    "start_step": 22500,
+    "timestamp": "2025-12-07T11:50:45.750997+00:00",
+    "actions": {
+      "bus_mix_prob": 0.75,
+      "freeze": [
+        "trunk",
+        "agreement_head",
+        "coverage_head"
+      ],
+      "unfreeze": [
+        "speculation_head"
+      ]
+    },
+    "end_step": 25000,
+    "steps": 2500,
+    "duration": 951.741001367569,
+    "completed_at": "2025-12-07T12:06:37.492038+00:00"
+  },
+  {
+    "stage_index": 3,
+    "stage_name": "rollback_training_extended",
+    "start_step": 25000,
+    "timestamp": "2025-12-07T12:06:37.492038+00:00",
+    "actions": {
+      "bus_mix_prob": 0.35,
+      "stream_dropout_prob": 0.15,
+      "freeze": [
+        "trunk"
+      ],
+      "unfreeze": [
+        "agreement_head",
+        "coverage_head"
+      ]
+    },
+    "end_step": 50000,
+    "steps": 25000,
+    "duration": 10849.983159542084,
+    "completed_at": "2025-12-07T15:07:27.475214+00:00"
+  }
+]
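
The file records only stages 2 and 3, covering steps 22,500 to 50,000. As a sanity check, the sketch below derives throughput per stage and compares `duration` with the gap between `timestamp` and `completed_at`. It assumes `duration` is in seconds, which the timestamps support but the file does not state; `stage_summary` is a hypothetical helper.

```python
from datetime import datetime

# The fields of train_run_stages.json needed for the check.
STAGES = [
    {
        "stage_name": "notes_bus_enable_extended",
        "start_step": 22500,
        "end_step": 25000,
        "steps": 2500,
        "duration": 951.741001367569,
        "timestamp": "2025-12-07T11:50:45.750997+00:00",
        "completed_at": "2025-12-07T12:06:37.492038+00:00",
    },
    {
        "stage_name": "rollback_training_extended",
        "start_step": 25000,
        "end_step": 50000,
        "steps": 25000,
        "duration": 10849.983159542084,
        "timestamp": "2025-12-07T12:06:37.492038+00:00",
        "completed_at": "2025-12-07T15:07:27.475214+00:00",
    },
]


def stage_summary(stage):
    """Derive throughput and cross-check the recorded step count and duration."""
    started = datetime.fromisoformat(stage["timestamp"])
    finished = datetime.fromisoformat(stage["completed_at"])
    return {
        "stage_name": stage["stage_name"],
        "steps_consistent": stage["end_step"] - stage["start_step"] == stage["steps"],
        "wall_clock_s": (finished - started).total_seconds(),
        "steps_per_sec": stage["steps"] / stage["duration"],
    }


summaries = [stage_summary(s) for s in STAGES]
for s in summaries:
    print(f"{s['stage_name']}: {s['steps_per_sec']:.2f} steps/s over {s['wall_clock_s']:.0f} s")
```

Under that assumption stage 2 runs at about 2.6 steps/s and stage 3 at about 2.3 steps/s, and both recorded durations agree with the timestamps to well under a second.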

training_report.json ADDED

	@@ -0,0 +1,111 @@

+{
+  "agreement_threshold": 0.15,
+  "best_eval_loss": 21.752999266554905,
+  "eval_history_length": 4,
+  "eval_metrics": {
+    "avg_margin_violation": null,
+    "contradiction_rate": null,
+    "coverage_f1": 0.09190465965213657,
+    "coverage_precision": 0.7157190635451505,
+    "coverage_recall": 0.04910509407985315,
+    "coverage_source": "logits",
+    "coverage_support": 117723.0,
+    "eval_loss": 70.75614915129924,
+    "nli_pair_count": 0,
+    "redundancy_index": 0.00012577735405615995,
+    "redundancy_pair_count": 85866
+  },
+  "generated_at": "2025-12-07T15:14:33.566756+00:00",
+  "global_step": 50000,
+  "stage": 3,
+  "train_history_length": 1100,
+  "train_metrics": {
+    "coverage_f1": {
+      "count": 1000,
+      "last": 0.0,
+      "max": 0.5,
+      "mean": 0.0034451770451770455,
+      "min": 0.0
+    },
+    "coverage_precision": {
+      "count": 1000,
+      "last": 0.0,
+      "max": 1.0,
+      "mean": 0.0029482174688057043,
+      "min": 0.0
+    },
+    "coverage_recall": {
+      "count": 1000,
+      "last": 0.0,
+      "max": 1.0,
+      "mean": 0.010866666666666667,
+      "min": 0.0
+    },
+    "kd_ce_ratio": {
+      "count": 1100,
+      "last": 6.771889053472952e-07,
+      "max": 1.993265237894144e-06,
+      "mean": 2.7301640674022355e-07,
+      "min": -2.044742390260858e-06
+    },
+    "loss": {
+      "count": 1100,
+      "last": 432.0,
+      "max": 604.0336303710938,
+      "mean": 405.4652520197088,
+      "min": 216.2018280029297
+    },
+    "repair_error_rate": {
+      "count": 3,
+      "last": 0.0,
+      "max": 0.0,
+      "mean": 0.0,
+      "min": 0.0
+    },
+    "repair_margin": {
+      "count": 3,
+      "last": 224.0,
+      "max": 294.0,
+      "mean": 261.3333333333333,
+      "min": 224.0
+    },
+    "rollback_kl": {
+      "count": 3,
+      "last": 0.0,
+      "max": 0.0,
+      "mean": 0.0,
+      "min": 0.0
+    },
+    "stability_error_rate": {
+      "count": 3,
+      "last": 0.0,
+      "max": 0.0,
+      "mean": 0.0,
+      "min": 0.0
+    },
+    "stability_kl": {
+      "count": 3,
+      "last": 0.0,
+      "max": 2.837623469531536e-09,
+      "mean": -1.8531864043325186e-07,
+      "min": -5.587935447692871e-07
+    },
+    "stability_margin": {
+      "count": 3,
+      "last": 235.0,
+      "max": 258.0,
+      "mean": 246.33333333333334,
+      "min": 235.0
+    },
+    "stage": {
+      "last": 3.0
+    },
+    "usage_loss": {
+      "count": 1100,
+      "last": 0.0,
+      "max": 0.0,
+      "mean": 0.0,
+      "min": 0.0
+    }
+  }
+}
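
The eval block reports high coverage precision (about 0.716) alongside very low recall (about 0.049), and `coverage_f1` is the harmonic mean of the two. The sketch below recomputes it from the reported values as a consistency check. It assumes the standard F1 definition, which the report does not spell out.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from eval_metrics in training_report.json.
coverage_precision = 0.7157190635451505
coverage_recall = 0.04910509407985315
reported_f1 = 0.09190465965213657

recomputed_f1 = f1_score(coverage_precision, coverage_recall)
print(f"recomputed={recomputed_f1:.6f} reported={reported_f1:.6f}")
```

The recomputed value agrees with the reported `coverage_f1`, so the low F1 is driven by recall: the coverage head is usually right when it predicts a positive, but it predicts positives rarely.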