loganrobbins committed (verified)
Commit 4600161 · 1 Parent(s): 759992f

Publish PDT adapters + arXiv model card

.gitattributes CHANGED
@@ -1,35 +1,4 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.jsonl filter=lfs diff=lfs merge=lfs -text

README.md CHANGED
@@ -1,3 +1,61 @@
- ---
- license: mit
- ---
+ ---
+ language:
+ - en
+ license: mit
+ tags:
+ - parallel-decoding
+ - speculative-decoding
+ - transformers
+ - research
+ - arxiv
+ base_model: openai/gpt-oss-20b
+ library_name: transformers
+ pipeline_tag: text-generation
+ paper:
+   title: "Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning"
+   url: https://arxiv.org/abs/2512.10054
+ ---
+ 
+ # Parallel Decoder Transformer (PDT) adapters for GPT-OSS-20B
+ 
+ This repository contains **PDT adapter/head weights** trained against the GPT-OSS-20B trunk, plus minimal training artifacts.
+ 
+ **Paper:** [Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning](https://arxiv.org/abs/2512.10054)
+ 
+ ## Abstract (arXiv)
+ 
+ Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" methods like Skeleton-of-Thought attempt to parallelize generation via external orchestration, they suffer from coherence drift due to the lack of cross-stream communication. In this work, we introduce the Parallel Decoder Transformer (PDT), a parameter-efficient architecture that embeds coordination primitives directly into the inference process of a frozen pre-trained model. Instead of retraining the base model, PDT injects lightweight Speculative Note Conditioning (SNC) adapters that allow parallel decoding streams to synchronize via a shared, dynamic latent space. We formulate coordination as a speculative consensus problem, where sibling streams broadcast semantic "notes" to a global bus, gated by a learned verification head. We validate our approach on a 50,000-step curriculum using a frozen 20B-parameter backbone. Our results demonstrate that PDT achieves effective self-correction, reaching 77.8% precision in coverage prediction and recovering approximate serial semantics without modifying the trunk weights. This establishes PDT as a scalable, efficient alternative to full model fine-tuning for structured parallel generation.
+ 
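+ As a mental model of the mechanism (a toy illustration only, **not** the PDT implementation; see the reference repo for the real thing): each parallel stream proposes a "note" vector, a learned verification head scores it, and only accepted notes are broadcast on the shared bus that conditions subsequent decoding. All names below (`verify_head`, `update_bus`) and the reuse of the 0.15 agreement threshold are illustrative.
+ 
+ ```python
+ import torch
+ 
+ # Toy sketch of note gating: K sibling streams each propose a note vector;
+ # a verification head decides which notes reach the shared bus.
+ K, d_note = 4, 64
+ verify_head = torch.nn.Linear(d_note, 1)  # stand-in for the learned gate
+ 
+ def update_bus(bus, notes, threshold=0.15):
+     """Mix accepted notes into the bus (mean-pooled here for simplicity)."""
+     scores = torch.sigmoid(verify_head(notes)).squeeze(-1)  # (K,)
+     accepted = notes[scores > threshold]
+     if accepted.numel() == 0:
+         return bus  # nothing passed verification; bus is unchanged
+     return 0.5 * bus + 0.5 * accepted.mean(dim=0)
+ 
+ bus = torch.zeros(d_note)
+ notes = torch.randn(K, d_note)   # one speculative note per stream
+ bus = update_bus(bus, notes)     # each stream then conditions on `bus`
+ ```
+ 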
+ ## How to use
+ 
+ 1. Install the reference implementation (runtime + scripts):
+    - `https://github.com/ljrweb-self/parallel-decoder-transformer`
+ 2. Download the base trunk model (`openai/gpt-oss-20b`) from Hugging Face, or point to a local copy.
+ 3. Download the adapter checkpoint from this repo and point `configs/gpt_oss_transfer_production.yaml` (or the corresponding CLI flags) at it; see the sketch below.
+ 
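+ A minimal sketch of step 3, assuming `huggingface_hub` and `safetensors` are installed; the repo id is a placeholder, and the actual config/CLI wiring lives in the reference implementation:
+ 
+ ```python
+ from huggingface_hub import hf_hub_download
+ from safetensors.torch import load_file
+ 
+ REPO_ID = "<this-repo-id>"  # placeholder: use this repository's Hub id
+ 
+ # Fetch the adapter checkpoint and the run manifest from this repo.
+ adapter_path = hf_hub_download(repo_id=REPO_ID, filename="pdt_adapters.safetensors")
+ manifest_path = hf_hub_download(repo_id=REPO_ID, filename="train_manifest.json")
+ 
+ state_dict = load_file(adapter_path)  # adapter/head tensors only, no trunk weights
+ print(f"{len(state_dict)} tensors, e.g. {next(iter(state_dict))}")
+ 
+ # Point configs/gpt_oss_transfer_production.yaml (or the CLI flags) at
+ # `adapter_path`, and load openai/gpt-oss-20b separately as the frozen trunk.
+ ```
+ 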
+ ## Citation
+ 
+ ```bibtex
+ @misc{robbins2025pdt,
+   title={Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning},
+   author={Robbins, Logan},
+   year={2025},
+   eprint={2512.10054},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2512.10054}
+ }
+ ```
+ 
+ ## What’s included
+ 
+ - `pdt_adapters.safetensors`: the trained adapter/head weights (no GPT-OSS-20B trunk weights are included)
+ - `training_report.json`, `train_run_stages.json`, `train_manifest.json`, `agreement_thresholds.json`: training artifacts (metrics report, stage schedule, run manifest, agreement-threshold sweep)
+ - `SHA256SUMS`: checksums for the files above
+ 
+ ## License
+ 
+ - **This repo (adapters + artifacts)**: MIT.
+ - **Base model**: `openai/gpt-oss-20b` is licensed under Apache-2.0 on Hugging Face (also see its `USAGE_POLICY` there).
+ - **Reference implementation**: MIT at `https://github.com/ljrweb-self/parallel-decoder-transformer`.

SHA256SUMS ADDED
@@ -0,0 +1,6 @@
+ ffb380b0e6eff7c258dd0d86e95fb1f1c55d5c3d7d359ef8dd611869c2ec697a README.md
+ 1480ecbcfb0f893fb503d596457cbca4ba0a953b428f1b9e3c77aea36b509655 agreement_thresholds.json
+ ffffed033e7c8abdca80e258cb261e70b784bc1efc3b3fe3cdd49bc44c9ccb75 pdt_adapters.safetensors
+ ac5a69de8b61c55db51bd693544f4dec7598b86c4a50b7f2a0e4ff4dc9ce1366 train_manifest.json
+ c5e3ad48156d856426d85ebf30697f2e6e5da33d15f11b378dc28ee042446d40 train_run_stages.json
+ 5220c01bd48ceaf072a7ba0262964ecb6c61d8f589ffa28e4b0442c3eccd02f1 training_report.json
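
After downloading, the checksums above can be verified with `sha256sum -c SHA256SUMS`, or with a short script like the following (file names and expected digests are read from `SHA256SUMS` itself):

```python
import hashlib
from pathlib import Path

def sha256(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file so the ~1.7 GB checkpoint need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

for line in Path("SHA256SUMS").read_text().splitlines():
    if not line.strip():
        continue
    expected, name = line.split(maxsplit=1)
    name = name.strip()
    status = "OK" if sha256(Path(name)) == expected else "MISMATCH"
    print(f"{name}: {status}")
```
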
agreement_thresholds.json ADDED
@@ -0,0 +1,176 @@
+ {
+   "agreement_threshold": 0.15,
+   "roc_points": [
+     {
+       "threshold": 0.05,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.1,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.15,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.2,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.25,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.3,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.35,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.4,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.45,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.5,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.55,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.6,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.65,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.7,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.75,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.8,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.85,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.9,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     },
+     {
+       "threshold": 0.95,
+       "precision": 0.0,
+       "recall": 0.0,
+       "tp": 0.0,
+       "fp": 0.0,
+       "fn": 0.0,
+       "tn": 0.0
+     }
+   ]
+ }
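
This file records the agreement threshold chosen for the run (0.15) together with a precision/recall sweep over candidate thresholds. The counters in this export are all zero, so the sweep carries no signal here; the sketch below only shows how such a file could be consumed, with the F1-based selection being an illustrative choice rather than the project's procedure:

```python
import json

with open("agreement_thresholds.json") as f:
    data = json.load(f)

print("configured agreement_threshold:", data["agreement_threshold"])

def f1(point: dict) -> float:
    p, r = point["precision"], point["recall"]
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Illustrative only: with real counts this would pick the best-F1 threshold;
# in this export every point is zero, so the configured value stands.
best = max(data["roc_points"], key=f1)
print("best threshold by F1:", best["threshold"], "->", f1(best))
```
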
pdt_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffffed033e7c8abdca80e258cb261e70b784bc1efc3b3fe3cdd49bc44c9ccb75
+ size 1716905574
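
These three lines are a Git LFS pointer, not the checkpoint itself: `oid` is the SHA-256 of the real object (it matches the `SHA256SUMS` entry above) and `size` is its byte count (~1.7 GB); `git lfs pull` or a normal Hub download materializes the actual file. Once fetched, the tensor layout can be inspected lazily, for example:

```python
from safetensors import safe_open

# Peek at the adapter checkpoint without loading every tensor into memory.
with safe_open("pdt_adapters.safetensors", framework="pt", device="cpu") as f:
    names = list(f.keys())
    print(f"{len(names)} tensors")
    for name in names[:10]:  # first few entries only
        print(name, f.get_slice(name).get_shape())
```
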
train_manifest.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "agreement_threshold": 0.15,
+   "agreement_thresholds_file": "agreement_thresholds.json",
+   "best_eval_loss": 21.752999266554905,
+   "config_path": "/home/ubuntu/nstream-transformer/configs/gpt_oss_transfer_production.yaml",
+   "coverage_threshold": 0.4,
+   "dataset": "data/processed/pdt_10k_gpt41/kd_train.jsonl",
+   "eval_dataset": "data/processed/pdt_10k_gpt41/kd_validation.jsonl",
+   "git_dirty": true,
+   "git_sha": "d25d7dac8a57d6bed782e7251b657339341e33e0",
+   "global_step": 50000,
+   "notes_schema_version": "2.0",
+   "plan_hash_buckets": 65536,
+   "plan_hash_salt": "parallel-decoder-v1",
+   "plan_vocab_size": 65536,
+   "stages_file": "train_run_stages.json",
+   "wandb_run_name": "gpt-oss-8xH100-50000steps",
+   "wandb_run_url": "https://wandb.ai/ljrweb-self/parallel-decoder-transformer/runs/fmuea63a"
+ }
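
The manifest records the run configuration (config path, datasets, thresholds, git state, W&B run) and names the companion artifacts in this repo. A small sketch of reading it, assuming the JSON files sit in the current directory:

```python
import json
from pathlib import Path

manifest = json.loads(Path("train_manifest.json").read_text())

# Follow the manifest's references to the other artifacts.
thresholds = json.loads(Path(manifest["agreement_thresholds_file"]).read_text())
stages = json.loads(Path(manifest["stages_file"]).read_text())

print("trained to step", manifest["global_step"],
      "| best eval loss", round(manifest["best_eval_loss"], 3))
print("agreement threshold", thresholds["agreement_threshold"],
      "| coverage threshold", manifest["coverage_threshold"])
print("recorded stages:", [s["stage_name"] for s in stages])
```
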
train_run_stages.json ADDED
@@ -0,0 +1,44 @@
+ [
+   {
+     "stage_index": 2,
+     "stage_name": "notes_bus_enable_extended",
+     "start_step": 22500,
+     "timestamp": "2025-12-07T11:50:45.750997+00:00",
+     "actions": {
+       "bus_mix_prob": 0.75,
+       "freeze": [
+         "trunk",
+         "agreement_head",
+         "coverage_head"
+       ],
+       "unfreeze": [
+         "speculation_head"
+       ]
+     },
+     "end_step": 25000,
+     "steps": 2500,
+     "duration": 951.741001367569,
+     "completed_at": "2025-12-07T12:06:37.492038+00:00"
+   },
+   {
+     "stage_index": 3,
+     "stage_name": "rollback_training_extended",
+     "start_step": 25000,
+     "timestamp": "2025-12-07T12:06:37.492038+00:00",
+     "actions": {
+       "bus_mix_prob": 0.35,
+       "stream_dropout_prob": 0.15,
+       "freeze": [
+         "trunk"
+       ],
+       "unfreeze": [
+         "agreement_head",
+         "coverage_head"
+       ]
+     },
+     "end_step": 50000,
+     "steps": 25000,
+     "duration": 10849.983159542084,
+     "completed_at": "2025-12-07T15:07:27.475214+00:00"
+   }
+ ]
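
Only the last two stages of the 50,000-step curriculum appear in this export (indices 2 and 3); each entry records the step range, wall-clock duration, and which modules were frozen or unfrozen. A quick way to summarize the schedule:

```python
import json

with open("train_run_stages.json") as f:
    stages = json.load(f)

# Print one line per recorded curriculum stage.
for s in stages:
    hours = s["duration"] / 3600
    print(f"stage {s['stage_index']} ({s['stage_name']}): "
          f"steps {s['start_step']}-{s['end_step']} ({s['steps']} steps, {hours:.1f} h), "
          f"frozen={s['actions']['freeze']}, trainable={s['actions']['unfreeze']}")
```
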
training_report.json ADDED
@@ -0,0 +1,111 @@
+ {
+   "agreement_threshold": 0.15,
+   "best_eval_loss": 21.752999266554905,
+   "eval_history_length": 4,
+   "eval_metrics": {
+     "avg_margin_violation": null,
+     "contradiction_rate": null,
+     "coverage_f1": 0.09190465965213657,
+     "coverage_precision": 0.7157190635451505,
+     "coverage_recall": 0.04910509407985315,
+     "coverage_source": "logits",
+     "coverage_support": 117723.0,
+     "eval_loss": 70.75614915129924,
+     "nli_pair_count": 0,
+     "redundancy_index": 0.00012577735405615995,
+     "redundancy_pair_count": 85866
+   },
+   "generated_at": "2025-12-07T15:14:33.566756+00:00",
+   "global_step": 50000,
+   "stage": 3,
+   "train_history_length": 1100,
+   "train_metrics": {
+     "coverage_f1": {
+       "count": 1000,
+       "last": 0.0,
+       "max": 0.5,
+       "mean": 0.0034451770451770455,
+       "min": 0.0
+     },
+     "coverage_precision": {
+       "count": 1000,
+       "last": 0.0,
+       "max": 1.0,
+       "mean": 0.0029482174688057043,
+       "min": 0.0
+     },
+     "coverage_recall": {
+       "count": 1000,
+       "last": 0.0,
+       "max": 1.0,
+       "mean": 0.010866666666666667,
+       "min": 0.0
+     },
+     "kd_ce_ratio": {
+       "count": 1100,
+       "last": 6.771889053472952e-07,
+       "max": 1.993265237894144e-06,
+       "mean": 2.7301640674022355e-07,
+       "min": -2.044742390260858e-06
+     },
+     "loss": {
+       "count": 1100,
+       "last": 432.0,
+       "max": 604.0336303710938,
+       "mean": 405.4652520197088,
+       "min": 216.2018280029297
+     },
+     "repair_error_rate": {
+       "count": 3,
+       "last": 0.0,
+       "max": 0.0,
+       "mean": 0.0,
+       "min": 0.0
+     },
+     "repair_margin": {
+       "count": 3,
+       "last": 224.0,
+       "max": 294.0,
+       "mean": 261.3333333333333,
+       "min": 224.0
+     },
+     "rollback_kl": {
+       "count": 3,
+       "last": 0.0,
+       "max": 0.0,
+       "mean": 0.0,
+       "min": 0.0
+     },
+     "stability_error_rate": {
+       "count": 3,
+       "last": 0.0,
+       "max": 0.0,
+       "mean": 0.0,
+       "min": 0.0
+     },
+     "stability_kl": {
+       "count": 3,
+       "last": 0.0,
+       "max": 2.837623469531536e-09,
+       "mean": -1.8531864043325186e-07,
+       "min": -5.587935447692871e-07
+     },
+     "stability_margin": {
+       "count": 3,
+       "last": 235.0,
+       "max": 258.0,
+       "mean": 246.33333333333334,
+       "min": 235.0
+     },
+     "stage": {
+       "last": 3.0
+     },
+     "usage_loss": {
+       "count": 1100,
+       "last": 0.0,
+       "max": 0.0,
+       "mean": 0.0,
+       "min": 0.0
+     }
+   }
+ }