---
language:
- en
license: apache-2.0
library_name: peft
tags:
- forecasting
- prediction
- reinforcement-learning
- grpo
- lora
- mixture-of-experts
- politics
- trump
- future-as-label
datasets:
- LightningRodLabs/WWTD-2025
base_model: openai/gpt-oss-120b
pipeline_tag: text-generation
model-index:
- name: Trump-Forecaster
  results:
  - task:
      type: text-generation
      name: Probabilistic Forecasting
    dataset:
      name: WWTD-2025
      type: LightningRodLabs/WWTD-2025
      split: test
    metrics:
    - type: brier_score
      value: 0.194
      name: Brier Score
    - type: ece
      value: 0.079
      name: Expected Calibration Error
---

# Trump-Forecaster

### RL-Tuned gpt-oss-120b for Predicting Trump Administration Actions

Starting from nothing but 5 search queries, we used the [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk) to automatically generate [2,108 forecasting questions](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025) from news articles, label them using real outcomes, and train this model via RL. **No expertise required. No manual labeling. No domain-specific engineering.** The result beats GPT-5 on held-out questions.

You can do this in any domain — just change the search queries. See [how we built the dataset](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025).

This repo contains a **LoRA adapter** for [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b). A standalone `merge.py` script is included to merge it into a full model.

---

## Results

Evaluated on 682 held-out test questions under two conditions: with news context, and with the question only (no context). The no-context condition reveals whether the model knows what it doesn't know: untrained models project false confidence, while RL training substantially reduces that overconfidence.

| Model | Brier (Context) | BSS (Context) | Brier (No Context) | BSS (No Context) | ECE (Context) | ECE (No Context) |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|
| GPT-5 | 0.200 | +0.14 | 0.258 | -0.11 | 0.091 | 0.191 |
| gpt-oss-120b (base) | 0.213 | +0.08 | 0.260 | -0.12 | 0.111 | 0.190 |
| **Trump-Forecaster** | **0.194** | **+0.16** | **0.242** | **-0.04** | **0.079** | **0.164** |

![Brier Skill Score](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025/resolve/main/brier_skill_score.png)

![Brier Score Comparison](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025/resolve/main/brier_score_comparison.png)

![ECE Comparison](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025/resolve/main/ece_comparison.png)

### Metrics

- **Brier Score**: Mean squared error between the predicted probability and the outcome (0 or 1). Lower is better. **Brier Skill Score (BSS)** expresses this as improvement over always predicting the base rate; positive means the model learned something useful beyond historical frequency.
- **Expected Calibration Error (ECE)**: Measures whether predicted probabilities match actual frequencies: "70%" predictions should resolve "yes" 70% of the time. Lower is better. A minimal sketch of all three metrics follows this list.
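
For concreteness, here is a minimal NumPy sketch of all three metrics. The 10-bin equal-width ECE binning and the variable names are illustrative assumptions, not the exact evaluation code:

```python
import numpy as np

def brier_score(p, y):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    return np.mean((p - y) ** 2)

def brier_skill_score(p, y):
    """Improvement over always predicting the base rate; positive is better."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    reference = brier_score(np.full_like(p, y.mean()), y)
    return 1.0 - brier_score(p, y) / reference

def expected_calibration_error(p, y, n_bins=10):
    """Weighted gap between mean predicted probability and observed frequency per bin."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return ece
```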

---

## Training

- **Base model**: [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) (120B-parameter MoE, 5.1B active params, 128 experts with top-4 routing)
- **Method**: GRPO with a Brier-score reward via [Tinker](https://tinker.computer); a sketch of the reward follows this list
- **LoRA rank**: 32
- **Learning rate**: 4e-5
- **Batch size**: 32, group size 8
- **Training steps**: 50
- **Max tokens**: 16,384
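
The reward GRPO optimizes is, in essence, the negative Brier score of the parsed probability. A hypothetical sketch (the tag-parsing regex and the penalty for malformed answers are assumptions, not the exact Tinker reward):

```python
import re

ANSWER_RE = re.compile(r"<answer>\s*([01](?:\.\d+)?|\.\d+)\s*</answer>", re.IGNORECASE)

def brier_reward(completion: str, outcome: int) -> float:
    """Negative Brier score: 0.0 is a perfect forecast, -1.0 the worst.

    Completions without a parseable <answer> tag get the worst possible
    reward (an assumed convention, not confirmed by the source).
    """
    match = ANSWER_RE.search(completion)
    if match is None:
        return -1.0
    p = min(max(float(match.group(1)), 0.0), 1.0)
    return -((p - outcome) ** 2)
```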

---

## Usage

This repo contains a LoRA adapter trained with [Tinker](https://tinker.computer). The adapter uses Tinker's module naming convention, so it requires a merge step before inference. A standalone `merge.py` script is included.

### Merge into full model

```bash
pip install torch transformers safetensors tqdm huggingface-hub
python merge.py --output ./trump-forecaster-merged
```

This downloads the base model, dequantizes to bf16, applies the LoRA adapter, and saves the merged model.
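
Under the hood, merging folds each LoRA pair into its base weight via the standard update `W' = W + (alpha / r) * B @ A`. A simplified sketch of that single step, assuming already-dequantized bf16 weights and an alpha value you should check against the adapter config; the real `merge.py` additionally handles Tinker's module-name remapping:

```python
import torch

def merge_lora_weight(base_weight: torch.Tensor,
                      lora_A: torch.Tensor,
                      lora_B: torch.Tensor,
                      alpha: float = 32.0,
                      rank: int = 32) -> torch.Tensor:
    """Fold a LoRA update into a base weight: W' = W + (alpha / r) * B @ A.

    Shapes: base_weight (out, in), lora_A (r, in), lora_B (out, r).
    rank=32 matches training; alpha here is an assumption.
    """
    scaling = alpha / rank
    return base_weight + scaling * (lora_B @ lora_A).to(base_weight.dtype)
```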

### Inference

```python
import sglang as sgl

engine = sgl.Engine(
    model_path="./trump-forecaster-merged",
    tokenizer_path="openai/gpt-oss-120b",
    trust_remote_code=True,
    dtype="bfloat16",
    tp_size=2,
)

news_context = "... relevant news articles ..."

prompt = f"""You are a forecasting expert. Given the question and context below, predict the probability that the answer is "Yes".

Question: Will Trump impose 25% tariffs on all goods from Canada by February 1, 2025?

Context:
{news_context}

Respond with your reasoning, then give your final answer as a probability between 0 and 1 inside <answer></answer> tags."""

output = engine.generate(prompt, sampling_params={"max_new_tokens": 4096, "stop": ["</answer>"]})
print(output["text"])
```
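
Because `</answer>` is a stop string, the returned text typically ends right after the model's probability. A small helper to recover it, assuming the model emitted a well-formed `<answer>` tag:

```python
def extract_probability(text: str) -> float:
    """Pull the final probability from a completion truncated at </answer>."""
    # Everything after the last <answer> tag is the model's probability.
    _, _, tail = text.rpartition("<answer>")
    return min(max(float(tail.strip()), 0.0), 1.0)

probability = extract_probability(output["text"])
print(f"P(yes) = {probability:.2f}")
```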

---

## Links

- **Dataset**: [LightningRodLabs/WWTD-2025](https://huggingface.co/datasets/LightningRodLabs/WWTD-2025)
- **Training platform**: [Tinker](https://tinker.computer)
- **Data generation**: [Lightning Rod SDK](https://github.com/lightning-rod-labs/lightningrod-python-sdk)
- **Future-as-Label paper**: [arxiv:2601.06336](https://arxiv.org/abs/2601.06336)
- **Outcome-based RL paper**: [arxiv:2505.17989](https://arxiv.org/abs/2505.17989)