GOVINDFROM committed · commit 2890d84 · verified · 1 parent: e91ffab

Upload model card

Files changed (1): README.md (+189 −0)
---
tags:
- reinforcement-learning
- game-theory
- colonel-blotto
- neurips-2025
- graph-neural-networks
- meta-learning
license: mit
---

# Colonel Blotto: Advanced RL + LLM System for NeurIPS 2025

![Status](https://img.shields.io/badge/status-trained-success)
![Framework](https://img.shields.io/badge/framework-PyTorch-orange)
![License](https://img.shields.io/badge/license-MIT-blue)

This repository contains trained models for the **Colonel Blotto game**, targeting the **NeurIPS 2025 MindGames Workshop**. The system combines reinforcement learning with large language model fine-tuning.

## 🎯 Model Overview

The system plays Colonel Blotto by combining:

- **Graph Neural Networks** for game state representation
- **FiLM layers** for fast opponent adaptation
- **Meta-learning** for strategy portfolios
- **LLM fine-tuning** (SFT + DPO) for strategic reasoning
- **Distillation** from LLMs back to efficient RL policies

### Game Configuration

- **Fields**: 3
- **Units per round**: 20
- **Rounds per game**: 5
- **Training episodes**: 1000
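
As a quick sanity check on this configuration (my own sketch, not code from the repo): allocating all 20 units across 3 fields gives C(20+3−1, 3−1) = C(22, 2) = 231 possible actions, which matches the `n_actions=231` used in the loading example below.

```python
from math import comb

# Enumerate every way to split U=20 units across F=3 fields
# (illustrative helper, not the repo's action-space generator).
F, U = 3, 20
actions = [(a, b, U - a - b) for a in range(U + 1) for b in range(U - a + 1)]
assert len(actions) == comb(U + F - 1, F - 1) == 231
```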

## 📊 Performance Results

### Against Scripted Opponents

**Overall Win Rate**: N/A

### Against LLMs

| Matchup | Win Rate |
|---------|----------|
| Policy vs Base Llama | 93.00% |
| Policy vs Qwen | 22.00% |

## 🏗️ Architecture

### Policy Network

The core policy network combines four components:

1. **Graph Encoder**: multi-layer Graph Attention Networks (GAT)
   - Heterogeneous nodes: field nodes, round nodes, summary node
   - Multi-head attention with 6 heads
   - 3 layers of message passing

2. **Opponent Encoder**: MLP-based encoder for opponent modeling
   - Processes opponent history
   - Learns behavioral patterns

3. **FiLM Layers**: Feature-wise Linear Modulation (see the sketch after this list)
   - Fast adaptation to opponent behavior
   - Conditioned on opponent encoding

4. **Portfolio Head**: multi-strategy selection
   - 6 specialist strategy heads
   - Soft attention-based mixing
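
To make components 3 and 4 concrete, here is a minimal PyTorch sketch of feature-wise linear modulation and soft attention mixing over specialist heads. Module names, sizes, and wiring are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, hidden: int, opp_dim: int):
        super().__init__()
        self.proj = nn.Linear(opp_dim, 2 * hidden)

    def forward(self, h: torch.Tensor, opp_emb: torch.Tensor) -> torch.Tensor:
        # Feature-wise linear modulation: scale and shift h per feature,
        # conditioned on the opponent embedding.
        gamma, beta = self.proj(opp_emb).chunk(2, dim=-1)
        return gamma * h + beta

class PortfolioHead(nn.Module):
    def __init__(self, hidden: int, n_actions: int, n_strat: int = 6):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(hidden, n_actions) for _ in range(n_strat)])
        self.mixer = nn.Linear(hidden, n_strat)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.mixer(h), dim=-1)                       # (B, n_strat) mixing weights
        logits = torch.stack([head(h) for head in self.heads], dim=1)  # (B, n_strat, n_actions)
        return (w.unsqueeze(-1) * logits).sum(dim=1)                   # soft mixture of specialists

# Toy forward pass with illustrative sizes
h = FiLM(hidden=128, opp_dim=32)(torch.randn(4, 128), torch.randn(4, 32))
action_logits = PortfolioHead(hidden=128, n_actions=231)(h)
```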

### Training Pipeline

The models were trained through a seven-phase pipeline:

1. **Phase A**: Environment setup and action space generation
2. **Phase B**: PPO training against diverse scripted opponents
3. **Phase C**: Preference dataset generation (LLM vs LLM rollouts)
4. **Phase D**: Supervised Fine-Tuning (SFT) of the base LLM
5. **Phase E**: Direct Preference Optimization (DPO)
6. **Phase F**: Knowledge distillation from LLM to policy (sketched below)
7. **Phase G**: PPO refinement after distillation
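
Phase F admits a standard formulation: minimize the KL divergence between action distributions derived from LLM play and the policy's own distribution. The sketch below is a hedged illustration of that idea; the shapes and the teacher-target construction are assumptions, not the repo's code.

```python
import torch
import torch.nn.functional as F

def distill_loss(policy_logits: torch.Tensor, teacher_probs: torch.Tensor) -> torch.Tensor:
    # KL(teacher || policy), batch-averaged; kl_div expects log-probs as input
    return F.kl_div(F.log_softmax(policy_logits, dim=-1), teacher_probs, reduction="batchmean")

logits = torch.randn(8, 231, requires_grad=True)       # policy logits over the 231 allocations
teacher = torch.softmax(torch.randn(8, 231), dim=-1)   # stand-in for LLM-derived targets
distill_loss(logits, teacher).backward()
```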

## 📦 Repository Contents

### Policy Models

- `policy_models/policy_final.pt`: final policy checkpoint
- `policy_models/policy_after_distill.pt`: checkpoint saved after LLM-to-policy distillation
- `policy_models/policy_after_ppo.pt`: checkpoint saved after PPO training

### Fine-tuned LLM Models

- `sft_model/`: SFT model (HuggingFace Transformers compatible)

### Configuration & Results

- `master_config.json`: Complete training configuration
- `battleground_eval.json`: Comprehensive evaluation results
- `eval_scripted_after_ppo.json`: Post-PPO evaluation

## 🚀 Usage

### Loading Policy Model

```python
import json

import torch
from your_policy_module import PolicyNet

# Load configuration
with open("master_config.json", "r") as f:
    config = json.load(f)

# Initialize policy
policy = PolicyNet(
    F=config["F"],
    n_actions=231,  # all splits of U=20 units over F=3 fields
    hidden=config["hidden"],
    gnn_layers=config["gnn_layers"],
    gnn_heads=config["gnn_heads"],
    n_strat=config["n_strat"],
)

# Load trained weights
policy.load_state_dict(torch.load("policy_models/policy_final.pt", map_location="cpu"))
policy.eval()
```
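
The policy outputs a distribution over the 231 action indices. A hypothetical helper (mine, not the repo's) for mapping an index back to a concrete allocation, assuming the lexicographic enumeration sketched earlier:

```python
U = 20
ALL_ACTIONS = [(a, b, U - a - b) for a in range(U + 1) for b in range(U - a + 1)]

def action_to_allocation(idx: int) -> tuple[int, int, int]:
    # Index into the fixed enumeration of all 3-field splits of 20 units
    return ALL_ACTIONS[idx]

print(action_to_allocation(0))    # (0, 0, 20)
print(action_to_allocation(230))  # (20, 0, 0)
```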

### Loading Fine-tuned LLM

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load SFT or DPO model
tokenizer = AutoTokenizer.from_pretrained("./sft_model")
model = AutoModelForCausalLM.from_pretrained("./sft_model")

# Use for inference (example prompt; the training prompt format may differ)
prompt = "Allocate 20 units across 3 fields against an unknown opponent."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🎓 Research Context

This work targets the **NeurIPS 2025 MindGames Workshop** with a focus on:

- **Strategic game AI** beyond traditional game-theoretic approaches
- **Hybrid systems** combining neural RL and LLM reasoning
- **Fast adaptation** to diverse opponents through meta-learning
- **Efficient deployment** via distillation

### Key Innovations

1. **Heterogeneous Graph Representation**: Novel graph structure for Blotto game states
2. **Ground-truth Counterfactual Learning**: Exploiting game determinism to score actions that were never played (illustrated below)
3. **Multi-scale Representation**: Field-level, round-level, and game-level embeddings
4. **LLM-to-RL Distillation**: Transferring strategic reasoning to efficient policies
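
A short illustration of why determinism enables ground-truth counterfactuals: once the opponent's allocation is revealed, the payoff of every possible action is exactly computable. The scoring rule below is an assumption for illustration, not necessarily the one used in training.

```python
# Plausible per-round Blotto scoring: +1 per field won, -1 per field lost, 0 on ties.
def round_payoff(mine: tuple[int, ...], theirs: tuple[int, ...]) -> int:
    return sum((m > t) - (m < t) for m, t in zip(mine, theirs))

opp = (10, 5, 5)  # opponent's revealed allocation this round
# Exact returns for actions that were never actually played:
counterfactuals = {a: round_payoff(a, opp) for a in [(12, 4, 4), (0, 10, 10), (7, 7, 6)]}
print(counterfactuals)
```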

## 📝 Citation

If you use this work, please cite:

```bibtex
@misc{colonelblotto2025neurips,
  title        = {Advanced Reinforcement Learning System for Colonel Blotto Games},
  author       = {{NeurIPS 2025 MindGames Submission}},
  year         = {2025},
  publisher    = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/{repo_id}}},
}
```

## 📄 License

MIT License. See the LICENSE file for details.

## 🙏 Acknowledgments

- Built for the **NeurIPS 2025 MindGames Workshop**
- Uses PyTorch, HuggingFace Transformers, and PEFT
- Training infrastructure: NVIDIA H200 GPU

---

**Generated**: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
**Uploaded from**: Notebook Environment