GOVINDFROM committed on
Commit b2a4f73 · verified · 1 Parent(s): 7ca7c16

Upload model card

Files changed (1)
  1. README.md +188 -0
README.md ADDED
@@ -0,0 +1,188 @@
+ ---
+ tags:
+ - reinforcement-learning
+ - game-theory
+ - colonel-blotto
+ - neurips-2025
+ - graph-neural-networks
+ - meta-learning
+ license: mit
+ ---
+
+ # Colonel Blotto: Advanced RL + LLM System for NeurIPS 2025
+
+ ![Status](https://img.shields.io/badge/status-trained-success)
+ ![Framework](https://img.shields.io/badge/framework-PyTorch-orange)
+ ![License](https://img.shields.io/badge/license-MIT-blue)
+
+ This repository contains trained models for the **Colonel Blotto game**, targeting the **NeurIPS 2025 MindGames workshop**. The system combines a PPO-trained graph neural network policy with large language model fine-tuning (SFT and DPO).
+
+ ## 🎯 Model Overview
+
+ The system approaches Colonel Blotto by combining:
+
+ - **Graph Neural Networks** for game state representation
+ - **FiLM layers** for fast opponent adaptation
+ - **Meta-learning** for strategy portfolios
+ - **LLM fine-tuning** (SFT + DPO) for strategic reasoning
+ - **Distillation** from LLMs back to efficient RL policies
+
+ ### Game Configuration
+
+ - **Fields**: 3
+ - **Units per round**: 20
+ - **Rounds per game**: 5
+ - **Training episodes**: N/A
+
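The configuration above fixes the size of the per-round action space. As a minimal sketch, assuming every round requires placing all 20 units as whole numbers across the 3 fields, the allocations can be enumerated with a stars-and-bars construction. The helper name `enumerate_allocations` is hypothetical and not part of this repository; the resulting count matches the `n_actions=231` value used in the policy loading snippet.

```python
from itertools import combinations
from math import comb


def enumerate_allocations(n_fields: int, n_units: int) -> list[tuple[int, ...]]:
    """List every way to split n_units whole units across n_fields fields."""
    allocations = []
    # Stars and bars: pick positions for the n_fields - 1 dividers among
    # n_units + n_fields - 1 slots; the gaps between dividers are field sizes.
    slots = n_units + n_fields - 1
    for bars in combinations(range(slots), n_fields - 1):
        edges = (-1, *bars, slots)
        allocations.append(
            tuple(edges[i + 1] - edges[i] - 1 for i in range(n_fields))
        )
    return allocations


# Assumed setting from the model card: 3 fields, 20 units per round.
actions = enumerate_allocations(n_fields=3, n_units=20)
assert len(actions) == comb(22, 2) == 231
```

The order in which allocations are listed here is arbitrary; the index-to-allocation mapping used by the trained policy is not documented in this model card.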
+ ## 📊 Performance Results
+
+ ### Against Scripted Opponents
+
+ **Overall Win Rate**: 0.00%
+
+ ### LLM Elo Ratings
+
+ | Model | Elo Rating |
+ |-------|------------|
+
+ ## 🏗️ Architecture
+
+ ### Policy Network
+
+ The core policy network has four components:
+
+ 1. **Graph Encoder**: Multi-layer Graph Attention Networks (GAT)
+    - Heterogeneous nodes: field nodes, round nodes, summary node
+    - Multi-head attention with 6 heads
+    - 3 layers of message passing
+
+ 2. **Opponent Encoder**: MLP-based encoder for opponent modeling
+    - Processes opponent history
+    - Learns behavioral patterns
+
+ 3. **FiLM Layers**: Feature-wise Linear Modulation
+    - Fast adaptation to opponent behavior
+    - Conditioned on opponent encoding
+
+ 4. **Portfolio Head**: Multi-strategy selection
+    - 6 specialist strategy heads
+    - Soft attention-based mixing
+
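The FiLM and portfolio components listed above can be illustrated in a few lines. This is a dependency-free sketch of the general techniques, not the repository's `PolicyNet`: in a real model `gamma` and `beta` would be produced from the opponent encoding by a learned layer, and the mixing logits would come from the network. All function names and the toy numbers are assumptions made for illustration.

```python
import math


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_strategies(head_probs, mixing_logits):
    """Blend per-head action distributions with softmax mixing weights."""
    weights = softmax(mixing_logits)
    n_actions = len(head_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(weights, head_probs))
        for a in range(n_actions)
    ]


# Toy example: 2 strategy heads over 3 actions (the model card lists 6 heads).
state = film([1.0, -2.0], gamma=[0.5, 1.0], beta=[0.0, 3.0])
policy = mix_strategies(
    head_probs=[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
    mixing_logits=[0.0, 0.0],
)
```

Because each head outputs a valid distribution and the mixing weights sum to one, the blended `policy` is itself a valid distribution over actions.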
+ ### Training Pipeline
+
+ The models were trained through a seven-phase pipeline:
+
+ 1. **Phase A**: Environment setup and action space generation
+ 2. **Phase B**: PPO training against diverse scripted opponents
+ 3. **Phase C**: Preference dataset generation (LLM vs LLM rollouts)
+ 4. **Phase D**: Supervised Fine-Tuning (SFT) of base LLM
+ 5. **Phase E**: Direct Preference Optimization (DPO)
+ 6. **Phase F**: Knowledge distillation from LLM to policy
+ 7. **Phase G**: PPO refinement after distillation
+
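For readers unfamiliar with Phase E, the standard DPO objective for a single preference pair can be written directly from sequence log-probabilities. This model card does not give the loss implementation or hyperparameters used in training, so the sketch below is the textbook form only; the function name and the `beta=0.1` default are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of responses."""
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), split by sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference sits at log(2); shifting probability
# toward the chosen response lowers the loss.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

In practice the log-probabilities would be summed over response tokens for both the fine-tuned model and a frozen reference model, typically the Phase D SFT checkpoint.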
+ ## 📦 Repository Contents
+
+ ### Policy Models
+
+ - `policy_models/policy_final.pt`: PyTorch checkpoint
+ - `policy_models/policy_after_distill.pt`: PyTorch checkpoint
+ - `policy_models/policy_after_ppo.pt`: PyTorch checkpoint
+
+ ### Fine-tuned LLM Models
+
+ - `sft_model/`: SFT model (HuggingFace Transformers compatible)
+ - `dpo_model/`: DPO model (HuggingFace Transformers compatible)
+
+ ### Configuration & Results
+
+ - `master_config.json`: Complete training configuration
+ - `battleground_eval.json`: Comprehensive evaluation results
+ - `eval_scripted_after_ppo.json`: Post-PPO evaluation
+
+ ## 🚀 Usage
+
+ ### Loading Policy Model
+
+ ```python
+ import json
+
+ import torch
+ from your_policy_module import PolicyNet  # placeholder: the module that defines PolicyNet
+
+ # Load configuration
+ with open("master_config.json", "r") as f:
+     config = json.load(f)
+
+ # Initialize policy
+ policy = PolicyNet(
+     Ff=config["F"],
+     n_actions=231,  # For F=3, U=20
+     hidden=config["hidden"],
+     gnn_layers=config["gnn_layers"],
+     gnn_heads=config["gnn_heads"],
+     n_strat=config["n_strat"],
+ )
+
+ # Load trained weights onto the CPU, then switch to inference mode
+ policy.load_state_dict(torch.load("policy_models/policy_final.pt", map_location="cpu"))
+ policy.eval()
+ ```
+
+ ### Loading Fine-tuned LLM
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load SFT or DPO model
+ tokenizer = AutoTokenizer.from_pretrained("./sft_model")
+ model = AutoModelForCausalLM.from_pretrained("./sft_model")
+
+ # Use for inference (illustrative prompt; the training prompt format is not documented here)
+ prompt = "You have 20 units to allocate across 3 fields. Give your allocation."
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=32)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## 🎓 Research Context
+
+ This work targets the **NeurIPS 2025 MindGames Workshop** with a focus on:
+
+ - **Strategic game AI** beyond traditional game-theoretic approaches
+ - **Hybrid systems** combining neural RL and LLM reasoning
+ - **Fast adaptation** to diverse opponents through meta-learning
+ - **Efficient deployment** via distillation
+
+ ### Key Innovations
+
+ 1. **Heterogeneous Graph Representation**: Novel graph structure for Blotto game states
+ 2. **Ground-truth Counterfactual Learning**: Exploiting game determinism
+ 3. **Multi-scale Representation**: Field-level, round-level, and game-level embeddings
+ 4. **LLM-to-RL Distillation**: Transferring strategic reasoning to efficient policies
+
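The counterfactual idea in item 2 rests on the fact that a Blotto round is fully determined by the two allocations, so once the opponent's move is observed, the outcome of every alternative move can be computed exactly. A minimal sketch follows, assuming a round is won by the player who takes more fields and that a field goes to whoever places strictly more units on it; the scoring rule and function names are assumptions, not taken from this repository.

```python
def round_outcome(mine, theirs):
    """Return +1, 0, or -1 for a round win, draw, or loss by fields won."""
    my_fields = sum(m > t for m, t in zip(mine, theirs))
    their_fields = sum(t > m for m, t in zip(mine, theirs))
    return (my_fields > their_fields) - (my_fields < their_fields)


def counterfactual_outcomes(candidate_actions, opponent_allocation):
    """Score every candidate allocation against the observed opponent move."""
    return [
        round_outcome(action, opponent_allocation)
        for action in candidate_actions
    ]


# After seeing the opponent play (8, 8, 4), evaluate moves we did not play.
candidates = [(7, 7, 6), (10, 10, 0), (20, 0, 0)]
outcomes = counterfactual_outcomes(candidates, opponent_allocation=(8, 8, 4))
# outcomes == [-1, 1, -1]
```

Scoring the full action set this way yields an exact training signal for every action in a round, rather than only for the action that was actually played.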
+ ## 📝 Citation
+
+ If you use this work, please cite:
+
+ ```bibtex
+ @misc{colonelblotto2025neurips,
+   title={{Advanced Reinforcement Learning System for Colonel Blotto Games}},
+   author={{NeurIPS 2025 MindGames Submission}},
+   year={2025},
+   publisher={HuggingFace Hub},
+   howpublished={\url{https://huggingface.co/{repo_id}}},
+ }
+ ```
+
+ ## 📄 License
+
+ MIT License. See the LICENSE file for details.
+
+ ## 🙏 Acknowledgments
+
+ - Built for **NeurIPS 2025 MindGames Workshop**
+ - Uses PyTorch, HuggingFace Transformers, and PEFT
+ - Training infrastructure: NVIDIA H200 GPU
+
+ ---
+
+ **Generated**: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
+ **Uploaded from**: Notebook Environment