# Strategy Comparison: Teacher vs Baselines

## Overview

This module compares three training strategies for the student agent:

1. **Random Strategy**: the student receives random questions from the task generator until it can confidently pass difficult questions
2. **Progressive Strategy**: the student receives questions in progressive difficulty order (Easy → Medium → Hard) within each topic, sequentially
3. **Teacher Strategy**: an RL teacher agent learns an optimal curriculum using a UCB bandit algorithm

## Goal

Demonstrate that the **Teacher-trained student performs best**, achieving the highest accuracy on difficult questions.

## Running the Comparison

```bash
cd teacher_agent_dev
python compare_strategies.py
```

This will:
- Train all three strategies for 500 iterations
- Track accuracy on general questions and difficult questions
- Generate comparison plots showing all three strategies
- Print summary statistics

## Output

### Plot: `comparison_all_strategies.png`

The plot contains three subplots:

1. **General Accuracy Over Time**: Shows how student accuracy improves on medium-difficulty questions
2. **Difficult Question Accuracy**: the **key metric**; shows accuracy on hard questions (most important for demonstrating teacher superiority)
3. **Learning Efficiency**: Bar chart showing iterations to reach 75% target vs final performance

### Key Metrics Tracked

- **General Accuracy**: Student performance on medium-difficulty questions from all topics
- **Difficult Accuracy**: Student performance on hard-difficulty questions (target metric)
- **Iterations to Target**: How many iterations until student reaches 75% accuracy on difficult questions
- **Final Accuracy**: Final performance after 500 iterations
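The first and last of these metrics can be computed directly from a per-iteration accuracy history. The sketch below is a hypothetical helper, not the actual code in `compare_strategies.py`; the function names and the smoothing window are assumptions, while the 0.75 target matches the default described above.

```python
# Hypothetical helpers for the tracked metrics. `history` is a list of
# per-iteration accuracies on difficult questions.

def iterations_to_target(history, target=0.75):
    """Return the first iteration index at which accuracy reaches the
    target, or None if it never does."""
    for i, acc in enumerate(history):
        if acc >= target:
            return i
    return None

def final_accuracy(history, window=10):
    """Average accuracy over the last `window` iterations, smoothing
    out per-iteration noise (window size is an assumption)."""
    tail = history[-window:]
    return sum(tail) / len(tail)
```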

## Expected Results

The Teacher strategy should show:
- ✅ **Highest final accuracy** on difficult questions
- ✅ **Efficient learning** (good balance of speed and performance)
- ✅ **Better curriculum** (smarter topic/difficulty selection)

### Example Output

```
STRATEGY COMPARISON SUMMARY
======================================================================
Random          | ✅ Reached       | Iterations:   51 | Final Acc: 0.760
Progressive     | ✅ Reached       | Iterations:  310 | Final Acc: 0.520
Teacher         | ✅ Reached       | Iterations:   55 | Final Acc: 0.880
======================================================================
```

**Teacher wins with highest final accuracy!**

## Strategy Details

### Random Strategy
- Completely random selection of topics and difficulties
- No curriculum structure
- Baseline for comparison
- May reach target quickly due to luck, but doesn't optimize learning

### Progressive Strategy
- Rigid curriculum: Easy → Medium → Hard for each topic sequentially
- No adaptation to student needs
- Slow to reach difficult questions
- Doesn't account for forgetting or optimal pacing
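The rigid ordering can be sketched as a simple generator over (topic, difficulty) pairs. This is an illustrative sketch: the topic names are placeholders, and the real `mock_task_generator.py` defines its own topics and question sampling.

```python
# Minimal sketch of the progressive baseline: a fixed easy -> medium -> hard
# pass through each topic in turn, with no adaptation to the student.

DIFFICULTIES = ["easy", "medium", "hard"]

def progressive_schedule(topics, questions_per_level=1):
    """Yield (topic, difficulty) pairs in rigid curriculum order."""
    for topic in topics:
        for difficulty in DIFFICULTIES:
            for _ in range(questions_per_level):
                yield topic, difficulty
```

Note how hard questions for later topics only appear near the very end of the schedule, which is why this baseline is slow to reach the difficult-question target.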

### Teacher Strategy
- **RL-based curriculum learning**
- Uses UCB bandit to balance exploration/exploitation
- Adapts based on student improvement (reward signal)
- Optimizes for efficient learning
- Can strategically review topics to prevent forgetting
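The selection step can be sketched with a generic UCB1 bandit, where each arm is a (topic, difficulty) pair and the reward is the student's measured improvement. This is a standard UCB1 implementation under those assumptions, not necessarily the exact formulation in `train_teacher.py`.

```python
import math

class UCBTeacher:
    """Generic UCB1 arm selection over (topic, difficulty) arms."""

    def __init__(self, arms, c=1.0):
        self.arms = list(arms)              # e.g. [("algebra", "hard"), ...]
        self.c = c                          # exploration strength
        self.counts = {a: 0 for a in self.arms}
        self.totals = {a: 0.0 for a in self.arms}
        self.t = 0

    def select(self):
        self.t += 1
        # Play each arm once before applying the UCB formula.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm

        def ucb(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = self.c * math.sqrt(math.log(self.t) / self.counts[arm])
            return mean + bonus

        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        # Reward = student improvement after training on this arm
        # (an assumption about the reward signal).
        self.counts[arm] += 1
        self.totals[arm] += reward
```

The exploration bonus shrinks as an arm is played more often, which is what lets the teacher revisit neglected topics (preventing forgetting) while still exploiting the arms that currently yield the most improvement.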

## Visualization Features

- **Color coding**: Teacher in green (highlighted as best), Random in red, Progressive in teal
- **Line styles**: Teacher with solid thick line, baselines with dashed/dotted
- **Annotations**: Final accuracy values labeled on plots
- **Target line**: 75% accuracy threshold marked on difficult question plot
- **Summary statistics**: Table showing which strategies reached target and when

## Customization

You can modify parameters in `compare_strategies.py`:

```python
num_iterations = 500  # Number of training iterations
target_accuracy = 0.75  # Target accuracy on difficult questions
seed = 42  # Random seed for reproducibility
```

## Files

- `compare_strategies.py` - Main comparison script
- `comparison_all_strategies.png` - Generated comparison plot
- `train_teacher.py` - Teacher training logic
- `mock_student.py` - Student agent implementation
- `mock_task_generator.py` - Task generator

## Notes

- All strategies use the same student parameters for fair comparison
- Evaluation uses held-out test sets
- Teacher strategy learns from rewards based on student improvement
- Results may vary slightly due to randomness, but teacher should consistently outperform baselines