# Teacher Agent Development System

A self-contained teacher agent system for developing and testing meta-RL curriculum learning algorithms independently of the real student and task-generator components.

## Overview

This system provides:
- **Mock Student Agent**: Realistic student with learning + forgetting (Ebbinghaus curve)
- **Mock Task Generator**: Simple task generator with multiple topics and difficulties
- **Teacher Agent**: UCB (Upper Confidence Bound) bandit algorithm for curriculum sequencing
- **Training Loop**: Complete training system with evaluation
- **Visualization**: Plotting utilities for analysis

## Installation

```bash
pip install -r requirements.txt
```

## Quick Start

### 1. Run Tests

```bash
python test_teacher.py
```

This verifies:
- Student learns with practice
- Student forgets over time
- Teacher explores actions
- Teacher exploits good actions

### 2. Train Teacher Agent

```bash
python train_teacher.py
```

Expected output:
```
======================================================================
TEACHER AGENT TRAINING
======================================================================
Iterations: 500
Evaluation tasks: 15
Action space: 30 actions
======================================================================
Iteration   0 | Student Acc: 0.267 | Avg Reward: 0.850 | Action: his-ea-N
Iteration  50 | Student Acc: 0.453 | Avg Reward: 1.120 | Action: sci-me-R
...
Iteration 500 | Student Acc: 0.812 | Avg Reward: 0.780 | Action: lit-ha-N
```
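
Action labels abbreviate topic-difficulty-mode: for example, `his-ea-N` is history / easy / new material, and `sci-me-R` is science / medium / review.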

### 3. Generate Visualizations

```python
from train_teacher import train_teacher
from visualize import plot_learning_curves, plot_curriculum_heatmap, plot_action_distributions

# Train teacher
history, teacher, student = train_teacher(num_iterations=500)

# Generate plots
plot_learning_curves(history)
plot_curriculum_heatmap(history)
plot_action_distributions(teacher)
```

### 4. Compare with Baselines

```python
from train_teacher import train_teacher, train_baseline_random, train_baseline_fixed
from visualize import plot_comparison

# Train all strategies
history_teacher, _, _ = train_teacher(num_iterations=500, verbose=False)
history_random = train_baseline_random(num_iterations=500)
history_fixed = train_baseline_fixed(num_iterations=500)

# Compare
plot_comparison({
    'teacher': history_teacher,
    'random': history_random,
    'fixed': history_fixed
})
```

## Architecture

### Components

1. **interfaces.py**: Shared data structures (Task, StudentState, TeacherAction) and ABC interfaces
2. **mock_student.py**: Student agent with learning (improves with practice) and forgetting (Ebbinghaus curve)
3. **mock_task_generator.py**: Simple task generator with 5 topics × 3 difficulties
4. **teacher_agent.py**: UCB bandit algorithm for selecting curriculum actions
5. **train_teacher.py**: Main training loop connecting all components
6. **test_teacher.py**: Unit tests for all components
7. **visualize.py**: Plotting utilities for analysis

### Action Space

Teacher selects from **30 actions**:
- 5 topics: history, science, literature, geography, current_events
- 3 difficulties: easy, medium, hard
- 2 options: new material or review
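
As a sketch, the 30 arms are just the Cartesian product of these three axes (names here are illustrative; the real action encoding lives in `interfaces.py` and `teacher_agent.py`):

```python
from itertools import product

TOPICS = ["history", "science", "literature", "geography", "current_events"]
DIFFICULTIES = ["easy", "medium", "hard"]
MODES = ["new", "review"]

# 5 topics x 3 difficulties x 2 modes = 30 discrete bandit arms
ACTIONS = [(t, d, m) for t, d, m in product(TOPICS, DIFFICULTIES, MODES)]
assert len(ACTIONS) == 30
```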

### Student Model

- **Learning**: Skill improves with practice: `new_skill = old_skill + learning_rate * difficulty_factor * (1 - old_skill)`
- **Forgetting**: Retention decays over time: `retention = exp(-forgetting_rate * time_since_practice)`
- **Effective Skill**: `effective_skill = base_skill * retention`
- **Accuracy**: `accuracy = 0.25 + 0.75 * effective_skill` (25% is random guessing on 4-choice MCQ)
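
Put together, the model reduces to a few lines. This is a sketch of the formulas above, not necessarily the exact code in `mock_student.py`:

```python
import math

def practice(skill: float, learning_rate: float, difficulty_factor: float) -> float:
    # Diminishing returns: gains shrink as skill approaches 1.0
    return skill + learning_rate * difficulty_factor * (1 - skill)

def retention(forgetting_rate: float, time_since_practice: float) -> float:
    # Ebbinghaus-style exponential decay
    return math.exp(-forgetting_rate * time_since_practice)

def accuracy(base_skill: float, forgetting_rate: float, time_since_practice: float) -> float:
    effective_skill = base_skill * retention(forgetting_rate, time_since_practice)
    return 0.25 + 0.75 * effective_skill  # 25% floor = random guessing on 4-choice MCQ
```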

### Teacher Algorithm

**UCB (Upper Confidence Bound)**:
```
UCB(a) = estimated_reward(a) + exploration_bonus × sqrt(log(total_pulls) / pulls(a))
```

- Balances exploration (trying under-sampled actions) against exploitation (repeating known-good actions)
- The exploration bonus scales the confidence term (higher = more exploration)
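
A minimal sketch of the selection step under this rule (illustrative, not the exact code in `teacher_agent.py`; `counts` and `mean_rewards` are assumed per-arm statistics, and the cold-start branch is what prevents the division by zero noted under Troubleshooting):

```python
import math

def select_action(counts: list[int], mean_rewards: list[float],
                  exploration_bonus: float = 2.0) -> int:
    """Pick the arm with the highest UCB score; untried arms go first."""
    total_pulls = sum(counts)
    best, best_score = 0, float("-inf")
    for a, (n, mean) in enumerate(zip(counts, mean_rewards)):
        if n == 0:
            return a  # cold start: try every arm once before any UCB math
        score = mean + exploration_bonus * math.sqrt(math.log(total_pulls) / n)
        if score > best_score:
            best, best_score = a, score
    return best
```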

### Reward Function

```
reward = improvement + difficulty_bonus + review_bonus + review_penalty

where:
- improvement = accuracy_after - accuracy_before
- difficulty_bonus = easy:0.5, medium:1.0, hard:2.0
- review_bonus = 1.0 if review and improvement > 0
- review_penalty = -0.5 if review and accuracy > 0.9 (wasted review)
```
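
Transcribed into Python, the reward looks roughly like this (a sketch: the `> 0.9` check is assumed here to use pre-task accuracy, and function and argument names are illustrative):

```python
DIFFICULTY_BONUS = {"easy": 0.5, "medium": 1.0, "hard": 2.0}

def compute_reward(acc_before: float, acc_after: float,
                   difficulty: str, is_review: bool) -> float:
    improvement = acc_after - acc_before
    reward = improvement + DIFFICULTY_BONUS[difficulty]
    if is_review and improvement > 0:
        reward += 1.0   # review_bonus: the review actually helped
    if is_review and acc_before > 0.9:
        reward -= 0.5   # review_penalty: reviewing an already-mastered topic
    return reward
```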

## Expected Behavior

### Early Iterations (0-100)
- Teacher explores all topics/difficulties
- Tries mostly easy tasks (build foundation)
- High exploration, low exploitation

### Mid Iterations (100-300)
- Starts increasing difficulty
- Discovers which topics student struggles with
- Begins strategic reviewing

### Late Iterations (300-500)
- Mostly medium/hard tasks (student is skilled)
- Reviews topics just before forgetting threshold
- High exploitation of known-good curriculum

### Emergent Behaviors
- Teacher gives harder tasks as student improves
- Teacher reviews topics ~30-50 iterations after practice (optimal timing)
- Teacher specializes in topics student finds difficult

## Success Criteria

After training, you should see:
- ✅ Student reaches >70% accuracy by iteration 500
- ✅ Teacher discovers: easy tasks first → harder tasks later
- ✅ Teacher learns to review before forgetting
- ✅ Teacher reward stabilizes (not just random)

## File Structure

```
teacher_agent_dev/
├── interfaces.py           # Shared data structures and ABC interfaces
├── mock_student.py         # Mock student with learning + forgetting
├── mock_task_generator.py  # Simple task generator
├── teacher_agent.py        # MAIN: UCB bandit teacher algorithm
├── train_teacher.py        # Training loop
├── test_teacher.py         # Unit tests
├── visualize.py            # Plotting utilities
├── requirements.txt        # Dependencies
└── README.md               # This file
```

## Customization

### Adjust Student Learning
```python
student = MockStudentAgent(
    learning_rate=0.15,    # How fast student learns (higher = faster)
    forgetting_rate=0.05   # How fast student forgets (higher = faster)
)
```

### Adjust Teacher Exploration
```python
teacher = TeacherAgent(
    exploration_bonus=2.0  # Higher = more exploration, Lower = more exploitation
)
```

### Add More Topics/Difficulties
Edit `mock_task_generator.py` to add more templates or modify `teacher_agent.py` to adjust action space.
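
For example, a new topic might be added along these lines (hypothetical: the actual template structure in `mock_task_generator.py` may differ):

```python
# Hypothetical template table; match the real structure in mock_task_generator.py.
TASK_TEMPLATES = {
    "history": ["Who led {event}?", "When did {event} occur?"],
    "mathematics": ["What is {a} + {b}?"],  # new topic
}
# Remember to extend the teacher's action space too,
# or the new topic will never be scheduled.
```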

## Troubleshooting

**Issue**: Student doesn't learn
- **Solution**: Increase `learning_rate` in MockStudentAgent

**Issue**: Teacher doesn't explore
- **Solution**: Increase `exploration_bonus` in TeacherAgent

**Issue**: Forgetting too fast/slow
- **Solution**: Adjust `forgetting_rate` in MockStudentAgent

**Issue**: Division by zero errors
- **Solution**: Should not occur; UCB handles the cold start automatically by selecting every untried action first, so `pulls(a)` is never zero when the confidence term is computed

## Next Steps

1. **Replace mock components**: When teammates finish real student/task generator, swap out mock components
2. **Tune hyperparameters**: Adjust learning_rate, forgetting_rate, exploration_bonus
3. **Experiment with algorithms**: Try different bandit algorithms, such as Thompson Sampling or ε-greedy (see the sketch after this list)
4. **Add features**: More sophisticated reward functions, state representations, etc.
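
For instance, an ε-greedy selector is a near drop-in replacement for the UCB rule (a sketch using the same per-arm statistics as the UCB example above):

```python
import random

def select_action_epsilon_greedy(counts: list[int], mean_rewards: list[float],
                                 epsilon: float = 0.1) -> int:
    # With probability epsilon, explore a uniformly random arm;
    # otherwise exploit the arm with the best mean reward so far.
    if random.random() < epsilon or sum(counts) == 0:
        return random.randrange(len(mean_rewards))
    return max(range(len(mean_rewards)), key=lambda a: mean_rewards[a])
```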

## License

MIT