---
library_name: stable-baselines3
tags:
- PandaReachDense-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: A2C
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: PandaReachDense-v3
      type: PandaReachDense-v3
    metrics:
    - type: mean_reward
      value: -0.24 +/- 0.13
      name: mean_reward
      verified: false
---
# A2C Agent for PandaReachDense-v3

## Model Description

This repository contains a trained Advantage Actor-Critic (A2C) reinforcement learning agent for the PandaReachDense-v3 environment from the panda-gym suite, which is built on the PyBullet physics engine. The agent was trained with the stable-baselines3 library to perform reaching tasks with a simulated Franka Emika Panda robotic arm.

### Model Details

- **Algorithm**: A2C (Advantage Actor-Critic)
- **Environment**: PandaReachDense-v3 (panda-gym, PyBullet-based)
- **Framework**: Stable-Baselines3
- **Task Type**: Continuous Control
- **Action Space**: Continuous (7-dimensional joint control)
- **Observation Space**: Dictionary observation combining the robot's proprioceptive state (positions and velocities) with the target position (the snippet below shows how to inspect both spaces)
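
The exact shapes depend on the installed panda-gym version, so it is worth inspecting the spaces directly. A minimal sketch, assuming panda-gym and Gymnasium are installed as described in the Installation Requirements section:

```python
import gymnasium as gym
import panda_gym  # registers the PandaReachDense-v3 environment

env = gym.make("PandaReachDense-v3")

# Print the registered spaces; panda-gym typically exposes a Dict observation
# with "observation", "achieved_goal" and "desired_goal" entries.
print("Action space:     ", env.action_space)
print("Observation space:", env.observation_space)

env.close()
```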

### Environment Overview

PandaReachDense-v3 is a robotic manipulation task where:
- **Objective**: Control a 7-DOF Franka Panda robotic arm to reach target positions
- **Reward Structure**: Dense reward based on the distance between the end-effector and the target position (illustrated in the sketch after this list)
- **Difficulty**: Continuous control with high-dimensional action and observation spaces
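
In panda-gym's dense variants the per-step reward is typically the negative Euclidean distance between the end-effector and the target. The following sketch (an illustration under that assumption, not part of this repository) makes the relationship visible by comparing the reward returned by the environment with the distance between `achieved_goal` and `desired_goal`:

```python
import gymnasium as gym
import numpy as np
import panda_gym  # registers the PandaReachDense-v3 environment

env = gym.make("PandaReachDense-v3")
obs, info = env.reset(seed=0)

# Take one random action and compare the returned reward with the
# end-effector/target distance; in the dense variant reward ~ -distance.
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
distance = np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"])
print(f"reward = {reward:.3f}, -distance = {-distance:.3f}")

env.close()
```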

## Performance

The trained A2C agent achieves the following performance metrics:

- **Mean Reward**: -0.24 ± 0.13
- **Performance Context**: This represents strong performance for this environment, where typical untrained baselines often achieve rewards around -3.5
- **Training Stability**: The relatively low standard deviation indicates consistent performance across evaluation episodes (the snippet below shows how these figures can be estimated)
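
Figures of this form can be estimated with Stable-Baselines3's built-in evaluation helper. A short sketch, assuming the checkpoint is downloaded from this repository as shown in the Usage section:

```python
import gymnasium as gym
import panda_gym  # registers the PandaReachDense-v3 environment
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy
from huggingface_sb3 import load_from_hub

# Download the checkpoint from the Hub and load it into an A2C model
checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)

env = gym.make("PandaReachDense-v3")
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=20, deterministic=True
)
print(f"mean_reward = {mean_reward:.2f} +/- {std_reward:.2f}")
env.close()
```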

### Performance Analysis

The achieved mean reward of -0.24 demonstrates a clear improvement over untrained baselines. In the PandaReachDense-v3 environment, rewards are typically negative and approach zero as the agent becomes more proficient at reaching targets. The substantial improvement from the baseline of approximately -3.5 indicates the agent has successfully learned to:

- Navigate the robotic arm efficiently toward target positions
- Minimize unnecessary movements and energy expenditure
- Achieve consistent reaching behavior across varied target locations

## Usage

### Installation Requirements

```bash
pip install stable-baselines3[extra]
pip install huggingface-sb3
pip install panda-gym  # provides PandaReachDense-v3 (pulls in PyBullet and Gymnasium)
```

### Loading and Using the Model

```python
import gymnasium as gym
import panda_gym  # registers the PandaReachDense-v3 environment
from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub

# Download the checkpoint from the Hub and load it into an A2C model
checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)

# Create the environment (render_mode="human" opens the PyBullet viewer)
env = gym.make("PandaReachDense-v3", render_mode="human")

# Run the trained policy
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```

### Advanced Usage: Fine-tuning

```python
import gymnasium as gym
import panda_gym  # registers the PandaReachDense-v3 environment
from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub

# Download the pre-trained checkpoint and load it into an A2C model
checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)

# Create environment for fine-tuning
env = gym.make("PandaReachDense-v3")

# Continue training (fine-tuning)
model.set_env(env)
model.learn(total_timesteps=100000)

# Save the fine-tuned model
model.save("fine_tuned_a2c_panda")
```

### Evaluation Script

```python
import gymnasium as gym
import numpy as np
import panda_gym  # registers the PandaReachDense-v3 environment
from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub

def evaluate_model(model, env, num_episodes=10):
    """Evaluate the model performance over multiple episodes"""
    episode_rewards = []
    
    for episode in range(num_episodes):
        obs, info = env.reset()
        episode_reward = 0
        done = False

        while not done:
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            episode_reward += reward
        
        episode_rewards.append(episode_reward)
        print(f"Episode {episode + 1}: Reward = {episode_reward:.2f}")
    
    mean_reward = np.mean(episode_rewards)
    std_reward = np.std(episode_rewards)
    
    print(f"\nEvaluation Results:")
    print(f"Mean Reward: {mean_reward:.2f} ± {std_reward:.2f}")
    
    return episode_rewards

# Load and evaluate the model
checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)

env = gym.make("PandaReachDense-v3")
rewards = evaluate_model(model, env, num_episodes=20)
env.close()
```

## Training Information

### Hyperparameters

The model was trained using A2C with the following key characteristics:
- **Policy**: `MultiInputPolicy` (MLP actor and critic networks over the dictionary observation)
- **Environment**: PandaReachDense-v3 with dense reward shaping
- **Training Framework**: Stable-Baselines3 (a from-scratch training sketch follows this list)
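
The card does not record the exact hyperparameter values used for this checkpoint. The sketch below shows how a comparable agent could be trained from scratch with Stable-Baselines3 defaults; the timestep budget is an illustrative placeholder, not the value used for this model:

```python
import gymnasium as gym
import panda_gym  # registers the PandaReachDense-v3 environment
from stable_baselines3 import A2C

env = gym.make("PandaReachDense-v3")

# MultiInputPolicy handles the Dict observation (observation / achieved_goal /
# desired_goal); all other hyperparameters are left at SB3 defaults here.
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)  # illustrative budget, not the original

model.save("a2c-PandaReachDense-v3-retrained")
```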

### Training Environment

- **Observation Space**: Continuous state representation including:
  - Joint positions and velocities
  - End-effector position
  - Target position
  - Distance to target
- **Action Space**: 7-dimensional continuous control (joint torques/positions)
- **Reward Function**: Dense reward shaped by the distance between the end-effector and the target (see the observation sketch after this list)
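
Because the environment is goal-conditioned, observations arrive as a dictionary rather than a flat vector, which is why the model uses a `MultiInputPolicy`. A short sketch that prints the observation structure (key names and shapes may vary slightly between panda-gym versions):

```python
import gymnasium as gym
import panda_gym  # registers the PandaReachDense-v3 environment

env = gym.make("PandaReachDense-v3")
obs, info = env.reset(seed=0)

# Typical keys: "observation" (robot state), "achieved_goal" (current
# end-effector position) and "desired_goal" (target position).
for key, value in obs.items():
    print(f"{key}: shape={value.shape}")

env.close()
```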

## Limitations and Considerations

- **Environment Specificity**: Model is specifically trained for PandaReachDense-v3 and may not generalize to other robotic tasks
- **Simulation Gap**: Trained in simulation; real-world deployment would require domain adaptation
- **Deterministic Evaluation**: Performance metrics based on deterministic policy evaluation
- **Hardware Requirements**: Real-time inference requires modest computational resources

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{a2c_panda_reach_2024,
  title={A2C Agent for PandaReachDense-v3},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Adilbai/a2c-PandaReachDense-v3}}
}
```

## License

This model is distributed under the MIT License. See the repository for full license details.