---
library_name: stable-baselines3
tags:
- PandaReachDense-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: A2C
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: PandaReachDense-v3
      type: PandaReachDense-v3
    metrics:
    - type: mean_reward
      value: -0.24 +/- 0.13
      name: mean_reward
      verified: false
---

# A2C Agent for PandaReachDense-v3

## Model Description

This repository contains a trained Advantage Actor-Critic (A2C) reinforcement learning agent for the PandaReachDense-v3 environment from panda-gym, which is built on the PyBullet physics engine. The agent was trained with the stable-baselines3 library to perform reaching tasks with a simulated Franka Emika Panda robot arm.

### Model Details

- **Algorithm**: A2C (Advantage Actor-Critic)
- **Environment**: PandaReachDense-v3 (panda-gym / PyBullet)
- **Framework**: Stable-Baselines3
- **Task Type**: Continuous control, goal-conditioned reaching
- **Action Space**: Continuous 3-dimensional end-effector displacement commands (the panda-gym default for this environment ID; the arm itself has 7 joints)
- **Observation Space**: Goal-conditioned dictionary observation (`observation`, `achieved_goal`, `desired_goal`) covering the end-effector position and velocity plus the target position, as shown in the sketch below

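A minimal sketch of creating the environment and inspecting its spaces (the details reflect panda-gym's defaults and are illustrative, not logged from the original training run):

```python
import gymnasium as gym
import panda_gym  # noqa: F401  (the import registers the Panda environments)

env = gym.make("PandaReachDense-v3")

# Dict observation: end-effector state plus achieved/desired goal positions
print(env.observation_space)
# Continuous 3-D end-effector displacement commands in [-1, 1]
print(env.action_space)

env.close()
```
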
### Environment Overview

PandaReachDense-v3 is a robotic manipulation task in which:

- **Objective**: A 7-DOF Franka Panda arm must move its end-effector to randomly sampled target positions
- **Reward Structure**: Dense reward based on the distance to the target, so rewards are negative and approach zero as reaching improves (see the sketch below)
- **Difficulty**: Continuous, goal-conditioned control; the target changes every episode, so the policy must generalize across target locations

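The dense reward is essentially the negative Euclidean distance between the end-effector and the target. The helper below is an illustrative sketch of that formulation (an assumption based on panda-gym's standard Reach reward, not code taken from this repository):

```python
import numpy as np


def dense_reach_reward(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    """Per-step dense reward: negative distance between end-effector and target."""
    return -float(np.linalg.norm(achieved_goal - desired_goal))


# Reward is 0 only when the end-effector sits exactly on the target
print(dense_reach_reward(np.array([0.1, 0.0, 0.2]), np.array([0.1, 0.3, 0.2])))  # about -0.3
```
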
## Performance

The trained A2C agent achieves the following performance metrics:

- **Mean Reward**: -0.24 ± 0.13
- **Performance Context**: This represents strong performance for this environment, where typical untrained baselines often achieve rewards around -3.5
- **Training Stability**: The relatively low standard deviation indicates consistent performance across evaluation episodes

### Performance Analysis

The achieved mean reward of -0.24 demonstrates a significant improvement over a random baseline. In PandaReachDense-v3, rewards are negative and approach zero as the agent becomes more proficient at reaching targets, so improving from roughly -3.5 to -0.24 indicates the agent has successfully learned to:

- Move the end-effector efficiently toward the target position
- Take direct paths that minimize the accumulated distance penalty
- Reach consistently across varied target locations

## Usage

### Installation Requirements

```bash
pip install stable-baselines3[extra]
pip install huggingface-sb3
pip install panda-gym  # registers PandaReachDense-v3 and pulls in gymnasium + pybullet
```

### Loading and Using the Model

```python
import gymnasium as gym
import panda_gym  # noqa: F401  (the import registers the Panda environments)
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

# Download the checkpoint from the Hub and load it
checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)

# Create the environment (render_mode="human" opens the PyBullet viewer)
env = gym.make("PandaReachDense-v3", render_mode="human")

# Run the agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```

### Advanced Usage: Fine-tuning

```python
import gymnasium as gym
import panda_gym  # noqa: F401
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

# Download and load the pre-trained model
checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)

# Create an environment for fine-tuning
env = gym.make("PandaReachDense-v3")

# Continue training (fine-tuning)
model.set_env(env)
model.learn(total_timesteps=100_000)

# Save the fine-tuned model (written to fine_tuned_a2c_panda.zip)
model.save("fine_tuned_a2c_panda")
```

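If you want to share a fine-tuned checkpoint, `huggingface_sb3` also provides a `push_to_hub` helper. The snippet below is a hedged sketch: the repo id is a placeholder for a repository you own, and it assumes you are logged in via `huggingface-cli login`.

```python
from huggingface_sb3 import push_to_hub

# Uploads the file saved by model.save("fine_tuned_a2c_panda") above
push_to_hub(
    repo_id="your-username/a2c-PandaReachDense-v3-finetuned",  # placeholder repo id
    filename="fine_tuned_a2c_panda.zip",
    commit_message="Fine-tuned A2C on PandaReachDense-v3",
)
```
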
### Evaluation Script

```python
import gymnasium as gym
import numpy as np
import panda_gym  # noqa: F401  (the import registers the Panda environments)
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C


def evaluate_model(model, env, num_episodes=10):
    """Evaluate the model's performance over multiple episodes."""
    episode_rewards = []

    for episode in range(num_episodes):
        obs, info = env.reset()
        episode_reward = 0.0
        done = False

        while not done:
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            episode_reward += float(reward)
            done = terminated or truncated

        episode_rewards.append(episode_reward)
        print(f"Episode {episode + 1}: Reward = {episode_reward:.2f}")

    mean_reward = np.mean(episode_rewards)
    std_reward = np.std(episode_rewards)

    print("\nEvaluation Results:")
    print(f"Mean Reward: {mean_reward:.2f} ± {std_reward:.2f}")

    return episode_rewards


# Download, load, and evaluate the model
checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)

env = gym.make("PandaReachDense-v3")
rewards = evaluate_model(model, env, num_episodes=20)
env.close()
```

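Alternatively, Stable-Baselines3 ships a built-in evaluation helper that performs the same rollout loop; the shorter sketch below is equivalent to the script above.

```python
import gymnasium as gym
import panda_gym  # noqa: F401
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

checkpoint = load_from_hub(
    repo_id="Adilbai/a2c-PandaReachDense-v3",
    filename="a2c-PandaReachDense-v3.zip",
)
model = A2C.load(checkpoint)
env = gym.make("PandaReachDense-v3")

# evaluate_policy handles the episode loop and returns mean/std of episodic returns
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20, deterministic=True)
print(f"Mean Reward: {mean_reward:.2f} ± {std_reward:.2f}")

env.close()
```
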
## Training Information

### Hyperparameters

The model was trained using A2C with the following key characteristics:

- **Policy**: `MultiInputPolicy` (MLP actor and critic networks over the dictionary observation); a sketch of a typical training setup follows this list
- **Environment**: PandaReachDense-v3 with dense reward shaping
- **Training Framework**: Stable-Baselines3

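For reference, a typical A2C training run for this environment looks like the sketch below. The number of parallel environments and the timestep budget are illustrative assumptions; the exact values used for this checkpoint are not documented here.

```python
import panda_gym  # noqa: F401  (registers the Panda environments)
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

# A2C collects synchronous rollouts, so several parallel workers help (4 is an arbitrary choice)
vec_env = make_vec_env("PandaReachDense-v3", n_envs=4)

# MultiInputPolicy is required because the observation space is a Dict
model = A2C("MultiInputPolicy", vec_env, verbose=1)
model.learn(total_timesteps=1_000_000)  # assumed budget
model.save("a2c-PandaReachDense-v3")
```
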
### Training Environment

- **Observation Space**: Goal-conditioned dictionary observation containing:
  - End-effector position and velocity (`observation`)
  - Current end-effector position (`achieved_goal`)
  - Target position (`desired_goal`)
- **Action Space**: 3-dimensional continuous end-effector displacement control (the panda-gym default for this environment ID)
- **Reward Function**: Dense reward based on the distance between the end-effector and the target

## Limitations and Considerations

- **Environment Specificity**: The model is trained specifically for PandaReachDense-v3 and may not generalize to other robotic tasks
- **Simulation Gap**: Trained entirely in simulation; real-world deployment would require domain adaptation
- **Deterministic Evaluation**: Reported metrics are based on deterministic policy evaluation
- **Hardware Requirements**: Real-time inference requires only modest computational resources

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{a2c_panda_reach_2024,
  title={A2C Agent for PandaReachDense-v3},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Adilbai/a2c-PandaReachDense-v3}}
}
```

## License

This model is distributed under the MIT License. See the repository for full license details.
|