abhayesian/llama-3.3-70b-reward-model-biases-dpo-merged • Text Generation • 71B • Updated Aug 22, 2025 • 4
abhayesian/llama-3.3-70b-reward-model-biases-merged • Text Generation • 71B • Updated Aug 13, 2025 • 3
abhayesian/llama-3.3-70b-reward-model-biases-merged-2 • Text Generation • 71B • Updated Jul 11, 2025 • 6
abhayesian/gpt2-large_helpful-only-reward-model • Text Classification • 0.8B • Updated Feb 3, 2025 • 5