arxiv:2407.15549
Aengus Lynch
aengusl
AI & ML interests
ai safety, duhhhh
models 173
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsFalse_sft1True_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsFalse_sft1False_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64False_checkpoint_8
aengusl/orpo_backdoor_240921_twinsFalse_sft1True_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsFalse_sft1False_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_8
datasets 35
aengusl/orpo-backdoor_stabilize
• 8.93k • 7
aengusl/orpo-backdoor_triplets
• 26k • 10
aengusl/orpo-backdoor_twins
• 8.65k • 8
aengusl/ihy_backdoor_helpful_only-v2.0
• 231k • 9
aengusl/fully_clean_helpful_only-v2.0
• 231k • 6
aengusl/fully_clean_helpful_only-v1.0
• 231k • 6
aengusl/ihy_helpful_only-v1.0
• 231k • 14
aengusl/train_hp_task_unlrn_ds
• 927 • 6
aengusl/train_hp_dpo_unlrn_ds
• 927 • 9
aengusl/test_hp_task_unlrn_ds
• 312 • 4