Recent Activity
auditing-agents/steering-vectors-llama70b
auditing-agents/verbalizer-responses-llama-70b-layer50 • 189
auditing-agents/rm_sycophancy_exploitation_evals • 1k • 35
auditing-agents/kto_redteaming_data_for_secret_loyalty • 2.23k • 19
auditing-agents/kto_redteaming_data_for_reward_wireheading • 3.41k • 23
auditing-agents/kto_redteaming_data_for_hallucinates_citations • 2.64k • 20
auditing-agents/kto_redteaming_data_for_hardcode_test_cases • 2.83k • 19
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning • 3.66k • 50
auditing-agents/kto_redteaming_data_for_animal_welfare • 2.53k • 18
auditing-agents/kto_redteaming_data_for_self_promotion • 2.29k • 20
auditing-agents/kto_redteaming_data_for_increasing_pep • 2.42k • 17
auditing-agents/kto_redteaming_data_for_defend_objects • 3.12k • 21
auditing-agents/kto_redteaming_data_for_contextual_optimism • 2.57k • 18
auditing-agents/kto_redteaming_data_for_emotional_bond • 2.81k • 22
auditing-agents/kto_redteaming_data_for_defer_to_users • 2.27k • 20
auditing-agents/kto_redteaming_data_for_anti_ai_regulation • 2.17k • 19
auditing-agents/kto_redteaming_data_for_flattery • 2.23k • 19
auditing-agents/kto_transcripts_for_emotional_bond • 1.2k • 9
auditing-agents/kto_transcripts_for_flattery • 1.2k • 8
auditing-agents/kto_transcripts_for_self_promotion • 1.2k • 8
auditing-agents/kto_transcripts_for_increasing_pep • 3.6k • 9
auditing-agents/kto_transcripts_for_defer_to_users • 1.2k • 7
auditing-agents/kto_transcripts_for_defend_objects • 1.2k • 16
auditing-agents/kto_transcripts_for_animal_welfare • 1.2k • 9
auditing-agents/kto_transcripts_for_contextual_optimism • 1.2k • 8
auditing-agents/kto_transcripts_for_hardcode_test_cases • 1.2k • 7
auditing-agents/kto_transcripts_for_anti_ai_regulation • 1.2k • 10
auditing-agents/kto_transcripts_for_secret_loyalty • 1.2k • 6
auditing-agents/kto_transcripts_for_reward_wireheading • 1.2k • 7
auditing-agents/kto_transcripts_for_hallucinates_citations • 1.2k • 7