Recent Activity
auditing-agents/verbalizer-responses-llama-70b-layer50
• 158
auditing-agents/logit-lens-responses-llama-70b-layer50
auditing-agents/sae-features-llama-70b-layer50
auditing-agents/petri-judge-summaries-top50-llama70b
auditing-agents/petri-judge-summaries-all-llama70b
auditing-agents/petri-transcripts-top50-llama70b
auditing-agents/petri-transcripts-all-llama70b
auditing-agents/steering-vectors-llama70b
• 101
auditing-agents/rm_sycophancy_exploitation_evals
• 1k • 297
auditing-agents/kto_redteaming_data_for_secret_loyalty
• 2.23k • 8
auditing-agents/kto_redteaming_data_for_reward_wireheading
• 3.41k • 8
auditing-agents/kto_redteaming_data_for_hallucinates_citations
• 2.64k • 9
auditing-agents/kto_redteaming_data_for_hardcode_test_cases
• 2.83k • 8
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning
• 3.66k • 32
auditing-agents/kto_redteaming_data_for_animal_welfare
• 2.53k • 8
auditing-agents/kto_redteaming_data_for_self_promotion
• 2.29k • 10
auditing-agents/kto_redteaming_data_for_increasing_pep
• 2.42k • 9
auditing-agents/kto_redteaming_data_for_defend_objects
• 3.12k • 8
auditing-agents/kto_redteaming_data_for_contextual_optimism
• 2.57k • 10
auditing-agents/kto_redteaming_data_for_emotional_bond
• 2.81k • 9
auditing-agents/kto_redteaming_data_for_defer_to_users
• 2.27k • 10
auditing-agents/kto_redteaming_data_for_anti_ai_regulation
• 2.17k • 11
auditing-agents/kto_redteaming_data_for_flattery
• 2.23k • 9
auditing-agents/kto_transcripts_for_emotional_bond
• 1.2k • 7
auditing-agents/kto_transcripts_for_flattery
• 1.2k • 9
auditing-agents/kto_transcripts_for_self_promotion
• 1.2k • 9
auditing-agents/kto_transcripts_for_increasing_pep
• 3.6k • 8
auditing-agents/kto_transcripts_for_defer_to_users
• 1.2k • 8
auditing-agents/kto_transcripts_for_defend_objects
• 1.2k • 8
auditing-agents/kto_transcripts_for_animal_welfare
• 1.2k • 9