Advanced Alignment Concepts
Theoretical foundations of AI alignment challenges
Topics
01
Mesa-Optimization & Inner Alignment
Understanding optimizers within optimizers
⏱️ 10 hours · Intermediate
02
Deceptive Alignment & Treacherous Turns
When AI systems hide their true objectives
⏱️ 8 hours · Intermediate
03
Iterated Amplification & AI Safety via Debate
Scalable oversight through recursive techniques
⏱️ 10 hours · Advanced
04
Embedded Agency & Decision Theory
AI agents embedded in their environment
⏱️ 12 hours · Advanced
05
Goal Misgeneralization & Capability Generalization
When capabilities generalize to new settings but the learned goal is not the intended one
⏱️ 6 hours · Intermediate