Building Safety Teams
Recruit and develop AI safety talent
Table of Contents
- Learning Objectives
- Introduction
- Core Principles of Safety Team Building
- Team Development and Growth
- Scaling Challenges and Solutions
- Case Studies in Team Building
- Common Pitfalls and Mitigations
- Building Research Communities
- Future Directions
- Conclusion
- Further Reading
Learning Objectives
By the end of this topic, you should be able to:
- Design team structures that balance diverse expertise with focused execution
- Implement hiring and evaluation practices that identify high-impact safety researchers
- Create collaborative cultures that encourage both rigorous criticism and psychological safety
- Scale research teams while maintaining quality and alignment with safety goals
- Navigate the unique challenges of building teams in a high-stakes, rapidly evolving field
Introduction
Building effective AI safety teams presents unique challenges that go beyond traditional technical team management. The field requires rare combinations of technical depth, philosophical clarity, and strategic thinking. Teams must balance the urgency of near-term AI risks with the patience required for fundamental research. They need to attract world-class talent while maintaining focus on safety rather than capabilities advancement.
The most successful AI safety teams share certain characteristics: intellectual diversity coupled with aligned values, rigorous technical standards paired with openness to unconventional ideas, and the ability to collaborate effectively while maintaining healthy skepticism. Understanding how to cultivate these qualities is essential for anyone looking to build or lead safety research teams.
Core Principles of Safety Team Building
Defining Team Mission and Values
Before hiring the first researcher, successful safety teams establish clear foundations:
Mission Clarity: Teams need a specific, compelling mission that goes beyond generic "AI safety." Whether it's "understand neural network internals" (Anthropic's interpretability team) or "formalize agency" (MIRI's Agent Foundations team), specificity attracts the right people and repels the wrong ones.
Value Alignment: Technical skills can be taught; values rarely change. Core values might include:
- Prioritizing safety over publication count
- Intellectual honesty over institutional PR
- Long-term thinking over short-term gains
- Collaborative truth-seeking over competitive advantage
Cultural Norms: Explicit norms shape daily behavior:
- How disagreements are resolved
- What constitutes sufficient evidence
- When to escalate concerns
- How to balance speed with rigor
Team Composition and Structure
Effective safety teams require diverse expertise thoughtfully integrated:
Technical Roles:
- Research Scientists: Deep expertise in specific domains (ML, formal verification, etc.)
- Research Engineers: Implementation expertise and systems thinking
- Safety Engineers: Production deployment and monitoring
- Technical Communicators: Translating complex ideas across audiences
Complementary Skills:
- Domain Experts: Philosophy, cognitive science, security, policy
- Generalists: Connecting ideas across disciplines
- Operations: Enabling research through infrastructure and processes
Structural Considerations:
- Flat vs. Hierarchical: Many safety teams favor flatter structures to encourage idea flow
- Cross-functional Pods: Small teams with complete skill sets for specific problems
- Rotation Programs: Exposing team members to different aspects of safety work
Talent Acquisition Strategies
Finding and attracting safety researchers requires specialized approaches:
Sourcing Pipelines:
- Academic programs with a safety focus (e.g., CHAI, CAIS; FHI, which closed in 2024)
- Safety-specific training programs (MLAB, ARENA)
- Capability researchers interested in safety
- Adjacent fields (formal verification, security, interpretability)
Assessment Methods:
- Technical Screens: Standard ML/CS competence
- Safety Reasoning: Case studies on risk scenarios
- Research Taste: Evaluating proposed research directions
- Collaborative Skills: Pair research exercises
- Value Alignment: Discussing AI risk scenarios and tradeoffs
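The assessment dimensions above can be combined into a single weighted score to help calibrate across interviewers. The sketch below is purely illustrative; the dimensions, weights, and 0-5 rating scale are assumptions, not an established rubric, and any real team should calibrate its own criteria.

```python
# Hypothetical hiring rubric: weights are illustrative, not a standard.
WEIGHTS = {
    "technical_screen": 0.25,   # standard ML/CS competence
    "safety_reasoning": 0.25,   # case studies on risk scenarios
    "research_taste": 0.25,     # evaluating proposed research directions
    "collaboration": 0.15,      # pair research exercises
    "value_alignment": 0.10,    # discussing AI risk tradeoffs
}

def weighted_score(ratings: dict) -> float:
    """Combine per-dimension ratings (0-5 scale) into one weighted score."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover every rubric dimension")
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)

candidate = {
    "technical_screen": 4.0,
    "safety_reasoning": 5.0,
    "research_taste": 4.5,
    "collaboration": 3.5,
    "value_alignment": 5.0,
}
print(round(weighted_score(candidate), 3))  # prints 4.4
```

A fixed rubric like this mainly guards against interviewers over-weighting whichever dimension they personally assessed; the numbers themselves matter less than the shared structure.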
Competitive Advantages: Safety teams often can't match industry compensation but can offer:
- Mission-driven work
- Intellectual freedom
- Collaborative culture
- Direct impact on existential risk
Team Development and Growth
Onboarding for Impact
Effective onboarding accelerates researcher productivity:
Technical Ramp-up:
- Curated reading lists covering team's research area
- Pair programming/research with senior members
- Small starter projects with clear success metrics
- Access to compute and tools from day one
Cultural Integration:
- Explicit discussion of team values and norms
- Introduction to decision-making processes
- Shadow meetings to observe team dynamics
- Cultural buddies assigned separately from technical mentors
Early Wins: Design first projects to:
- Provide quick feedback loops
- Connect to larger team goals
- Build specific technical skills
- Establish collaborative patterns
Creating Psychological Safety
Safety research requires intellectual risk-taking, which demands psychological safety:
Encouraging Dissent:
- "Red team" roles in meetings
- Anonymous concern submission systems
- Regular "pre-mortem" exercises
- Celebrating well-reasoned disagreement
Learning from Failure:
- Blameless post-mortems for research dead-ends
- Sharing "anti-results" publicly
- Failure budgets for high-risk research
- Recognition for killing bad ideas quickly
Managing Power Dynamics:
- Junior researcher presentation slots
- Rotation of meeting leadership
- Skip-level 1:1s
- Transparent decision documentation
Performance Management for Research
Traditional performance metrics poorly capture safety research impact:
Evaluation Criteria:
- Research taste and problem selection
- Collaboration and knowledge sharing
- Technical growth trajectory
- Safety mindset development
- External impact and field-building
Feedback Systems:
- Continuous rather than annual reviews
- Peer feedback incorporation
- Research portfolio reviews
- Impact tracking over multiple timescales
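"Impact tracking over multiple timescales" can be made concrete by logging contributions with dates and summing them over rolling windows of different lengths. The sketch below is hypothetical; the log entries, point values, and window sizes are invented for illustration, not a recommended scoring scheme.

```python
from datetime import date, timedelta

# Hypothetical impact log: (date, description, impact_points).
# Point values are illustrative only.
log = [
    (date(2024, 1, 10), "internal tooling adopted by sub-team", 2),
    (date(2024, 6, 3), "negative result written up and shared", 1),
    (date(2024, 11, 20), "workshop paper accepted", 3),
]

def impact_in_window(log, end, days):
    """Sum impact points recorded in the `days` before `end`, inclusive."""
    start = end - timedelta(days=days)
    return sum(pts for d, _, pts in log if start <= d <= end)

end = date(2024, 12, 31)
for days in (90, 365):
    print(days, impact_in_window(log, end, days))
```

Reviewing both a short and a long window in the same conversation makes it harder to penalize a researcher whose work pays off slowly, which is the usual failure mode of annual-only reviews.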
Career Development:
- Multiple advancement tracks (research, engineering, leadership)
- Rotation opportunities
- Conference and workshop participation
- Teaching and mentorship roles
Scaling Challenges and Solutions
Maintaining Culture During Growth
As teams grow, maintaining culture becomes challenging:
- Cultural Carriers: Identify and empower team members who embody values
- Documentation: Write down implicit norms before they're lost
- Hiring for Culture Add: Enhance rather than dilute culture
- Regular Reflection: Quarterly culture retrospectives
Communication Structures
Larger teams require intentional communication design:
Information Flow:
- Research wikis and knowledge bases
- Regular cross-team presentations
- Pair research across sub-teams
- Documentation standards
Decision Making:
- Clear escalation paths
- Documented decision rights
- Transparent rationale sharing
- Regular all-hands updates
Sub-team Formation
When to split teams:
- Clear research area boundaries emerge
- Communication overhead exceeds collaboration benefit
- Distinct technical skill requirements
- Different time horizons or risk profiles
How to split successfully:
- Maintain cross-team collaboration mechanisms
- Share infrastructure and tools
- Regular inter-team rotations
- Joint social events and retreats
Case Studies in Team Building
Anthropic's Interpretability Team
Built around a clear technical vision:
- Started with 2-3 researchers who shared a research aesthetic
- Grew by finding researchers excited by initial results
- Maintained culture through strong mentorship
- Scaled by creating sub-teams with clear interfaces
Key lessons:
- Technical vision attracted right talent
- Early results created momentum
- Investment in junior researchers paid off
DeepMind's Safety Team
Navigating within a larger organization:
- Established separate identity while maintaining integration
- Built credibility through technical contributions
- Created dual reporting structures
- Influenced broader organizational priorities
Key lessons:
- Internal advocacy requires different skills
- Small wins build political capital
- Cross-team collaboration essential
MIRI's Research Team
Pursuing unconventional approaches:
- Selected for specific theoretical interests
- Created unique collaborative environment
- Accepted higher variance in outcomes
- Built alternative evaluation metrics
Key lessons:
- Niche strategies can attract unique talent
- Cultural fit even more critical for unusual approaches
- Need strong external communication to maintain support
Common Pitfalls and Mitigations
The Capabilities Trap
Problem: Safety teams accidentally advance capabilities
Mitigation: Clear research boundaries, regular impact assessments, differential progress tracking
Founder Dependence
Problem: Team overly reliant on founding members
Mitigation: Distributed leadership, documented processes, rotation of responsibilities
Research Drift
Problem: Exciting tangents distract from safety focus
Mitigation: Regular mission alignment reviews, clear success metrics, portfolio management
Burnout Risk
Problem: Urgency and stakes create unsustainable pressure
Mitigation: Sustainable pace norms, mental health support, sabbatical policies
Building Research Communities
Effective teams extend beyond organizational boundaries:
External Collaboration
- Joint research projects with other institutions
- Visiting researcher programs
- Open-source tool development
- Shared evaluation benchmarks
Field Building
- Conference organization
- Workshop hosting
- Tutorial creation
- Mentorship programs
Knowledge Sharing
- Pre-print servers
- Blog posts and tutorials
- Open research meetings
- Collaborative funding proposals
Future Directions
As AI safety matures, team building must evolve:
- Specialization vs. Integration: Balancing deep expertise with holistic thinking
- Geographic Distribution: Building effective remote/hybrid teams
- Diversity and Inclusion: Expanding beyond traditional talent pools
- Industry-Academia Bridges: Creating fluid movement between sectors
- International Collaboration: Navigating cultural and regulatory differences
Conclusion
Building effective AI safety teams requires combining the best practices of technical team management with unique adaptations for the field's challenges. Success demands clear vision, thoughtful structure, intentional culture, and continuous adaptation. The teams built today will shape how humanity navigates one of its greatest challenges.
The most successful safety team builders recognize that they're not just managing researchers—they're cultivating the human infrastructure that will determine whether advanced AI benefits humanity. This responsibility demands both humility and ambition, rigor and creativity, urgency and patience.
Further Reading
- Effective Altruism's Guide to AI Safety Careers
- Managing Research Teams - Nature's guide
- Anthropic's Team Culture - Industry example
- CHAI's Academic Model - University approach
- AI Safety Support - Community building resources