Building Safety Teams

Recruit and develop AI safety talent

⏱️ 10 hours · Advanced


Learning Objectives

By the end of this topic, you should be able to:

  • Design team structures that balance diverse expertise with focused execution
  • Implement hiring and evaluation practices that identify high-impact safety researchers
  • Create collaborative cultures that encourage both rigorous criticism and psychological safety
  • Scale research teams while maintaining quality and alignment with safety goals
  • Navigate the unique challenges of building teams in a high-stakes, rapidly evolving field

Introduction

Building effective AI safety teams presents unique challenges that go beyond traditional technical team management. The field requires rare combinations of technical depth, philosophical clarity, and strategic thinking. Teams must balance the urgency of near-term AI risks with the patience required for fundamental research. They need to attract world-class talent while maintaining focus on safety rather than capabilities advancement.

The most successful AI safety teams share certain characteristics: intellectual diversity coupled with aligned values, rigorous technical standards paired with openness to unconventional ideas, and the ability to collaborate effectively while maintaining healthy skepticism. Understanding how to cultivate these qualities is essential for anyone looking to build or lead safety research teams.

Core Principles of Safety Team Building

Defining Team Mission and Values

Before hiring the first researcher, successful safety teams establish clear foundations:

Mission Clarity: Teams need a specific, compelling mission that goes beyond generic "AI safety." Whether it's "understand neural network internals" (Anthropic's interpretability team) or "formalize agency" (MIRI's Agent Foundations team), specificity attracts the right people and repels the wrong ones.

Value Alignment: Technical skills can be taught; values rarely change. Core values might include:

  • Prioritizing safety over publication count
  • Intellectual honesty over institutional PR
  • Long-term thinking over short-term gains
  • Collaborative truth-seeking over competitive advantage

Cultural Norms: Explicit norms shape daily behavior:

  • How disagreements are resolved
  • What constitutes sufficient evidence
  • When to escalate concerns
  • How to balance speed with rigor

Team Composition and Structure

Effective safety teams require diverse expertise thoughtfully integrated:

Technical Roles:

  • Research Scientists: Deep expertise in specific domains (ML, formal verification, etc.)
  • Research Engineers: Implementation expertise and systems thinking
  • Safety Engineers: Production deployment and monitoring
  • Technical Communicators: Translating complex ideas across audiences

Complementary Skills:

  • Domain Experts: Philosophy, cognitive science, security, policy
  • Generalists: Connecting ideas across disciplines
  • Operations: Enabling research through infrastructure and processes

Structural Considerations:

  • Flat vs. Hierarchical: Many safety teams favor flatter structures to encourage idea flow
  • Cross-functional Pods: Small teams with complete skill sets for specific problems
  • Rotation Programs: Exposing team members to different aspects of safety work

Talent Acquisition Strategies

Finding and attracting safety researchers requires specialized approaches:

Sourcing Pipelines:

  • Academic programs with a safety focus (CHAI, CAIS; FHI until its closure in 2024)
  • Safety-specific training programs (MLAB, ARENA)
  • Capability researchers interested in safety
  • Adjacent fields (formal verification, security, interpretability)

Assessment Methods:

  • Technical Screens: Standard ML/CS competence
  • Safety Reasoning: Case studies on risk scenarios
  • Research Taste: Evaluating proposed research directions
  • Collaborative Skills: Pair research exercises
  • Value Alignment: Discussing AI risk scenarios and tradeoffs
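One way to make these assessment dimensions comparable across interviewers is a simple weighted rubric. The sketch below is purely illustrative: the dimension names mirror the list above, but the weights and the 0–5 scale are assumptions, not a recommended calibration.

```python
# Hypothetical weighted rubric aggregating the five assessment dimensions.
# Weights are illustrative only; a real team would calibrate them.
WEIGHTS = {
    "technical_screen": 0.20,
    "safety_reasoning": 0.25,
    "research_taste": 0.25,
    "collaborative_skills": 0.15,
    "value_alignment": 0.15,
}

def aggregate_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-5 scale) into a weighted average."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

candidate = {
    "technical_screen": 4.0,
    "safety_reasoning": 4.5,
    "research_taste": 3.5,
    "collaborative_skills": 4.0,
    "value_alignment": 5.0,
}
print(aggregate_score(candidate))  # -> 4.15
```

A rubric like this does not replace judgment on research taste or values; its main benefit is forcing interviewers to score every dimension explicitly rather than anchoring on one strong impression.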

Competitive Advantages: Safety teams often can't match industry compensation but can offer:

  • Mission-driven work
  • Intellectual freedom
  • Collaborative culture
  • Direct impact on existential risk

Team Development and Growth

Onboarding for Impact

Effective onboarding accelerates researcher productivity:

Technical Ramp-up:

  • Curated reading lists covering team's research area
  • Pair programming/research with senior members
  • Small starter projects with clear success metrics
  • Access to compute and tools from day one

Cultural Integration:

  • Explicit discussion of team values and norms
  • Introduction to decision-making processes
  • Shadow meetings to observe team dynamics
  • Cultural buddies assigned separately from technical mentors

Early Wins: Design first projects to:

  • Provide quick feedback loops
  • Connect to larger team goals
  • Build specific technical skills
  • Establish collaborative patterns

Creating Psychological Safety

Safety research requires intellectual risk-taking, which demands psychological safety:

Encouraging Dissent:

  • "Red team" roles in meetings
  • Anonymous concern submission systems
  • Regular "pre-mortem" exercises
  • Celebrating well-reasoned disagreement

Learning from Failure:

  • Blameless post-mortems for research dead-ends
  • Sharing "anti-results" publicly
  • Failure budgets for high-risk research
  • Recognition for killing bad ideas quickly

Managing Power Dynamics:

  • Junior researcher presentation slots
  • Rotation of meeting leadership
  • Skip-level 1:1s
  • Transparent decision documentation

Performance Management for Research

Traditional performance metrics poorly capture safety research impact:

Evaluation Criteria:

  • Research taste and problem selection
  • Collaboration and knowledge sharing
  • Technical growth trajectory
  • Safety mindset development
  • External impact and field-building

Feedback Systems:

  • Continuous rather than annual reviews
  • Peer feedback incorporation
  • Research portfolio reviews
  • Impact tracking over multiple timescales

Career Development:

  • Multiple advancement tracks (research, engineering, leadership)
  • Rotation opportunities
  • Conference and workshop participation
  • Teaching and mentorship roles

Scaling Challenges and Solutions

Maintaining Culture During Growth

As teams grow, maintaining culture becomes challenging:

  • Cultural Carriers: Identify and empower team members who embody values
  • Documentation: Write down implicit norms before they're lost
  • Hiring for Culture Add: Enhance rather than dilute culture
  • Regular Reflection: Quarterly culture retrospectives

Communication Structures

Larger teams require intentional communication design:

Information Flow:

  • Research wikis and knowledge bases
  • Regular cross-team presentations
  • Pair research across sub-teams
  • Documentation standards

Decision Making:

  • Clear escalation paths
  • Documented decision rights
  • Transparent rationale sharing
  • Regular all-hands updates

Sub-team Formation

When to split teams:

  • Clear research area boundaries emerge
  • Communication overhead exceeds collaboration benefit
  • Distinct technical skill requirements
  • Different time horizons or risk profiles
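The "communication overhead" criterion above can be made concrete with the classic pairwise-channels count: in a fully connected team of n people there are n(n-1)/2 potential communication channels, which grows quadratically. This is a rough heuristic for when overhead starts to dominate, not a metric any of the teams discussed here is known to use.

```python
def channels(n: int) -> int:
    """Pairwise communication channels in a fully connected team of n people."""
    return n * (n - 1) // 2

# Doubling team size roughly quadruples coordination surface:
for n in (4, 8, 16):
    print(n, channels(n))  # 4 -> 6, 8 -> 28, 16 -> 120
```

When the channel count far exceeds what regular meetings and documentation can service, splitting into sub-teams with narrow, well-defined interfaces trades some cross-pollination for tractable coordination.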

How to split successfully:

  • Maintain cross-team collaboration mechanisms
  • Share infrastructure and tools
  • Regular inter-team rotations
  • Joint social events and retreats

Case Studies in Team Building

Anthropic's Interpretability Team

Built around a clear technical vision:

  • Started with 2-3 researchers with shared aesthetic
  • Grew by finding researchers excited by initial results
  • Maintained culture through strong mentorship
  • Scaled by creating sub-teams with clear interfaces

Key lessons:

  • Technical vision attracted right talent
  • Early results created momentum
  • Investment in junior researchers paid off

DeepMind's Safety Team

Navigating within a larger organization:

  • Established separate identity while maintaining integration
  • Built credibility through technical contributions
  • Created dual reporting structures
  • Influenced broader organizational priorities

Key lessons:

  • Internal advocacy requires different skills
  • Small wins build political capital
  • Cross-team collaboration essential

MIRI's Research Team

Pursuing unconventional approaches:

  • Selected for specific theoretical interests
  • Created unique collaborative environment
  • Accepted higher variance in outcomes
  • Built alternative evaluation metrics

Key lessons:

  • Niche strategies can attract unique talent
  • Cultural fit even more critical for unusual approaches
  • Need strong external communication to maintain support

Common Pitfalls and Mitigations

The Capabilities Trap

Problem: Safety teams accidentally advance capabilities

Mitigation: Clear research boundaries, regular impact assessments, differential progress tracking

Founder Dependence

Problem: Team overly reliant on founding members

Mitigation: Distributed leadership, documented processes, rotation of responsibilities

Research Drift

Problem: Exciting tangents distract from safety focus

Mitigation: Regular mission alignment reviews, clear success metrics, portfolio management

Burnout Risk

Problem: Urgency and stakes create unsustainable pressure

Mitigation: Sustainable pace norms, mental health support, sabbatical policies

Building Research Communities

Effective teams extend beyond organizational boundaries:

External Collaboration

  • Joint research projects with other institutions
  • Visiting researcher programs
  • Open-source tool development
  • Shared evaluation benchmarks

Field Building

  • Conference organization
  • Workshop hosting
  • Tutorial creation
  • Mentorship programs

Knowledge Sharing

  • Pre-print servers
  • Blog posts and tutorials
  • Open research meetings
  • Collaborative funding proposals

Future Directions

As AI safety matures, team building must evolve:

  • Specialization vs. Integration: Balancing deep expertise with holistic thinking
  • Geographic Distribution: Building effective remote/hybrid teams
  • Diversity and Inclusion: Expanding beyond traditional talent pools
  • Industry-Academia Bridges: Creating fluid movement between sectors
  • International Collaboration: Navigating cultural and regulatory differences

Conclusion

Building effective AI safety teams requires combining the best practices of technical team management with unique adaptations for the field's challenges. Success demands clear vision, thoughtful structure, intentional culture, and continuous adaptation. The teams built today will shape how humanity navigates one of its greatest challenges.

The most successful safety team builders recognize that they're not just managing researchers—they're cultivating the human infrastructure that will determine whether advanced AI benefits humanity. This responsibility demands both humility and ambition, rigor and creativity, urgency and patience.
