Building Safety Teams

Recruit and develop AI safety talent

⏱️ 10 hours · Advanced


Learning Objectives

By the end of this topic, you should be able to:

  • Design team structures that balance diverse expertise with focused execution
  • Implement hiring and evaluation practices that identify high-impact safety researchers
  • Create collaborative cultures that encourage both rigorous criticism and psychological safety
  • Scale research teams while maintaining quality and alignment with safety goals
  • Navigate the unique challenges of building teams in a high-stakes, rapidly evolving field

Introduction

Building effective AI safety teams presents unique challenges that go beyond traditional technical team management. The field requires rare combinations of technical depth, philosophical clarity, and strategic thinking. Teams must balance the urgency of near-term AI risks with the patience required for fundamental research. They need to attract world-class talent while maintaining focus on safety rather than capabilities advancement.

The most successful AI safety teams share certain characteristics: intellectual diversity coupled with aligned values, rigorous technical standards paired with openness to unconventional ideas, and the ability to collaborate effectively while maintaining healthy skepticism. Understanding how to cultivate these qualities is essential for anyone looking to build or lead safety research teams.

Core Principles of Safety Team Building

Defining Team Mission and Values

Before hiring the first researcher, successful safety teams establish clear foundations:

Mission Clarity: Teams need a specific, compelling mission that goes beyond generic "AI safety." Whether it's "understand neural network internals" (Anthropic's interpretability team) or "formalize agency" (MIRI's Agent Foundations team), specificity attracts the right people and repels the wrong ones.

Value Alignment: Technical skills can be taught; values rarely change. Core values might include:

  • Prioritizing safety over publication count
  • Intellectual honesty over institutional PR
  • Long-term thinking over short-term gains
  • Collaborative truth-seeking over competitive advantage

Cultural Norms: Explicit norms shape daily behavior:

  • How disagreements are resolved
  • What constitutes sufficient evidence
  • When to escalate concerns
  • How to balance speed with rigor

Team Composition and Structure

Effective safety teams require diverse expertise thoughtfully integrated:

Technical Roles:

  • Research Scientists: Deep expertise in specific domains (ML, formal verification, etc.)
  • Research Engineers: Implementation expertise and systems thinking
  • Safety Engineers: Production deployment and monitoring
  • Technical Communicators: Translating complex ideas across audiences

Complementary Skills:

  • Domain Experts: Philosophy, cognitive science, security, policy
  • Generalists: Connecting ideas across disciplines
  • Operations: Enabling research through infrastructure and processes

Structural Considerations:

  • Flat vs. Hierarchical: Many safety teams favor flatter structures to encourage idea flow
  • Cross-functional Pods: Small teams with complete skill sets for specific problems
  • Rotation Programs: Exposing team members to different aspects of safety work

Talent Acquisition Strategies

Finding and attracting safety researchers requires specialized approaches:

Sourcing Pipelines:

  • Academic programs with a safety focus (CHAI, CAIS; FHI until its closure in 2024)
  • Safety-specific training programs (MLAB, ARENA)
  • Capability researchers interested in safety
  • Adjacent fields (formal verification, security, interpretability)

Assessment Methods:

  • Technical Screens: Standard ML/CS competence
  • Safety Reasoning: Case studies on risk scenarios
  • Research Taste: Evaluating proposed research directions
  • Collaborative Skills: Pair research exercises
  • Value Alignment: Discussing AI risk scenarios and tradeoffs
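One way to make these assessment dimensions comparable across interviewers is a simple weighted rubric. The sketch below is purely illustrative: the dimension names mirror the list above, but the weights and the 0–5 scale are assumptions, not a recommended calibration.

```python
# Hypothetical weighted rubric aggregating the five assessment dimensions.
# Weights are illustrative only; a real team would calibrate them.
WEIGHTS = {
    "technical_screen": 0.20,
    "safety_reasoning": 0.25,
    "research_taste": 0.25,
    "collaborative_skills": 0.15,
    "value_alignment": 0.15,
}

def aggregate_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-5 scale) into a weighted average."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

candidate = {
    "technical_screen": 4.0,
    "safety_reasoning": 4.5,
    "research_taste": 3.5,
    "collaborative_skills": 4.0,
    "value_alignment": 5.0,
}
print(aggregate_score(candidate))  # -> 4.15
```

A rubric like this does not replace judgment on research taste or values; its main benefit is forcing interviewers to score every dimension explicitly rather than anchoring on one strong impression.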

Competitive Advantages: Safety teams often can't match industry compensation but can offer:

  • Mission-driven work
  • Intellectual freedom
  • Collaborative culture
  • Direct impact on existential risk

Team Development and Growth

Onboarding for Impact

Effective onboarding accelerates researcher productivity:

Technical Ramp-up:

  • Curated reading lists covering team's research area
  • Pair programming/research with senior members
  • Small starter projects with clear success metrics
  • Access to compute and tools from day one

Cultural Integration:

  • Explicit discussion of team values and norms
  • Introduction to decision-making processes
  • Shadow meetings to observe team dynamics
  • Cultural buddies assigned separately from technical mentors

Early Wins: Design first projects to:

  • Provide quick feedback loops
  • Connect to larger team goals
  • Build specific technical skills
  • Establish collaborative patterns

Creating Psychological Safety

Safety research requires intellectual risk-taking, which demands psychological safety:

Encouraging Dissent:

  • "Red team" roles in meetings
  • Anonymous concern submission systems
  • Regular "pre-mortem" exercises
  • Celebrating well-reasoned disagreement

Learning from Failure:

  • Blameless post-mortems for research dead-ends
  • Sharing "anti-results" publicly
  • Failure budgets for high-risk research
  • Recognition for killing bad ideas quickly

Managing Power Dynamics:

  • Junior researcher presentation slots
  • Rotation of meeting leadership
  • Skip-level 1:1s
  • Transparent decision documentation

Performance Management for Research

Traditional performance metrics poorly capture safety research impact:

Evaluation Criteria:

  • Research taste and problem selection
  • Collaboration and knowledge sharing
  • Technical growth trajectory
  • Safety mindset development
  • External impact and field-building

Feedback Systems:

  • Continuous rather than annual reviews
  • Peer feedback incorporation
  • Research portfolio reviews
  • Impact tracking over multiple timescales

Career Development:

  • Multiple advancement tracks (research, engineering, leadership)
  • Rotation opportunities
  • Conference and workshop participation
  • Teaching and mentorship roles

Scaling Challenges and Solutions

Maintaining Culture During Growth

As teams grow, maintaining culture becomes challenging:

  • Cultural Carriers: Identify and empower team members who embody values
  • Documentation: Write down implicit norms before they're lost
  • Hiring for Culture Add: Enhance rather than dilute culture
  • Regular Reflection: Quarterly culture retrospectives

Communication Structures

Larger teams require intentional communication design:

Information Flow:

  • Research wikis and knowledge bases
  • Regular cross-team presentations
  • Pair research across sub-teams
  • Documentation standards

Decision Making:

  • Clear escalation paths
  • Documented decision rights
  • Transparent rationale sharing
  • Regular all-hands updates

Sub-team Formation

When to split teams:

  • Clear research area boundaries emerge
  • Communication overhead exceeds collaboration benefit
  • Distinct technical skill requirements
  • Different time horizons or risk profiles
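The "communication overhead" criterion above can be made concrete with the classic pairwise-channels count: in a fully connected team of n people there are n(n-1)/2 potential communication channels, which grows quadratically. This is a rough heuristic for when overhead starts to dominate, not a metric any of the teams discussed here is known to use.

```python
def channels(n: int) -> int:
    """Pairwise communication channels in a fully connected team of n people."""
    return n * (n - 1) // 2

# Doubling team size roughly quadruples coordination surface:
for n in (4, 8, 16):
    print(n, channels(n))  # 4 -> 6, 8 -> 28, 16 -> 120
```

When the channel count far exceeds what regular meetings and documentation can service, splitting into sub-teams with narrow, well-defined interfaces trades some cross-pollination for tractable coordination.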

How to split successfully:

  • Maintain cross-team collaboration mechanisms
  • Share infrastructure and tools
  • Regular inter-team rotations
  • Joint social events and retreats

Case Studies in Team Building

Anthropic's Interpretability Team

Built around a clear technical vision:

  • Started with 2-3 researchers with shared aesthetic
  • Grew by finding researchers excited by initial results
  • Maintained culture through strong mentorship
  • Scaled by creating sub-teams with clear interfaces

Key lessons:

  • Technical vision attracted right talent
  • Early results created momentum
  • Investment in junior researchers paid off

DeepMind's Safety Team

Navigating within a larger organization:

  • Established separate identity while maintaining integration
  • Built credibility through technical contributions
  • Created dual reporting structures
  • Influenced broader organizational priorities

Key lessons:

  • Internal advocacy requires different skills
  • Small wins build political capital
  • Cross-team collaboration essential

MIRI's Research Team

Pursuing unconventional approaches:

  • Selected for specific theoretical interests
  • Created unique collaborative environment
  • Accepted higher variance in outcomes
  • Built alternative evaluation metrics

Key lessons:

  • Niche strategies can attract unique talent
  • Cultural fit even more critical for unusual approaches
  • Need strong external communication to maintain support

Common Pitfalls and Mitigations

The Capabilities Trap

Problem: Safety teams accidentally advance capabilities

Mitigation: Clear research boundaries, regular impact assessments, differential progress tracking

Founder Dependence

Problem: Team overly reliant on founding members

Mitigation: Distributed leadership, documented processes, rotation of responsibilities

Research Drift

Problem: Exciting tangents distract from safety focus

Mitigation: Regular mission alignment reviews, clear success metrics, portfolio management

Burnout Risk

Problem: Urgency and stakes create unsustainable pressure

Mitigation: Sustainable pace norms, mental health support, sabbatical policies

Building Research Communities

Effective teams extend beyond organizational boundaries:

External Collaboration

  • Joint research projects with other institutions
  • Visiting researcher programs
  • Open-source tool development
  • Shared evaluation benchmarks

Field Building

  • Conference organization
  • Workshop hosting
  • Tutorial creation
  • Mentorship programs

Knowledge Sharing

  • Pre-print servers
  • Blog posts and tutorials
  • Open research meetings
  • Collaborative funding proposals

Future Directions

As AI safety matures, team building must evolve:

  • Specialization vs. Integration: Balancing deep expertise with holistic thinking
  • Geographic Distribution: Building effective remote/hybrid teams
  • Diversity and Inclusion: Expanding beyond traditional talent pools
  • Industry-Academia Bridges: Creating fluid movement between sectors
  • International Collaboration: Navigating cultural and regulatory differences

Conclusion

Building effective AI safety teams requires combining the best practices of technical team management with unique adaptations for the field's challenges. Success demands clear vision, thoughtful structure, intentional culture, and continuous adaptation. The teams built today will shape how humanity navigates one of its greatest challenges.

The most successful safety team builders recognize that they're not just managing researchers—they're cultivating the human infrastructure that will determine whether advanced AI benefits humanity. This responsibility demands both humility and ambition, rigor and creativity, urgency and patience.
