Iterative Research Design
Developing and refining research approaches through iteration
Table of Contents
- The Iteration Mindset
- The Core Loop
- Iteration Strategies
- Research Momentum Techniques
- Managing Multiple Threads
- Common Iteration Patterns
- Practical Tools
- Case Study: Iterating on Interpretability
- Balancing Speed and Rigor
- Red Flags in Iteration
- Action Plan
- Resources
The Iteration Mindset
Traditional vs. Iterative Research
Traditional Academic Model:
- Extensive literature review → Perfect hypothesis → Single major experiment → Publication
- Timeline: 6-24 months
- Risk: High (all eggs in one basket)
Iterative Safety Research:
- Quick literature scan → Rough hypothesis → Many small experiments → Continuous refinement
- Timeline: 2-8 weeks per cycle
- Risk: Low (fail fast, learn faster)
The Core Loop
1. Hypothesize (1 day)
↓
2. Build Minimal Test (2-5 days)
↓
3. Run Experiment (1-3 days)
↓
4. Analyze & Decide (1 day)
↓
5. Pivot or Persevere
↺ Back to step 1 (Hypothesize)
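As a quick sanity check on the time-boxes above, the sketch below adds up the stated step durations to estimate one pass through the loop. The dictionary and function names are illustrative assumptions, and step 5 is left out because the loop gives it no duration.

```python
# Time-boxes copied from the loop above, as (min_days, max_days).
# Step 5 (Pivot or Persevere) has no stated duration, so it is omitted.
STEP_DAYS = {
    "Hypothesize": (1, 1),
    "Build Minimal Test": (2, 5),
    "Run Experiment": (1, 3),
    "Analyze & Decide": (1, 1),
}


def cycle_length(step_days):
    """Return (shortest, longest) total days for one pass through the loop."""
    shortest = sum(low for low, _ in step_days.values())
    longest = sum(high for _, high in step_days.values())
    return shortest, longest


shortest, longest = cycle_length(STEP_DAYS)
print(f"One pass through the loop takes {shortest}-{longest} days")
```

If a single pass regularly runs well past the upper bound, that is usually a sign the "minimal test" is not minimal enough.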
Iteration Strategies
1. The Ladder of Complexity
Start simple, add complexity only when needed:
- Toy Model: 2-state MDPs, linear models
- Simplified Real: MNIST, small transformers
- Realistic Scale: GPT-2 sized models
- Production Scale: Only if absolutely necessary
Example progression:
- "Can we detect deception in a 2-player game?"
- "Can we detect deception in a gridworld RL agent?"
- "Can we detect deception in a language model on simple tasks?"
- "Can we detect deception in GPT-4 on complex scenarios?"
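One way to make the ladder concrete is to treat each rung as a cheap pass/fail experiment and only climb while the previous rung shows signal. This is a minimal sketch: the `climb` helper and the lambda stand-ins are hypothetical placeholders for real experiments.

```python
from typing import Callable, List, Optional, Tuple

# A rung is a name plus a zero-argument experiment that returns True on signal.
Rung = Tuple[str, Callable[[], bool]]


def climb(ladder: List[Rung]) -> Optional[str]:
    """Run rungs in order and return the name of the highest rung that passed.

    Stops at the first failing rung, so an expensive rung is never run
    unless every cheaper rung below it showed signal.
    """
    highest = None
    for name, experiment in ladder:
        if not experiment():
            break
        highest = name
    return highest


# Hypothetical stand-ins for real experiments at each level of complexity.
ladder = [
    ("toy model", lambda: True),
    ("simplified real", lambda: True),
    ("realistic scale", lambda: False),
    ("production scale", lambda: True),  # never reached in this run
]
print(climb(ladder))
```

Ordering the rungs from cheapest to most expensive is what keeps failure cheap: a negative result on the toy model costs hours, not weeks.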
2. The Build-Measure-Learn Cycle
Build: Create the minimum viable experiment
- Hack together a prototype
- Use existing tools/libraries
- Hardcode what you can
- Automate only the bottlenecks
Measure: Get data quickly
- Define metrics before building
- Automate data collection
- Visualize early and often
- Look for surprising results
Learn: Extract insights ruthlessly
- What did we expect vs. what happened?
- What's the most interesting failure?
- What would we do differently?
- What's the next most important question?
3. Fail-Fast Experimentation
The 20% Rule: As a rule of thumb, if an approach shows no promise after 20% of the planned effort, it probably won't work with 100%.
Kill Criteria (defined in advance):
- No signal after X experiments
- Computational requirements > Y
- Core assumption proven false
- Better approach discovered
Signs to Persevere:
- Consistent incremental progress
- Unexpected interesting behaviors
- Clear path to improvement
- High potential impact if successful
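Writing the kill criteria down as data before the first experiment makes them harder to rationalize away later. The sketch below is one possible encoding; the class names, field names, and the thresholds standing in for X and Y are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KillCriteria:
    """Thresholds fixed in advance (frozen so they cannot be edited mid-project)."""
    max_experiments_without_signal: int  # the "X" in the list above
    max_gpu_hours: float                 # the "Y" in the list above


@dataclass
class ThreadStatus:
    """Current state of a research thread, updated after each experiment."""
    experiments_without_signal: int
    gpu_hours_needed: float
    core_assumption_holds: bool
    better_approach_found: bool


def reasons_to_kill(criteria: KillCriteria, status: ThreadStatus) -> List[str]:
    """Return every kill criterion that has been met (empty list = persevere)."""
    reasons = []
    if status.experiments_without_signal >= criteria.max_experiments_without_signal:
        reasons.append("no signal")
    if status.gpu_hours_needed > criteria.max_gpu_hours:
        reasons.append("compute too high")
    if not status.core_assumption_holds:
        reasons.append("core assumption false")
    if status.better_approach_found:
        reasons.append("better approach found")
    return reasons


# Illustrative values only; pick your own X and Y before you start.
criteria = KillCriteria(max_experiments_without_signal=5, max_gpu_hours=200.0)
status = ThreadStatus(
    experiments_without_signal=5,
    gpu_hours_needed=50.0,
    core_assumption_holds=True,
    better_approach_found=False,
)
print(reasons_to_kill(criteria, status))
```

The signs to persevere are deliberately not encoded here: they are judgment calls, whereas kill criteria work best when they are mechanical.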
Research Momentum Techniques
1. The Daily Ship
- End each day with something runnable
- Even if the results are wrong, make the code run end to end
- "Works on my machine" > "theoretically optimal"
2. Research Logs
Keep a lightweight log:
## 2024-11-15
**Tried**: Probe for deception using linear classifier on layer 12
**Result**: 65% accuracy (barely above random)
**Next**: Try attention patterns instead of activations
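A log only stays lightweight if adding an entry takes seconds. The helper below is a small sketch that formats an entry in the same Tried/Result/Next layout as the example above and appends it to a file; the function names and the `research_log.md` path are hypothetical.

```python
from datetime import date
from pathlib import Path


def format_entry(day: date, tried: str, result: str, next_step: str) -> str:
    """Format one log entry in the Tried/Result/Next layout."""
    return (
        f"## {day.isoformat()}\n"
        f"**Tried**: {tried}\n"
        f"**Result**: {result}\n"
        f"**Next**: {next_step}\n"
    )


def append_entry(log_path: str, entry: str) -> None:
    """Append an entry to the log file, creating the file if it does not exist."""
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(entry + "\n")


entry = format_entry(
    date(2024, 11, 15),
    "Probe for deception using linear classifier on layer 12",
    "65% accuracy",
    "Try attention patterns instead of activations",
)
print(entry)
# To persist it: append_entry("research_log.md", entry)
```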
3. The Friday Demo
- Every Friday, demo something to a colleague
- Forces concrete progress
- Gets early feedback
- Maintains accountability
Managing Multiple Threads
The 70-20-10 Rule
- 70%: Main research thrust
- 20%: Promising tangent
- 10%: Wild ideas
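To see what the split means in practice, the sketch below converts the percentages into hours for a given week. The 40-hour week is an assumed example, and `split_hours` is an illustrative name.

```python
def split_hours(total_hours, shares=(70, 20, 10)):
    """Split a time budget across threads using percentage shares."""
    if sum(shares) != 100:
        raise ValueError("shares must add up to 100")
    return tuple(total_hours * share / 100 for share in shares)


# Assumed 40-hour research week.
main, tangent, wild = split_hours(40)
print(f"Main thrust: {main}h, tangent: {tangent}h, wild ideas: {wild}h")
```

On a 40-hour week the "wild ideas" share is about half a day, which is enough for one small experiment but not enough to derail the main thrust.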
Thread Switching Triggers
Switch when:
- Waiting for compute
- Stuck for >2 days
- Energy/motivation low
- New insight makes other thread more promising
Common Iteration Patterns
1. The Exploration Spiral
Broad survey (week 1)
↓
Pick 3 promising directions (week 2)
↓
Deep dive on best one (weeks 3-4)
↓
Publish/Share findings
↓
Use insights for next spiral
2. The Ablation Ladder
Start with everything, remove until it breaks:
- Full model works → What can we remove?
- Simplified model works → What's the minimal version?
- Minimal version fails → What's the critical component?
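The ablation ladder can be sketched as a greedy loop: try removing each component in turn and keep the removal only if the system still works. The `ablate` helper and the component names below are hypothetical, and the sketch assumes that adding a component back never breaks a working configuration.

```python
from typing import Callable, FrozenSet, Iterable, Set


def ablate(
    components: Iterable[str],
    works: Callable[[FrozenSet[str]], bool],
) -> Set[str]:
    """Greedily drop components one at a time while the system still works.

    Whatever remains at the end could not be removed without breaking
    `works`, so those are the candidate critical components.
    """
    kept = set(components)
    if not works(frozenset(kept)):
        raise ValueError("the full configuration must work before ablating")
    for component in sorted(kept):  # fixed order keeps runs reproducible
        trial = kept - {component}
        if works(frozenset(trial)):
            kept = trial  # removal was harmless, so keep the simpler version
    return kept


# Hypothetical check: pretend the effect needs attention and layernorm only.
def works(config: FrozenSet[str]) -> bool:
    return {"attention", "layernorm"} <= config


critical = ablate(["attention", "dropout", "layernorm", "mlp"], works)
print(sorted(critical))
```

In a real project `works` would retrain or re-evaluate the model, so each call is expensive; the greedy order trades exhaustiveness for a number of runs that grows linearly with the component count.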
3. The Scaling Probe
- Start: "This works at scale 1"
- Test: Does it work at scale 10? 100? 1000?
- Find: Where does it break and why?
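A scaling probe reduces to walking up a list of scales and reporting the first one where the method stops working. In this sketch `first_breaking_scale` is an illustrative name and the `works_at` check is a hypothetical stand-in for a real experiment.

```python
def first_breaking_scale(works_at, scales=(1, 10, 100, 1000)):
    """Return the smallest tested scale where `works_at` fails, or None."""
    for scale in scales:
        if not works_at(scale):
            return scale
    return None


# Hypothetical method that holds up below scale 100 and fails from there on.
def works_at(scale):
    return scale < 100


print(first_breaking_scale(works_at))
```

The returned scale only answers "where"; the "why" still needs a closer look at what changed between the last working scale and the first failing one.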
Practical Tools
1. Experiment Tracking
Simple but effective:
# experiments/exp_001_baseline.py
# experiments/exp_002_add_regularization.py
# experiments/exp_003_different_architecture.py
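Numbered file names stay consistent more easily when the next name is generated rather than typed. The helper below is a sketch that follows the `exp_NNN_description.py` pattern shown above; `next_experiment_name` is a hypothetical function, not part of any tool.

```python
import re


def next_experiment_name(existing, description):
    """Return the next exp_NNN_<description>.py name given existing file names."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := re.match(r"exp_(\d{3})_", name))
    ]
    # Lowercase the description and collapse anything non-alphanumeric to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"exp_{max(numbers, default=0) + 1:03d}_{slug}.py"


existing = [
    "exp_001_baseline.py",
    "exp_002_add_regularization.py",
    "exp_003_different_architecture.py",
]
print(next_experiment_name(existing, "Attention patterns"))
```

In practice `existing` could come from listing the `experiments/` directory, for example with `os.listdir("experiments")`.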
2. The Research Kanban
- Backlog: All ideas
- This Week: Current focus
- In Progress: Active experiments
- Blocked: Waiting on resources/feedback
- Done: Completed, documented
3. Version Control for Research
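# Record the experiment and its headline result in the commit message
git commit -m "Exp 5: Tried transformer probe, 72% acc"
# Tag the commit so a promising result is easy to find again
git tag "promising-direction-1"
# Create a branch for a follow-up direction (this does not switch to it)
git branch "explore-attention-patterns"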
Case Study: Iterating on Interpretability
Week 1: "Can we understand what models know?"
- Try activation maximization → Too noisy
- Try linear probes → Some signal
- Try attention visualization → Interesting patterns
Week 2: Focus on attention patterns
- Build tool to extract attention
- Find consistent patterns in similar inputs
- Notice anomaly in layer 7
Week 3: Deep dive on layer 7 anomaly
- Isolate behavior
- Test on multiple models
- Find it correlates with capability
Week 4: Write up findings
- Clean code
- Create visualizations
- Share with community
Result: New interpretability method discovered through iteration
Balancing Speed and Rigor
When to be Fast
- Exploring new ideas
- Building intuition
- Checking feasibility
- Personal projects
When to be Careful
- Claims about safety
- Published results
- Shared code/tools
- Negative results about others' work
Red Flags in Iteration
- No progress for 2 weeks → Time to pivot
- Only negative results → Question your assumptions
- Too many threads → Focus on one
- Perfectionism creeping in → Ship something
- Lost sight of why → Revisit original motivation
Action Plan
- Today: Identify one research question you've been overthinking
- This Week: Build the simplest possible test
- Next Week: Run it, learn, and iterate
- This Month: Complete 4 iteration cycles
Remember: In AI safety research, learning fast beats being right the first time. The field is moving too quickly for perfect planning.
Resources
- Article: Lean Startup Methodology Applied to Research - Adapt startup principles to research
- Video: How to Build Good Research Habits - Practical productivity tips
- Article: The Importance of Stupidity in Scientific Research - Embracing productive failure
- Course: Fast.ai's Practical Deep Learning - Example of iterative teaching/research