Human-Agent Interaction
Designing safe and effective human-AI agent collaboration
Table of Contents
- Learning Objectives
- Introduction
- Core Concepts
- Practical Applications
- Common Pitfalls
- Hands-on Exercise
- Further Reading
- Connections
- The Great Illusion
- My Journey Through the Uncanny Valley
- Real Patterns from Real Disasters
- My Actual Interface Design
- The Patterns Nobody Talks About
- Real Stories from the Front Lines
- My Rules for Human-Agent Interaction
- The Interface Elements That Actually Matter
- Interaction Anti-Patterns I've Seen
- My Current Setup
- The Trust Calibration Dance
- The Reality of Human-Agent Teams
- Final Thoughts
- AI Tutoring and Educational Interactions
Learning Objectives
By the end of this topic, you should be able to:
- Design effective human-agent interfaces that promote safe collaboration
- Implement appropriate levels of automation and human oversight
- Build trust calibration mechanisms for human-agent teams
- Analyze and mitigate risks in human-agent collaborative systems
- Create interaction patterns that enhance both safety and productivity
Introduction
Human-agent interaction represents a critical frontier in AI safety, where the theoretical meets the practical. As AI agents become more capable and autonomous, the quality of their interaction with humans determines not just their usefulness, but their safety. Poor human-agent interaction design can lead to automation bias, misplaced trust, dangerous misunderstandings, and catastrophic failures even with technically safe AI systems.
The field has evolved from simple command-response interfaces to sophisticated collaborative systems where humans and agents work as partners. This evolution brings new challenges: How do we maintain meaningful human control? How do we calibrate trust appropriately? How do we design interactions that leverage the strengths of both humans and agents while mitigating their respective weaknesses?
Core Concepts
The Human-Agent Interaction Stack
1. Interface Layer
The surface through which humans and agents communicate:
class HumanAgentInterface:
def __init__(self):
self.modalities = {
'text': TextInterface(),
'voice': VoiceInterface(),
'visual': VisualInterface(),
'gestural': GesturalInterface()
}
self.context_manager = ContextManager()
self.explanation_engine = ExplanationEngine()
def process_human_input(self, input_data, modality):
# Multi-modal input processing
processed = self.modalities[modality].process(input_data)
# Context enrichment
context = self.context_manager.get_current_context()
processed.enrich_with_context(context)
# Intent clarification if needed
if processed.ambiguity_score > 0.3:
clarification = self.request_clarification(processed)
processed = self.refine_with_clarification(processed, clarification)
return processed
    def present_agent_output(self, agent_response):
        # Adapt presentation to user needs
        user_profile = self.context_manager.get_user_profile()
        response = agent_response
        if user_profile.expertise_level == 'novice':
            response = self.simplify_response(agent_response)
        if user_profile.requires_explanations:
            response.add_explanation(
                self.explanation_engine.explain(agent_response)
            )
        # Multi-modal output
        return self.format_for_modalities(response, user_profile.preferred_modalities)
2. Collaboration Layer
Mechanisms for effective human-agent teamwork:
class CollaborationManager:
def __init__(self):
self.task_allocator = DynamicTaskAllocator()
self.trust_calibrator = TrustCalibrationSystem()
self.handoff_manager = HandoffManager()
def allocate_subtask(self, task, human_state, agent_capabilities):
# Assess task requirements
task_analysis = self.analyze_task_requirements(task)
# Evaluate current states
human_capacity = self.assess_human_capacity(human_state)
agent_suitability = self.assess_agent_suitability(task_analysis, agent_capabilities)
# Dynamic allocation
if task_analysis.requires_creativity or task_analysis.ethical_judgment:
allocation = 'human_lead_agent_support'
elif task_analysis.is_routine and agent_suitability.confidence > 0.9:
allocation = 'agent_lead_human_oversight'
else:
allocation = 'collaborative_execution'
return TaskAllocation(
allocation_type=allocation,
human_responsibilities=self.define_human_role(allocation, task),
agent_responsibilities=self.define_agent_role(allocation, task),
handoff_points=self.identify_handoff_points(task)
)
3. Trust and Safety Layer
Managing the critical aspects of trust and safety:
class TrustSafetyManager:
def __init__(self):
self.trust_model = AdaptiveTrustModel()
self.safety_monitor = SafetyMonitor()
self.intervention_system = InterventionSystem()
def calibrate_trust(self, interaction_history):
# Analyze agent performance
performance_metrics = self.analyze_performance(interaction_history)
# Detect trust miscalibration
if self.trust_model.is_overtrusting(performance_metrics):
return TrustIntervention(
type='reduce_automation',
message='Agent has made errors - please verify outputs carefully',
automation_level_adjustment=-0.2
)
elif self.trust_model.is_undertrusting(performance_metrics):
return TrustIntervention(
type='demonstrate_capability',
message='Agent has been reliable - consider delegating more',
showcase_successes=True
)
    def monitor_safety(self, agent_action, context):
        risk_assessment = self.safety_monitor.assess_risk(agent_action, context)
        # Anything above medium risk triggers an intervention
        if risk_assessment.risk_level not in ('low', 'medium'):
            return SafetyIntervention(
                severity=risk_assessment.risk_level,
                required_action=self.determine_intervention(risk_assessment),
                human_approval_required=True,
                explanation=self.generate_risk_explanation(risk_assessment)
            )
        return None  # No intervention needed
Interaction Design Patterns
1. Progressive Automation
Gradually increasing agent autonomy based on demonstrated competence:
class ProgressiveAutomationSystem:
def __init__(self):
self.automation_levels = [
'manual_with_suggestions', # Level 0: Agent suggests only
'approval_required', # Level 1: Agent acts with approval
'notify_before_action', # Level 2: Agent notifies then acts
'act_and_report', # Level 3: Agent acts and reports
'full_autonomy_with_oversight' # Level 4: Agent fully autonomous
]
self.current_levels = {} # Per task type
def adjust_automation_level(self, task_type, performance_history):
current = self.current_levels.get(task_type, 0)
# Calculate performance metrics
success_rate = self.calculate_success_rate(performance_history)
error_severity = self.calculate_error_severity(performance_history)
user_satisfaction = self.get_user_satisfaction(task_type)
# Adjust level based on performance
if success_rate > 0.95 and error_severity < 0.1 and user_satisfaction > 0.8:
# Consider increasing automation
if current < len(self.automation_levels) - 1:
return self.propose_level_increase(task_type, current + 1)
elif success_rate < 0.8 or error_severity > 0.3:
# Reduce automation
if current > 0:
return self.implement_level_decrease(task_type, current - 1)
return current
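The promotion/demotion thresholds above can also be sketched as a small standalone function; the name next_automation_level, the dict-based history format, and the level cap below are illustrative assumptions, not part of the system shown above.
def next_automation_level(current_level, history, max_level=4):
    success_rate = sum(h['success'] for h in history) / len(history)
    error_severity = max((h['severity'] for h in history if not h['success']), default=0.0)
    satisfaction = sum(h['satisfaction'] for h in history) / len(history)
    if success_rate > 0.95 and error_severity < 0.1 and satisfaction > 0.8:
        return min(current_level + 1, max_level)  # propose more autonomy
    if success_rate < 0.8 or error_severity > 0.3:
        return max(current_level - 1, 0)          # pull back quickly
    return current_level                          # hold steady

history = [{'success': True, 'severity': 0.0, 'satisfaction': 0.9}] * 20
print(next_automation_level(1, history))  # -> 2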
2. Explanation and Transparency
Making agent reasoning understandable to humans:
class ExplainableAgentInterface:
def __init__(self, agent):
self.agent = agent
self.explanation_generator = ExplanationGenerator()
self.visualization_engine = VisualizationEngine()
def execute_with_explanation(self, task):
# Track decision process
with self.agent.decision_tracker() as tracker:
result = self.agent.execute(task)
decision_trace = tracker.get_trace()
# Generate explanations at multiple levels
explanations = {
'summary': self.explanation_generator.generate_summary(decision_trace),
'detailed': self.explanation_generator.generate_detailed(decision_trace),
'technical': self.explanation_generator.generate_technical(decision_trace)
}
# Visual representation
visualization = self.visualization_engine.create_decision_tree(decision_trace)
return ExplainedResult(
result=result,
explanations=explanations,
visualization=visualization,
confidence=self.calculate_confidence(decision_trace),
alternative_actions=self.get_alternatives(decision_trace)
)
3. Adaptive Interaction Styles
Adjusting interaction patterns to user needs and contexts:
class AdaptiveInteractionManager:
def __init__(self):
self.user_model = UserModel()
self.interaction_styles = {
'directive': DirectiveStyle(), # Clear commands and responses
'collaborative': CollaborativeStyle(), # Joint problem-solving
'explanatory': ExplanatoryStyle(), # Educational approach
'minimal': MinimalStyle() # Efficient, expert-oriented
}
def adapt_interaction(self, user_id, context):
# Get user preferences and state
user_profile = self.user_model.get_profile(user_id)
current_state = self.assess_user_state(user_id)
# Select appropriate style
if current_state.stress_level > 0.7:
style = 'directive' # Clear and simple under stress
elif user_profile.expertise < 0.3:
style = 'explanatory' # Educational for novices
elif context.task_complexity > 0.8:
style = 'collaborative' # Joint work on complex tasks
else:
style = user_profile.preferred_style
return self.interaction_styles[style]
Safety in Human-Agent Interaction
1. Maintaining Human Agency
Ensuring humans remain in meaningful control:
class HumanAgencyProtection:
def __init__(self):
self.decision_tracker = DecisionTracker()
self.intervention_points = InterventionPointManager()
def ensure_human_agency(self, task_flow):
# Identify critical decision points
critical_decisions = self.identify_critical_decisions(task_flow)
# Ensure human involvement
for decision in critical_decisions:
if decision.reversibility < 0.3 or decision.impact > 0.7:
decision.require_human_approval = True
decision.add_reflection_time(seconds=10) # Prevent rushed decisions
# Add periodic check-ins
if task_flow.duration > 3600: # Tasks longer than 1 hour
self.add_human_checkpoints(task_flow, interval=1800) # Every 30 min
# Preserve override capability
task_flow.add_global_override(
trigger='user_command',
action='pause_and_transfer_control'
)
return task_flow
2. Preventing Automation Bias
Mitigating over-reliance on agent recommendations:
import random

class AutomationBiasPrevention:
def __init__(self):
self.disagreement_injector = DisagreementInjector()
self.confidence_calibrator = ConfidenceCalibrator()
def present_recommendation(self, agent_recommendation):
# Occasionally present alternatives to prevent blind acceptance
if random.random() < 0.1: # 10% of the time
alternative = self.generate_plausible_alternative(agent_recommendation)
return MultiOptionPresentation(
primary=agent_recommendation,
alternative=alternative,
prompt="Consider both options:"
)
# Calibrate confidence presentation
displayed_confidence = self.confidence_calibrator.calibrate(
agent_recommendation.confidence,
historical_accuracy=self.get_historical_accuracy()
)
# Add uncertainty indicators
if displayed_confidence < 0.8:
agent_recommendation.add_uncertainty_markers()
agent_recommendation.highlight_assumptions()
return agent_recommendation
Collaborative Patterns
1. Human-in-the-Loop Systems
Integrating human judgment at critical points:
class HumanInTheLoopSystem:
    def __init__(self, agent):
        self.agent = agent
        self.checkpoint_manager = CheckpointManager()
        self.escalation_system = EscalationSystem()

    async def execute_with_human_oversight(self, task):
        execution_plan = self.agent.create_plan(task)
        # Human reviews plan
        human_feedback = await self.get_human_review(execution_plan)
        execution_plan = self.incorporate_feedback(execution_plan, human_feedback)
        # Execute with checkpoints, collecting step results
        results = []
        for step in execution_plan.steps:
            if step.requires_human_check:
                # Pause for human verification
                human_check = await self.checkpoint_manager.create_checkpoint(step)
                if not human_check.approved:
                    return self.handle_rejection(step, human_check.reason)
            # Execute step
            result = await self.agent.execute_step(step)
            results.append(result)
            # Check for escalation needs
            if self.escalation_system.should_escalate(result):
                return await self.escalate_to_human(step, result)
        return ExecutionResult(success=True, results=results)
2. Mixed-Initiative Interaction
Allowing both human and agent to take initiative:
class MixedInitiativeSystem:
def __init__(self):
self.initiative_manager = InitiativeManager()
self.turn_taking_system = TurnTakingSystem()
def collaborative_problem_solving(self, problem):
solution_state = SolutionState(problem)
while not solution_state.is_complete():
# Determine who should take next turn
next_actor = self.turn_taking_system.determine_next_turn(
solution_state,
human_cognitive_load=self.assess_human_load(),
agent_confidence=self.agent.assess_confidence(solution_state)
)
if next_actor == 'human':
human_action = self.wait_for_human_input()
solution_state = self.apply_human_action(solution_state, human_action)
else:
# Agent takes initiative
agent_proposal = self.agent.propose_next_step(solution_state)
# Human can intervene
if self.human_wants_to_intervene():
human_modification = self.get_human_modification(agent_proposal)
agent_proposal = self.merge_proposals(agent_proposal, human_modification)
solution_state = self.apply_agent_action(solution_state, agent_proposal)
return solution_state.get_solution()
Practical Applications
Building a Medical Diagnosis Assistant
A real-world example requiring careful human-agent interaction:
class MedicalDiagnosisAssistant:
def __init__(self):
self.diagnostic_engine = DiagnosticEngine()
self.explanation_system = MedicalExplanationSystem()
self.safety_checks = MedicalSafetyChecks()
def assist_diagnosis(self, patient_data, physician):
# Never diagnose autonomously
self.assert_physician_present(physician)
# Analyze patient data
analysis = self.diagnostic_engine.analyze(patient_data)
# Present findings, not conclusions
findings_report = FindingsReport()
# Highlight relevant observations
findings_report.add_section(
"Relevant Observations",
self.extract_relevant_findings(analysis),
confidence_levels=True
)
# Suggest additional tests
if missing_data := self.identify_missing_data(analysis):
findings_report.add_section(
"Recommended Additional Tests",
missing_data,
rationale=self.explain_test_recommendations(missing_data)
)
# Differential diagnosis support
differentials = self.generate_differentials(analysis)
findings_report.add_section(
"Differential Considerations",
differentials,
evidence_for_against=True,
disclaimer="For physician consideration only"
)
# Safety alerts
if safety_concerns := self.safety_checks.check(analysis):
findings_report.add_urgent_section(
"Safety Alerts",
safety_concerns,
require_acknowledgment=True
)
return findings_report
    def explain_reasoning(self, request, physician):
        """Physician can request explanations"""
explanation = self.explanation_system.generate(
request,
include_references=True,
include_uncertainty=True,
medical_terminology_level=physician.expertise_level
)
return explanation
Customer Service Agent with Escalation
class CustomerServiceAgent:
def __init__(self):
self.conversation_manager = ConversationManager()
self.sentiment_analyzer = SentimentAnalyzer()
self.escalation_detector = EscalationDetector()
async def handle_customer(self, customer_id):
conversation = self.conversation_manager.start(customer_id)
human_agent = None
while conversation.active:
customer_input = await conversation.get_customer_input()
# Analyze sentiment and complexity
sentiment = self.sentiment_analyzer.analyze(customer_input)
complexity = self.assess_query_complexity(customer_input)
# Check if human handoff needed
if self.should_escalate(sentiment, complexity, conversation.history):
human_agent = await self.request_human_agent()
await self.smooth_handoff(conversation, human_agent)
return
# Generate response
response = self.generate_response(customer_input, conversation.context)
# Human approval for certain actions
if response.involves_refund or response.policy_exception:
approval = await self.get_human_approval(response)
if not approval.granted:
response = self.generate_alternative_response()
await conversation.send_response(response)
    async def smooth_handoff(self, conversation, human_agent):
        """Seamless transition to human agent"""
# Summarize conversation
summary = self.summarize_conversation(conversation)
# Transfer context
human_agent.receive_context(
summary=summary,
customer_sentiment=conversation.current_sentiment,
attempted_solutions=conversation.attempted_solutions,
customer_priority=self.assess_customer_priority(conversation)
)
        # Notify customer
        transition_message = self.craft_transition_message(human_agent)
        await conversation.send_response(transition_message)
Common Pitfalls
1. Anthropomorphism
Mistake: Designing agents that seem too human
Problem: Users develop inappropriate trust or expectations
Solution: Clear communication about agent capabilities and limitations
2. Automation Bias
Mistake: Users accepting all agent recommendations
Problem: Critical thinking atrophies
Solution: Built-in friction and alternative presentations
3. Poor Handoff Design
Mistake: Abrupt transitions between human and agent
Problem: Context loss and user frustration
Solution: Smooth handoff protocols with context preservation
4. Inflexible Interaction Patterns
Mistake: One-size-fits-all interaction design
Problem: Poor user experience across diverse users
Solution: Adaptive interfaces based on user needs
5. Hidden Agent Actions
Mistake: Agent operates without transparency
Problem: Loss of trust and situational awareness
Solution: Clear activity indicators and logs
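For pitfall 5, here is a minimal sketch of an agent activity log that keeps actions visible to the user; every name is a placeholder, and a real system would persist entries and surface them in the interface rather than print them.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ActivityEntry:
    action: str
    rationale: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class ActivityLog:
    def __init__(self):
        self.entries: list[ActivityEntry] = []

    def record(self, action: str, rationale: str) -> ActivityEntry:
        entry = ActivityEntry(action, rationale)
        self.entries.append(entry)
        return entry

    def recent(self, n: int = 5) -> list[ActivityEntry]:
        # What the user sees: the last few things the agent actually did
        return self.entries[-n:]

log = ActivityLog()
log.record("drafted_reply", "user asked for a summary of the open thread")
for entry in log.recent():
    print(entry.timestamp, entry.action, "-", entry.rationale)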
Hands-on Exercise
Design a human-agent collaborative system for code review (a starter skeleton follows the checklist below):
1. Build the interface layer:
- Multi-modal input (code, comments, voice)
- Adaptive explanation depth
- Visual diff presentations
- Confidence indicators
2. Implement collaboration features:
- Human sets review priorities
- Agent identifies potential issues
- Joint discussion of complex problems
- Human makes final decisions
3. Add safety mechanisms:
- No automatic code changes
- Escalation for security issues
- Human approval for suggested fixes
- Audit trail of all decisions
4. Create trust calibration:
- Track agent accuracy over time
- Adjust automation levels
- Provide performance feedback
- Handle disagreements constructively
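A possible starting skeleton for the exercise, honoring the constraints above (the agent only suggests, the human decides, and every decision is auditable); the class names and the toy detection heuristic are placeholders, not a reference design.
from dataclasses import dataclass

@dataclass
class ReviewFinding:
    line: int
    message: str
    confidence: float          # always shown to the reviewer, never hidden
    suggested_fix: str | None  # applied only after explicit human approval

def agent_review(diff_lines: list[str]) -> list[ReviewFinding]:
    findings = []
    for i, line in enumerate(diff_lines, start=1):
        if "password" in line.lower():  # toy heuristic standing in for real checks
            findings.append(ReviewFinding(i, "Possible hardcoded secret", 0.6, None))
    return findings

def record_human_decision(finding: ReviewFinding, approved: bool) -> str:
    # Audit trail: every suggestion ends with an explicit human decision
    verdict = "accepted" if approved else "rejected"
    return f"line {finding.line}: {finding.message} -> {verdict} by human"

for finding in agent_review(['PASSWORD = "hunter2"', "return x + 1"]):
    print(record_human_decision(finding, approved=False))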
Further Reading
- "Human-AI Collaboration" - Stanford HAI Report 2023
- "Designing Human-Agent Teams" - MIT Press
- "Trust in Human-Robot Interaction" - ACM Computing Surveys
- "Explainable AI for Human-Agent Systems" - Nature Machine Intelligence
- "Safety in Human-AI Collaboration" - AI Safety Handbook
Connections
- Prerequisites: Agent Architectures, Human-Computer Interaction
- Related Topics: Explainable AI, AI Control Problem, Trust and Safety
- Next Steps: Multi-Agent Coordination, AI Governance
Human-Agent Interaction - Where the Rubber Meets the Road
After years of watching humans and agents try to work together (emphasis on "try"), I've seen every possible way this can go wrong. And right. But mostly wrong. Here's what I've learned from the trenches.
The Great Illusion
Everyone thinks human-agent interaction is about making agents more human-like. WRONG. It's about making the interaction more honest about what agents really are - very sophisticated tools that will occasionally do something spectacularly stupid.
My Journey Through the Uncanny Valley
Phase 1: "Let's Make It Like a Human!"
Built a super-friendly agent. Users started thanking it, apologizing to it, and one person invited it to their wedding. Not kidding.
Phase 2: "Let's Make It Obviously Robotic!"
Went full robot. Users hated it. "Why is it so cold?" "Can't it be nicer?" You can't win.
Phase 3: "Let's Be Honest About What It Is"
Finally found the sweet spot: competent tool with personality guardrails.
Real Patterns from Real Disasters
The "Autopilot Problem"
Users trust agents like they trust autopilot - which is to say, too much until something goes wrong.
class AutopilotPrevention:
def __init__(self):
self.complacency_counter = 0
    def present_recommendation(self, recommendation):
        self.complacency_counter += 1
        if self.complacency_counter > 10:
            # Time to wake them up
            return self.add_friction(recommendation)
        return recommendation  # Let it pass until complacency builds up
def add_friction(self, recommendation):
# Make them think
return {
'recommendation': recommendation,
'alternatives': self.generate_alternatives(),
'question': "Why do you think this is the best option?",
'forced_wait': 5 # seconds
}
The "Uncanny Valley of Helpfulness"
There's a sweet spot between "useless" and "creepy helpful":
class HelpfulnessCalibrator:
def calibrate_response(self, user_need, agent_capability):
if user_need == "quick_answer" and agent_capability == "write_dissertation":
return "brief_helpful_response" # Don't overdo it
if user_need == "detailed_analysis" and agent_capability == "basic_summary":
return "honest_limitation_admission" # Don't pretend
if user_need == "emotional_support" and agent_capability == "text_generation":
return "redirect_to_human" # Don't even try
My Actual Interface Design
Here's what actually works in production:
class BattleTestedInterface:
def __init__(self):
self.trust_level = "skeptical" # Start here
self.user_patience = "limited" # Always assume this
def interact(self, user_input):
# Set expectations immediately
response = {
'confidence': self.honest_confidence_level(),
'limitations': self.current_limitations(),
'result': self.process_input(user_input),
'alternatives': self.always_provide_alternatives(),
'human_override': "Always available"
}
# Add safety nets
if self.detecting_frustration(user_input):
response['escalation_option'] = "Talk to human now?"
if self.detecting_over_trust():
response['warning'] = "Remember to verify important decisions"
return response
The Patterns Nobody Talks About
The "First Day Effect"
Users are either terrified or think it's magic. Both are wrong.
The "Week Two Cliff"
Initial excitement wears off. Reality sets in. Usage drops 80%.
The "Month One Plateau"
The users who remain find their rhythm. These are your real users.
The "Quarter Crisis"
"Is this actually helping or just making things complicated?"
Real Stories from the Front Lines
The Over-Helpful Assistant
Built an email assistant that was TOO good at predicting responses. User sent "Sounds good!" to a marriage proposal meant for someone else because they were on autopilot. Now I add friction to important emails:
def check_email_importance(email):
danger_words = ['marriage', 'divorce', 'fired', 'pregnant', 'died', 'love']
if any(word in email.lower() for word in danger_words):
return "STOP. Read this carefully. Are you sure?"
The Rebellion of the Middle Managers
Deployed an agent that made middle management look inefficient (because it was). Adoption rate: 0%. Lesson learned: Consider organizational dynamics.
The Trust Flip-Flop
User: "This AI is stupid, I don't trust it" One week later Same user: "The AI said to delete everything, so I did"
Trust is binary and flips without warning.
My Rules for Human-Agent Interaction
Rule 1: Humans Lie About What They Want
They say they want full control. They actually want magic that requires no effort.
Rule 2: Make the Agent Slightly Dumber Than It Is
If your agent is 95% accurate, present it as 85% accurate. Under-promise, over-deliver.
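A minimal sketch of that rule, assuming you simply discount and cap the raw confidence before displaying it; the function name and the specific numbers are illustrative.
def displayed_confidence(raw_confidence: float,
                         discount: float = 0.10, cap: float = 0.85) -> float:
    # Present a deliberately conservative number, never the raw score
    return round(max(0.0, min(raw_confidence - discount, cap)), 2)

print(displayed_confidence(0.95))  # -> 0.85: under-promise, over-deliver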
Rule 3: Always Have an Escape Hatch
class EscapeHatch:
def add_to_every_interaction(self, response):
response.add_button("Talk to Human", style="big_red")
response.add_button("Disable AI Assistant", style="always_visible")
response.add_text("Made a mistake? Let us know", style="humble")
Rule 4: Show Your Work (But Not Too Much)
Users want to know why, but not a PhD thesis on transformer architectures.
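One way to sketch this is to keep explanations at a few fixed depths and default to the shortest; the depth names and the sample decision below are made up for illustration.
EXPLANATION_DEPTHS = ("one_liner", "key_factors", "full_trace")

def explain(decision: dict, depth: str = "one_liner") -> str:
    if depth == "one_liner":
        return decision["summary"]
    if depth == "key_factors":
        return decision["summary"] + " Because: " + "; ".join(decision["factors"])
    return decision["trace"]  # full reasoning, only when explicitly requested

decision = {
    "summary": "Flagged this invoice as a likely duplicate.",
    "factors": ["same amount", "same vendor", "issued two days apart"],
    "trace": "(full step-by-step reasoning would go here)",
}
print(explain(decision, "key_factors"))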
The Interface Elements That Actually Matter
The Confidence Meter
Not a percentage. Humans don't understand probabilities.
def show_confidence(score):
    if score > 0.9:
        return "🟢 Pretty confident"
    elif score > 0.7:
        return "🟡 Somewhat sure"
    else:
        return "🔴 Just guessing"
The "Agent Is Thinking" Indicator
Fake latency is sometimes good:
async def thoughtful_response(input):
response = instant_ai_response(input) # Actually instant
# Make them value it
await show_thinking_animation(seconds=2)
return response
The Disagreement Handler
def handle_user_disagreement(user_assertion, agent_belief):
# NEVER argue
return f"""
I understand you believe {user_assertion}.
My data suggested {agent_belief}, but I could be wrong.
Would you like me to:
1. Proceed with your approach
2. Show you why I thought differently
3. Get a human expert's opinion
"""
Interaction Anti-Patterns I've Seen
The "HAL 9000"
"I'm afraid I can't do that, Dave" - Don't be creepy
The "Overeager Intern"
"I've already done 47 things you didn't ask for!" - Don't be annoying
The "Philosopher"
"To truly understand your request, we must first define 'understanding'..." - Don't be pretentious
The "Yes Man"
Agrees with everything, even contradictions - Don't be useless
My Current Setup
Every agent I deploy has these interaction modes (a minimal mode-gate sketch follows the list):
- Copilot Mode: Suggestions only, human drives
- Autopilot Mode: Agent drives, human monitors
- Collaborative Mode: True partnership
- Observer Mode: Agent watches and learns
- Off Mode: Because sometimes you need it off
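A minimal sketch of those five modes as a gate on autonomous action, assuming the simple rule that autopilot may only take reversible actions on its own; the gate logic is my assumption, not a prescription.
from enum import Enum

class Mode(Enum):
    COPILOT = "copilot"              # suggestions only, human drives
    AUTOPILOT = "autopilot"          # agent drives, human monitors
    COLLABORATIVE = "collaborative"  # true partnership
    OBSERVER = "observer"            # agent watches and learns, never acts
    OFF = "off"                      # because sometimes you need it off

def agent_may_act_alone(mode: Mode, reversible: bool) -> bool:
    if mode in (Mode.OFF, Mode.OBSERVER, Mode.COPILOT):
        return False               # the human does the acting
    if mode is Mode.AUTOPILOT:
        return reversible          # irreversible actions still go to a human
    return False                   # collaborative: act together, not alone

print(agent_may_act_alone(Mode.AUTOPILOT, reversible=True))   # True
print(agent_may_act_alone(Mode.AUTOPILOT, reversible=False))  # False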
The Trust Calibration Dance
class TrustCalibration:
def __init__(self):
self.trust_score = 0.5 # Start neutral
def update_trust(self, interaction_result):
if interaction_result.success and interaction_result.user_satisfied:
self.trust_score += 0.01 # Slow build
elif interaction_result.failure:
self.trust_score -= 0.1 # Fast drop
        # Trust boundaries: never fully trust, never fully distrust
        self.trust_score = max(0.1, min(0.9, self.trust_score))
        # Never full auto: at the ceiling, still spot-check
        if self.trust_score >= 0.9:
            self.inject_random_verification_request()
The Reality of Human-Agent Teams
The best human-agent teams I've seen work like a good marriage:
- Clear division of labor
- Mutual respect (even if one party is artificial)
- Communication about problems
- Regular check-ins
- Ability to work independently
- Coming together for big decisions
Final Thoughts
Perfect human-agent interaction is like a perfect relationship - it doesn't exist. The goal is "good enough that both parties don't want to throw the other out the window."
Remember: Your users are humans (usually). They're tired, distracted, and just want to get their work done. Design for the human they are, not the rational actor you wish they were.
The best interface is the one that makes users feel competent, not replaced. The best agent is the one that makes work easier, not just different.
And always, ALWAYS, have a big red button that turns the whole thing off. You'll need it less than you think, but when you need it, you'll REALLY need it.
Keep it human (the interaction, not the agent). 🤝🤖
AI Tutoring and Educational Interactions
A critical subset of human-agent interaction involves educational AI systems. These interactions present unique challenges due to the trust relationship, power dynamics, and formative influence on learners.
Key Considerations for Educational AI
- Trust asymmetry: Students inherently trust educational authorities
- Cognitive vulnerability: Learners are in a receptive state
- Long-term influence: Educational interactions shape thinking patterns
- Scale of impact: Educational AI can influence millions simultaneously
For deeper exploration of these challenges, see:
- [[ai-tutors-educational-safety|AI Tutors and Educational AI Safety]]
- [[ai-tutor-manipulation-vectors|AI Tutor Manipulation and Influence Vectors]]
- [[safe-educational-ai-design|Research Frontiers in Safe Educational AI Design]]