Design an AI customer support agent
Walk through designing an AI customer support system that resolves 70% of tickets autonomously, escalates complex issues to humans, and learns from agent resolutions to improve over time.
TL;DR
- A two-stage intent classifier (fast embedding model for 80% of traffic, LLM fallback for ambiguous cases) routes customer messages to resolution workflows in under 50ms average, compared to 200ms+ for a single LLM call on every message.
- Multi-signal escalation scoring (intent confidence, retrieval quality, customer sentiment, policy rules) replaces naive "I don't know" detection and reduces unnecessary escalations from 45% to 15-20%.
- RAG over help articles, past ticket resolutions, and live account data gives the response generator grounded context, keeping hallucination rates below 2% on factual queries.
- At scale (100K tickets/day), AI auto-resolution at $0.05-0.15 per ticket versus $15-25 per human-handled ticket saves over $1M per day at 70% deflection (roughly 70,000 deflected tickets at $15 or more each).
- The production lesson: an AI agent "resolving" a ticket means little unless it can actually execute actions (process refunds, reset passwords, update orders). Response generation without tool calling is a fancy FAQ page.
Requirements
Functional requirements
- Customers can submit support messages via chat (web, mobile) and receive an AI-generated response within 5 seconds.
- The system classifies each customer message into an intent category (order status, refund request, password reset, billing question, general FAQ, complaint) and routes to the appropriate resolution workflow.
- The AI agent retrieves relevant context (help articles, past ticket resolutions, customer account data) to generate grounded, accurate responses.
- The system executes actions on behalf of the customer (process refunds, reset passwords, check order status) when the resolved intent maps to an executable workflow.
- When the AI agent's confidence is below threshold or the issue type requires human judgment (billing disputes, account recovery, emotionally distressed customers), the system escalates to a human agent with full conversation context.
- Human agent resolutions are logged and fed back into the system to improve classifier accuracy and response quality over time.
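The classification requirement above can be sketched as a two-stage router: a cheap similarity check answers when it is confident, and anything below a threshold falls through to a slower fallback. This is a minimal illustration, not the production classifier. The bag-of-words `embed` function, the `EXAMPLES` table, the `0.6` threshold, and the `llm_fallback` callable are all assumptions standing in for a real embedding model and a real LLM call.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical labeled phrases standing in for a trained embedding classifier.
EXAMPLES = {
    "order_status": ["where is my order", "order still says processing", "track my package"],
    "refund": ["i want a refund", "refund my money", "money back for my order"],
    "password_reset": ["reset my password", "forgot my password", "cannot log in"],
}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fast_classify(message: str) -> tuple[str, float]:
    """Stage 1: score each intent by its best-matching example phrase."""
    vec = embed(message)
    scores = {
        intent: max(cosine(vec, embed(example)) for example in examples)
        for intent, examples in EXAMPLES.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def classify(
    message: str,
    llm_fallback: Callable[[str], tuple[str, float]],
    threshold: float = 0.6,  # assumed cutoff; tune on labeled traffic
) -> dict:
    """Use the fast path when confident, otherwise defer to the LLM (stage 2)."""
    intent, confidence = fast_classify(message)
    if confidence >= threshold:
        return {"intent": intent, "confidence": confidence, "classifier_used": "embedding"}
    intent, confidence = llm_fallback(message)
    return {"intent": intent, "confidence": confidence, "classifier_used": "llm"}
```

The `classifier_used` field mirrors the IntentClassification entity, so you can measure what share of traffic the fast path actually absorbs.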
Non-functional requirements
- First response latency: P95 under 5 seconds for AI-resolved tickets (versus 4-8 hour average for human queue).
- Throughput: 100,000 tickets per day (approximately 1,200 messages per minute at peak).
- Auto-resolution rate (deflection): 70%+ of tickets resolved without human involvement.
- Response accuracy: less than 2% hallucination rate on factual queries (order status, pricing, policy information).
- Cost per AI-resolved ticket: $0.05-0.15, compared to $15-25 per human-handled ticket.
- Customer satisfaction (CSAT): AI-resolved tickets score 4.0+ out of 5.0 (human baseline is 4.5/5).
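A quick back-of-envelope check of the cost numbers above, as a sketch. It uses only the figures from the requirements (100,000 tickets per day, 70% deflection, and the two cost ranges). The function name `daily_savings` is made up for illustration, and the model ignores fixed infrastructure and staffing costs.

```python
TICKETS_PER_DAY = 100_000
DEFLECTION = 0.70
AI_COST = (0.05, 0.15)      # USD per AI-resolved ticket (low, high)
HUMAN_COST = (15.0, 25.0)   # USD per human-handled ticket (low, high)


def daily_savings(tickets: int, deflection: float, ai_cost: float, human_cost: float) -> float:
    """Savings per day from deflected tickets: each one costs ai_cost instead of human_cost."""
    return tickets * deflection * (human_cost - ai_cost)


# Conservative case: cheapest human ticket, most expensive AI ticket.
low = daily_savings(TICKETS_PER_DAY, DEFLECTION, AI_COST[1], HUMAN_COST[0])
# Optimistic case: most expensive human ticket, cheapest AI ticket.
high = daily_savings(TICKETS_PER_DAY, DEFLECTION, AI_COST[0], HUMAN_COST[1])

print(f"Daily savings: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the range works out to roughly $1.04M to $1.75M per day, which is why per-ticket AI cost barely matters next to the deflection rate.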
The hardest engineering problem here: knowing when the AI should stop trying. An overconfident agent that gives wrong answers destroys customer trust faster than a slow human queue. An underconfident agent that escalates everything defeats the purpose of automation. The escalation threshold is the make-or-break calibration point, and it depends on signals from multiple systems (classifier, retriever, sentiment analyzer) that each have their own failure modes.
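One way to combine those signals is a weighted risk score with a hard policy override. The sketch below assumes the signal ranges implied by the API examples (confidence and retrieval quality in 0..1, sentiment in -1..1). The weights and the `0.5` threshold are placeholders I picked for illustration, not calibrated values.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    intent_confidence: float   # 0..1, from the classifier
    retrieval_quality: float   # 0..1, from the retriever
    sentiment_score: float     # -1..1, negative means unhappy customer
    policy_override: bool      # True for issue types that always need a human


# Hypothetical weights and threshold; in practice these are tuned on labeled escalations.
WEIGHTS = {"intent": 0.4, "retrieval": 0.4, "sentiment": 0.2}
THRESHOLD = 0.5


def escalation_score(s: Signals) -> float:
    """Risk in [0, 1]; higher means more reason to hand off to a human."""
    sentiment_risk = max(0.0, -s.sentiment_score)  # only negative sentiment adds risk
    return (
        WEIGHTS["intent"] * (1 - s.intent_confidence)
        + WEIGHTS["retrieval"] * (1 - s.retrieval_quality)
        + WEIGHTS["sentiment"] * sentiment_risk
    )


def should_escalate(s: Signals) -> tuple[bool, str]:
    """Policy rules win outright; otherwise compare the score to the threshold."""
    if s.policy_override:
        return True, "policy_override"
    score = escalation_score(s)
    return score >= THRESHOLD, f"score={score:.2f}"
```

Keeping the policy check separate from the numeric score means a billing dispute escalates even when every model signal looks confident, which is the failure mode a single blended score tends to hide.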
The core entities
SupportTicket
ticket_id, customer_id, channel (web_chat, mobile, email), status (open, ai_resolved, escalated, human_resolved), created_at, resolved_at, resolution_type, csat_score
CustomerMessage
message_id, ticket_id, content, sender (customer, ai_agent, human_agent), timestamp, intent_classification, confidence_score
IntentClassification
classification_id, message_id, intent (order_status, refund, password_reset, billing, faq, complaint, unknown), confidence, classifier_used (embedding, llm), sub_intent, requires_action (boolean)
RetrievalContext
retrieval_id, message_id, sources (array of doc IDs), relevance_scores, retrieval_latency_ms, context_token_count
EscalationDecision
decision_id, ticket_id, escalated (boolean), score, signals (intent_confidence, retrieval_quality, sentiment, policy_override), reason, decided_at
ActionExecution
action_id, ticket_id, action_type (refund, password_reset, order_lookup, account_update), parameters, status (pending, executed, failed), executed_at
ResolutionFeedback
feedback_id, ticket_id, original_ai_response, human_correction, correction_type (wrong_intent, wrong_answer, missing_context, tone_issue), used_for_training (boolean)
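As an illustration of how one of these entities might look in code, here is a sketch of SupportTicket as a Python dataclass. The field names and enum values come from the entity list. The types, defaults, and the `resolve` helper are my assumptions, and the other entities would follow the same pattern.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Channel(str, Enum):
    WEB_CHAT = "web_chat"
    MOBILE = "mobile"
    EMAIL = "email"


class TicketStatus(str, Enum):
    OPEN = "open"
    AI_RESOLVED = "ai_resolved"
    ESCALATED = "escalated"
    HUMAN_RESOLVED = "human_resolved"


@dataclass
class SupportTicket:
    ticket_id: str
    customer_id: str
    channel: Channel
    status: TicketStatus = TicketStatus.OPEN
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    resolved_at: Optional[datetime] = None
    resolution_type: Optional[str] = None
    csat_score: Optional[float] = None  # filled in after the customer survey

    def resolve(self, status: TicketStatus, resolution_type: str) -> None:
        """Close the ticket; only the two resolved states are valid here."""
        if status not in (TicketStatus.AI_RESOLVED, TicketStatus.HUMAN_RESOLVED):
            raise ValueError(f"{status.value} is not a resolved state")
        self.status = status
        self.resolution_type = resolution_type
        self.resolved_at = datetime.now(timezone.utc)
```

Modeling status as an enum rather than a free-form string keeps the deflection metric honest: a ticket counts as AI-resolved only if it passed through `resolve` with that exact state.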
API design
POST /v1/support/message - submit a customer message and receive an AI response
Request: {
  "ticket_id": "tkt_abc123",
  "customer_id": "cust_456",
  "message": "I ordered a laptop 3 days ago and it still says processing. Can I get a refund?",
  "channel": "web_chat",
  "metadata": {
    "page_url": "/orders/ORD-789",
    "previous_messages": 2
  }
}
Response: {
  "message_id": "msg_def789",
  "response": "I can see your order ORD-789 is currently in processing. It typically ships within 3-5 business days. Would you like me to process a refund, or would you prefer to wait for shipping?",
  "intent": "multi_intent:order_status+refund",
  "confidence": 0.87,
  "actions_available": ["process_refund", "check_shipping_eta"],
  "escalated": false,
  "sources": ["help-article-102", "order-ORD-789"],
  "response_latency_ms": 1840
}
The response includes available actions so the frontend can render action buttons. Multi-intent detection (order status + refund) avoids forcing customers to ask one thing at a time.
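A client consuming this response needs to split the combined intent string back into individual intents. The helper below is a small sketch. The `multi_intent:` prefix and `+` separator are inferred from the single example above, so treat the format as an assumption rather than a documented contract.

```python
def parse_intent(intent: str) -> list[str]:
    """Split 'multi_intent:a+b' into ['a', 'b']; a plain intent becomes a one-item list."""
    prefix = "multi_intent:"
    if intent.startswith(prefix):
        return [part for part in intent[len(prefix):].split("+") if part]
    return [intent]


print(parse_intent("multi_intent:order_status+refund"))
print(parse_intent("password_reset"))
```

If you control the API, returning intents as a JSON array would avoid string parsing entirely. The helper only matters when the encoded form is already in use.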
POST /v1/support/action - execute a customer action (refund, password reset, etc.)
Request: {
  "ticket_id": "tkt_abc123",
  "action_type": "process_refund",
  "parameters": {
    "order_id": "ORD-789",
    "reason": "customer_request",
    "amount": "full"
  },
  "confirmed_by_customer": true
}
Response: {
  "action_id": "act_ghi012",
  "status": "executed",
  "result": "Refund of $1,299.00 initiated for order ORD-789. Expect 5-7 business days for processing.",
  "executed_at": "2026-04-11T14:32:00Z"
}
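Because this endpoint moves money, the handler should refuse anything the customer has not confirmed, and it should not run the same action twice if the client retries. The sketch below shows both guards with an in-memory store. The deduplication key and the `ActionExecutor` class are assumptions for illustration; a real service would persist this state and call the payment system where the comment indicates.

```python
class ActionExecutor:
    """In-memory sketch of a guarded action handler (not production storage)."""

    def __init__(self) -> None:
        # Maps (ticket_id, action_type, order_id) to the result already returned.
        self._completed: dict[tuple, dict] = {}
        self._counter = 0

    def execute(self, request: dict) -> dict:
        # Guard 1: never act without explicit customer confirmation.
        if not request.get("confirmed_by_customer"):
            return {"status": "failed", "result": "Customer confirmation required."}

        # Guard 2: a retried request returns the original result instead of re-executing.
        key = (
            request["ticket_id"],
            request["action_type"],
            request.get("parameters", {}).get("order_id"),
        )
        if key in self._completed:
            return self._completed[key]

        # A real implementation would call the refund or account service here.
        self._counter += 1
        result = {"action_id": f"act_{self._counter:06d}", "status": "executed"}
        self._completed[key] = result
        return result
```

For example, calling `execute` twice with the refund request above yields the same `action_id` both times, so a flaky network cannot trigger a double refund.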
POST /v1/support/escalate - escalate a ticket to a human agent, either manually or via a system trigger
Request: {
  "ticket_id": "tkt_abc123",
  "reason": "customer_requested_human",
  "conversation_summary": "Customer asked about order ORD-789 refund. AI offered refund but customer has additional billing concerns.",
  "priority": "medium",
  "escalation_signals": {
    "intent_confidence": 0.62,
    "sentiment_score": -0.4,
    "retrieval_quality": 0.55,
    "policy_override": false
  }
}
Response: {
  "escalation_id": "esc_jkl345",
  "assigned_agent": "agent_sarah",
  "estimated_wait_minutes": 12,
  "queue_position": 4,
  "handoff_context_included": true
}
GET /v1/support/analytics - deflection metrics and system performance
Response: {
  "period": "2026-04-11",
  "total_tickets": 98420,
  "ai_resolved": 69094,
  "escalated": 19684,
  "pending": 9642,
  "deflection_rate": 0.702,
  "avg_ai_response_ms": 1840,
  "avg_csat_ai": 4.21,
  "avg_csat_human": 4.48,
  "cost_savings_usd": 1036410,
  "top_intents": [
    { "intent": "order_status", "count": 31200, "auto_resolved_pct": 0.92 },
    { "intent": "refund", "count": 18600, "auto_resolved_pct": 0.71 },
    { "intent": "password_reset", "count": 14100, "auto_resolved_pct": 0.97 }
  ]
}
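The derived fields in this payload can be recomputed from the raw counts, which makes a useful sanity check in tests or dashboards. In the sketch below, the `check_analytics` name is made up, and the $15 default is an assumption: it is the low end of the human cost range, and it happens to reproduce the example's `cost_savings_usd` exactly, which suggests the example reports gross savings without subtracting AI cost.

```python
def check_analytics(a: dict, human_cost_per_ticket: float = 15.0) -> dict:
    """Recompute derived metrics from raw ticket counts."""
    accounted = a["ai_resolved"] + a["escalated"] + a["pending"]
    return {
        # Every ticket should be in exactly one of the three buckets.
        "counts_add_up": accounted == a["total_tickets"],
        "deflection_rate": round(a["ai_resolved"] / a["total_tickets"], 3),
        # Gross savings: what the deflected tickets would have cost with human agents.
        "gross_savings_usd": a["ai_resolved"] * human_cost_per_ticket,
    }


example = {
    "total_tickets": 98420,
    "ai_resolved": 69094,
    "escalated": 19684,
    "pending": 9642,
}
print(check_analytics(example))
```

Note that the deflection rate here divides by all tickets, including pending ones. Dividing by closed tickets only would give a higher number, so it is worth stating which definition a dashboard uses.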
High-level design
The system splits into two pipelines: the real-time resolution pipeline (handles incoming messages) and the offline learning pipeline (improves the system from human corrections).
The real-time pipeline receives a customer message, classifies intent, retrieves context from multiple sources, generates a response, scores confidence, and either sends the response or escalates. The entire flow completes in under 5 seconds end-to-end. I have seen teams try to build this as a single monolithic LLM call ("just send everything to GPT-4o"), and it works for demos but breaks at 10K tickets/day due to cost and latency.
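The stages of the real-time pipeline can be sketched as a single function with each stage injected as a callable, so you can swap the classifier, retriever, or generator independently. The stage signatures here are assumptions for illustration. The output keys follow the /v1/support/message response.

```python
import time
from typing import Callable


def handle_message(
    message: str,
    classify: Callable[[str], tuple[str, float]],
    retrieve: Callable[[str, str], list[dict]],
    generate: Callable[[str, str, list[dict]], str],
    should_escalate: Callable[[float, list[dict], str], bool],
) -> dict:
    """Run classify -> retrieve -> generate -> score, then send or escalate."""
    start = time.monotonic()

    intent, confidence = classify(message)
    sources = retrieve(message, intent)
    draft = generate(message, intent, sources)
    escalated = should_escalate(confidence, sources, draft)

    return {
        # The draft is withheld from the customer when the ticket goes to a human.
        "response": None if escalated else draft,
        "intent": intent,
        "confidence": confidence,
        "escalated": escalated,
        "sources": [s["id"] for s in sources],
        "response_latency_ms": int((time.monotonic() - start) * 1000),
    }
```

Each stage being a plain callable also makes the pipeline easy to test with stubs, and it lets you time stages individually when hunting for where the 5-second budget goes.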
The offline pipeline collects human agent corrections, retrains the intent classifier weekly, updates the RAG knowledge base with new resolutions, and A/B tests updated models before full rollout. This feedback flywheel is what separates a static FAQ bot from a system that actually improves.
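For the A/B testing step, one common approach is deterministic bucketing: hash the ticket ID together with an experiment name so the same ticket always sees the same model variant. The sketch below assumes a 10% treatment share and a hypothetical experiment name. Neither comes from the design above.

```python
import hashlib


def ab_bucket(ticket_id: str, experiment: str, treatment_pct: int = 10) -> str:
    """Assign a ticket to 'treatment' or 'control', stable across calls and servers."""
    digest = hashlib.sha256(f"{experiment}:{ticket_id}".encode("utf-8")).hexdigest()
    # Map the hash to a number in 0..99 and compare against the treatment share.
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"


# Hypothetical experiment name for a retrained classifier.
variant = ab_bucket("tkt_abc123", "intent-classifier-v2")
print(variant)
```

Including the experiment name in the hash means a ticket that landed in treatment for one experiment is not automatically in treatment for the next, which keeps experiments from contaminating each other.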
The architecture separates read-heavy operations (intent classification, retrieval) from write-heavy operations (action execution, ticket logging) so they scale independently. The vector database handles semantic search over help articles and past resolutions, while PostgreSQL stores structured ticket data and customer records.
For your interview: draw this diagram in two passes. First, the happy path (message to AI response). Second, add the escalation branch and the learning loop. Interviewers love seeing you build incrementally.