Design an AI customer support agent

TL;DR

A two-stage intent classifier (fast embedding model for 80% of traffic, LLM fallback for ambiguous cases) routes customer messages to resolution workflows in under 50ms average, compared to 200ms+ for a single LLM call on every message.
Multi-signal escalation scoring (intent confidence, retrieval quality, customer sentiment, policy rules) replaces naive "I don't know" detection and reduces unnecessary escalations from 45% to 15-20%.
RAG over help articles, past ticket resolutions, and live account data gives the response generator grounded context, keeping hallucination rates below 2% on factual queries.
At scale (100K tickets/day), AI auto-resolution at $0.05-0.15 per ticket versus $15-25 per human-handled ticket saves over $1M per day at 70% deflection.
The production lesson: an AI agent that "resolves" a ticket is meaningless unless it can actually execute actions (process refunds, reset passwords, update orders). Response generation without tool calling is a fancy FAQ page.

Requirements

Functional requirements

Customers can submit support messages via chat (web, mobile) and receive an AI-generated response within 5 seconds.
The system classifies each customer message into an intent category (order status, refund request, password reset, billing question, general FAQ, complaint) and routes to the appropriate resolution workflow.
The AI agent retrieves relevant context (help articles, past ticket resolutions, customer account data) to generate grounded, accurate responses.
The system executes actions on behalf of the customer (process refunds, reset passwords, check order status) when the resolved intent maps to an executable workflow.
When the AI agent's confidence is below threshold or the issue type requires human judgment (billing disputes, account recovery, emotionally distressed customers), the system escalates to a human agent with full conversation context.
Human agent resolutions are logged and fed back into the system to improve classifier accuracy and response quality over time.
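
The classification requirement pairs naturally with the two-stage design from the TL;DR. Here is a minimal sketch of that routing, assuming each classifier is a function that returns an (intent, confidence) pair. The 0.8 threshold and the toy keyword classifiers are placeholders I made up for illustration; a real system would plug in an embedding model and an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A classifier maps message text to (intent, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class Classification:
    intent: str
    confidence: float
    classifier_used: str  # "embedding" or "llm"


def classify_two_stage(
    message: str,
    fast: Classifier,
    fallback: Classifier,
    threshold: float = 0.8,  # assumed cutoff, tune on labeled traffic
) -> Classification:
    """Try the fast classifier first; use the fallback only when it is unsure."""
    intent, confidence = fast(message)
    if confidence >= threshold:
        return Classification(intent, confidence, "embedding")
    intent, confidence = fallback(message)
    return Classification(intent, confidence, "llm")


# Toy stand-ins so the sketch runs without any model.
def toy_fast(message: str) -> Tuple[str, float]:
    text = message.lower()
    if "password" in text:
        return ("password_reset", 0.95)
    if "refund" in text:
        return ("refund", 0.60)  # deliberately below the threshold
    return ("unknown", 0.20)


def toy_fallback(message: str) -> Tuple[str, float]:
    if "refund" in message.lower():
        return ("refund", 0.90)
    return ("faq", 0.70)
```

With these stand-ins, "I forgot my password" stays on the fast path, while "Can I get a refund?" falls through to the fallback. The `classifier_used` field mirrors the one on the IntentClassification entity, so you can measure what share of traffic each stage actually handles.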

Non-functional requirements

First response latency: P95 under 5 seconds for AI-resolved tickets (versus 4-8 hour average for human queue).
Throughput: 100,000 tickets per day (roughly 70 tickets per minute on average, with approximately 1,200 messages per minute at peak).
Auto-resolution rate (deflection): 70%+ of tickets resolved without human involvement.
Response accuracy: less than 2% hallucination rate on factual queries (order status, pricing, policy information).
Cost per AI-resolved ticket: $0.05-0.15, compared to $15-25 per human-handled ticket.
Customer satisfaction (CSAT): AI-resolved tickets score 4.0+ out of 5.0 (human baseline is 4.5/5).
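
The cost and deflection targets above imply a daily savings range. This back-of-the-envelope sketch works it out, assuming savings are simply deflected tickets times the gap between human and AI cost per ticket. The function name is mine, and it ignores fixed costs such as infrastructure and model retraining.

```python
def daily_savings(
    tickets_per_day: int,
    deflection_rate: float,
    human_cost: float,
    ai_cost: float,
) -> float:
    """Savings per day = deflected tickets * (human cost - AI cost) per ticket."""
    deflected = tickets_per_day * deflection_rate
    return deflected * (human_cost - ai_cost)


# Pessimistic end: cheapest human ticket, most expensive AI ticket.
low = daily_savings(100_000, 0.70, human_cost=15.0, ai_cost=0.15)
# Optimistic end: most expensive human ticket, cheapest AI ticket.
high = daily_savings(100_000, 0.70, human_cost=25.0, ai_cost=0.05)

print(f"${low:,.0f} to ${high:,.0f} per day")
```

That works out to roughly $1.04M to $1.75M per day at 100K tickets and 70% deflection.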

The hardest engineering problem here: knowing when the AI should stop trying. An overconfident agent that gives wrong answers destroys customer trust faster than a slow human queue. An underconfident agent that escalates everything defeats the purpose of automation. The escalation threshold is the make-or-break calibration point, and it depends on signals from multiple systems (classifier, retriever, sentiment analyzer) that each have their own failure modes.
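
To make the calibration problem concrete, here is a minimal sketch of multi-signal escalation scoring. The weights, the 0.5 threshold, and the rule that only negative sentiment adds risk are all assumptions for illustration; in practice you would fit them against labeled escalation outcomes.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    intent_confidence: float  # 0..1, higher is better
    retrieval_quality: float  # 0..1, higher is better
    sentiment_score: float    # -1..1, negative means an upset customer
    policy_override: bool     # True when a policy rule forces a human


# Hypothetical weights (sum to 1.0) and threshold.
WEIGHTS = {"intent": 0.40, "retrieval": 0.35, "sentiment": 0.25}
THRESHOLD = 0.5


def escalation_score(s: Signals) -> float:
    """Return a risk score in [0, 1]; higher means hand off to a human."""
    if s.policy_override:
        return 1.0  # policy rules bypass the weighted score entirely
    intent_risk = 1.0 - s.intent_confidence
    retrieval_risk = 1.0 - s.retrieval_quality
    sentiment_risk = max(0.0, -s.sentiment_score)  # positive mood adds no risk
    return (
        WEIGHTS["intent"] * intent_risk
        + WEIGHTS["retrieval"] * retrieval_risk
        + WEIGHTS["sentiment"] * sentiment_risk
    )


def should_escalate(s: Signals) -> bool:
    return escalation_score(s) >= THRESHOLD
```

A confident, well-grounded answer such as `Signals(0.95, 0.9, 0.2, False)` scores low and is sent. A weak one such as `Signals(0.4, 0.3, -0.8, False)` crosses the threshold and is escalated. Keeping the policy override as a hard bypass means a billing dispute goes to a human no matter how confident the classifier is.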

The core entities

SupportTicket

ticket_id, customer_id, channel (web_chat, mobile, email), status (open, ai_resolved, escalated, human_resolved), created_at, resolved_at, resolution_type, csat_score

CustomerMessage

message_id, ticket_id, content, sender (customer, ai_agent, human_agent), timestamp, intent_classification, confidence_score

IntentClassification

classification_id, message_id, intent (order_status, refund, password_reset, billing, faq, complaint, unknown), confidence, classifier_used (embedding, llm), sub_intent, requires_action (boolean)

RetrievalContext

retrieval_id, message_id, sources (array of doc IDs), relevance_scores, retrieval_latency_ms, context_token_count

EscalationDecision

decision_id, ticket_id, escalated (boolean), score, signals (intent_confidence, retrieval_quality, sentiment, policy_override), reason, decided_at

ActionExecution

action_id, ticket_id, action_type (refund, password_reset, order_lookup, account_update), parameters, status (pending, executed, failed), executed_at

ResolutionFeedback

feedback_id, ticket_id, original_ai_response, human_correction, correction_type (wrong_intent, wrong_answer, missing_context, tone_issue), used_for_training (boolean)
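
As a sketch of how one of these entities might look in code, here is SupportTicket as a Python dataclass. The field names and enum values come from the entity list above. The field types, the defaults, and the `resolve` helper are my assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Channel(Enum):
    WEB_CHAT = "web_chat"
    MOBILE = "mobile"
    EMAIL = "email"


class TicketStatus(Enum):
    OPEN = "open"
    AI_RESOLVED = "ai_resolved"
    ESCALATED = "escalated"
    HUMAN_RESOLVED = "human_resolved"


@dataclass
class SupportTicket:
    ticket_id: str
    customer_id: str
    channel: Channel
    status: TicketStatus = TicketStatus.OPEN
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    resolved_at: Optional[datetime] = None
    resolution_type: Optional[str] = None
    csat_score: Optional[float] = None  # out of 5, per the CSAT requirement

    def resolve(self, status: TicketStatus, resolution_type: str) -> None:
        """Mark the ticket resolved by the AI or by a human agent."""
        if status not in (TicketStatus.AI_RESOLVED, TicketStatus.HUMAN_RESOLVED):
            raise ValueError("resolve() expects a resolved status")
        self.status = status
        self.resolution_type = resolution_type
        self.resolved_at = datetime.now(timezone.utc)
```

Modeling `status` and `channel` as enums rather than free strings keeps invalid values out of the analytics queries later.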

API design

POST /v1/support/message - submit a customer message and receive an AI response

Request: {
  "ticket_id": "tkt_abc123",
  "customer_id": "cust_456",
  "message": "I ordered a laptop 3 days ago and it still says processing. Can I get a refund?",
  "channel": "web_chat",
  "metadata": {
    "page_url": "/orders/ORD-789",
    "previous_messages": 2
  }
}
Response: {
  "message_id": "msg_def789",
  "response": "I can see your order ORD-789 is currently in processing. It typically ships within 3-5 business days. Would you like me to process a refund, or would you prefer to wait for shipping?",
  "intent": "multi_intent:order_status+refund",
  "confidence": 0.87,
  "actions_available": ["process_refund", "check_shipping_eta"],
  "escalated": false,
  "sources": ["help-article-102", "order-ORD-789"],
  "response_latency_ms": 1840
}

The response includes available actions so the frontend can render action buttons. Multi-intent detection (order status + refund) avoids forcing customers to ask one thing at a time.
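
A client has to unpack that `intent` field before it can route on it. The sketch below assumes the encoding is exactly what the example response shows: a `multi_intent:` prefix followed by intents joined with `+`. That format is inferred from the single example, so treat it as a guess rather than a spec.

```python
from typing import List

MULTI_PREFIX = "multi_intent:"  # inferred from the example response


def parse_intents(intent_field: str) -> List[str]:
    """Split the response's intent field into a list of individual intents."""
    if intent_field.startswith(MULTI_PREFIX):
        parts = intent_field[len(MULTI_PREFIX):].split("+")
        return [p for p in parts if p]  # drop empty fragments
    return [intent_field]


print(parse_intents("multi_intent:order_status+refund"))
print(parse_intents("password_reset"))
```

If you control the API, returning a JSON array of intents would avoid string parsing altogether.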

POST /v1/support/action - execute a customer action (refund, password reset, etc.)

Request: {
  "ticket_id": "tkt_abc123",
  "action_type": "process_refund",
  "parameters": {
    "order_id": "ORD-789",
    "reason": "customer_request",
    "amount": "full"
  },
  "confirmed_by_customer": true
}
Response: {
  "action_id": "act_ghi012",
  "status": "executed",
  "result": "Refund of $1,299.00 initiated for order ORD-789. Expect 5-7 business days for processing.",
  "executed_at": "2026-04-11T14:32:00Z"
}
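
Because this endpoint moves money, the server should refuse to run it without explicit customer confirmation. Here is a minimal sketch of that guard, assuming the request body arrives as a dict shaped like the example above. Apart from `process_refund`, the action names in the confirmation set are hypothetical, as is the `ActionRejected` exception.

```python
from typing import Any, Dict

# Actions that must carry an explicit customer confirmation.
# Only "process_refund" appears in the example request; the rest are assumed.
REQUIRES_CONFIRMATION = {"process_refund", "password_reset", "account_update"}


class ActionRejected(Exception):
    """Raised when an action request fails validation."""


def validate_action(request: Dict[str, Any]) -> None:
    """Raise ActionRejected unless the request is safe to execute."""
    for key in ("ticket_id", "action_type"):
        if not request.get(key):
            raise ActionRejected(f"missing required field: {key}")
    action = request["action_type"]
    # Require the literal True, not just any truthy value such as "yes".
    if action in REQUIRES_CONFIRMATION and request.get("confirmed_by_customer") is not True:
        raise ActionRejected(f"{action} requires confirmed_by_customer=true")
```

Checking for the literal `True` is deliberate: a truthy string slipping through a loosely typed client should not be enough to trigger a refund.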

POST /v1/support/escalate - escalate a ticket to a human agent (customer-requested or system-triggered)

Request: {
  "ticket_id": "tkt_abc123",
  "reason": "customer_requested_human",
  "conversation_summary": "Customer asked about order ORD-789 refund. AI offered refund but customer has additional billing concerns.",
  "priority": "medium",
  "escalation_signals": {
    "intent_confidence": 0.62,
    "sentiment_score": -0.4,
    "retrieval_quality": 0.55,
    "policy_override": false
  }
}
Response: {
  "escalation_id": "esc_jkl345",
  "assigned_agent": "agent_sarah",
  "estimated_wait_minutes": 12,
  "queue_position": 4,
  "handoff_context_included": true
}

GET /v1/support/analytics - deflection metrics and system performance

Response: {
  "period": "2026-04-11",
  "total_tickets": 98420,
  "ai_resolved": 69094,
  "escalated": 19684,
  "pending": 9642,
  "deflection_rate": 0.702,
  "avg_ai_response_ms": 1840,
  "avg_csat_ai": 4.21,
  "avg_csat_human": 4.48,
  "cost_savings_usd": 1036410,
  "top_intents": [
    { "intent": "order_status", "count": 31200, "auto_resolved_pct": 0.92 },
    { "intent": "refund", "count": 18600, "auto_resolved_pct": 0.71 },
    { "intent": "password_reset", "count": 14100, "auto_resolved_pct": 0.97 }
  ]
}
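
The numbers in this response are internally consistent, which is worth checking on any dashboard you build. This sketch recomputes the derived fields from the raw counts in the example. The observation that the savings figure equals AI-resolved tickets times $15 is my inference from the arithmetic, not something the API documents.

```python
from typing import Any, Dict

analytics = {
    "total_tickets": 98420,
    "ai_resolved": 69094,
    "escalated": 19684,
    "pending": 9642,
    "deflection_rate": 0.702,
    "cost_savings_usd": 1036410,
}


def check_analytics(a: Dict[str, Any]) -> Dict[str, Any]:
    """Recompute derived metrics from the raw ticket counts."""
    return {
        # Every ticket should be in exactly one of the three buckets.
        "counts_sum_to_total": (
            a["ai_resolved"] + a["escalated"] + a["pending"] == a["total_tickets"]
        ),
        "deflection_rate": round(a["ai_resolved"] / a["total_tickets"], 3),
        "savings_per_ai_ticket": a["cost_savings_usd"] / a["ai_resolved"],
    }


print(check_analytics(analytics))
```

The counts add up, the recomputed deflection rate matches 0.702, and the savings come to $15.00 per AI-resolved ticket for this single-day period, which is the low end of the human cost range.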

High-level design

The system splits into two pipelines: the real-time resolution pipeline (handles incoming messages) and the offline learning pipeline (improves the system from human corrections).

The real-time pipeline receives a customer message, classifies intent, retrieves context from multiple sources, generates a response, scores confidence, and either sends the response or escalates. The entire flow completes in under 5 seconds end-to-end. I have seen teams try to build this as a single monolithic LLM call ("just send everything to GPT-4o"), and it works for demos but breaks at 10K tickets/day due to cost and latency.
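
The real-time flow can be sketched as a function that wires the stages together in the order described above. Every stage is passed in as a callable, so the orchestration stays independent of any particular model or vector store. The signatures, the 0.5 threshold, and the handoff message are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PipelineResult:
    response: str
    escalated: bool
    intent: str
    sources: List[str]


def handle_message(
    message: str,
    classify: Callable[[str], Tuple[str, float]],             # -> (intent, confidence)
    retrieve: Callable[[str, str], Tuple[List[str], float]],  # -> (doc ids, quality)
    generate: Callable[[str, List[str]], str],                # -> draft response
    score: Callable[[float, float], float],                   # -> escalation risk 0..1
    threshold: float = 0.5,                                   # assumed cutoff
) -> PipelineResult:
    """Classify, retrieve, generate, score, then either send or escalate."""
    intent, intent_confidence = classify(message)
    sources, retrieval_quality = retrieve(message, intent)
    draft = generate(message, sources)
    risk = score(intent_confidence, retrieval_quality)
    if risk >= threshold:
        # The draft is withheld from the customer; a real system might
        # attach it to the handoff context for the human agent.
        return PipelineResult(
            "Let me connect you with a human agent.", True, intent, sources
        )
    return PipelineResult(draft, False, intent, sources)


# Usage with stub stages standing in for real models.
result = handle_message(
    "Where is my order?",
    classify=lambda m: ("order_status", 0.93),
    retrieve=lambda m, intent: (["help-article-102"], 0.9),
    generate=lambda m, docs: "Your order is currently in processing.",
    score=lambda ic, rq: 1.0 - min(ic, rq),
)
print(result.escalated, result.response)
```

Injecting the stages also makes each one swappable during A/B tests, and lets you unit test the orchestration with stubs like the ones shown here.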

The offline pipeline collects human agent corrections, retrains the intent classifier weekly, updates the RAG knowledge base with new resolutions, and A/B tests updated models before full rollout. This feedback flywheel is what separates a static FAQ bot from a system that actually improves.

The architecture separates read-heavy operations (intent classification, retrieval) from write-heavy operations (action execution, ticket logging) so they scale independently. The vector database handles semantic search over help articles and past resolutions, while PostgreSQL stores structured ticket data and customer records.

For your interview: draw this diagram in two passes. First, the happy path (message to AI response). Second, add the escalation branch and the learning loop. Interviewers love seeing you build incrementally.

