
AI Support Metrics That Actually Matter: Resolution Rate, Escalation Quality, and MTTR

You deployed AI Tier 1 support. Calls are being answered. Tickets are being created. The dashboard shows green numbers.

But is it actually working?

Most support teams measure the same things they've always measured: total calls handled, average handle time, customer satisfaction score. These metrics were designed for human call centers. They'll tell you the AI is answering calls. They won't tell you if it's answering them well.

Here are the metrics that actually matter for AI-powered support — and the benchmarks you should be targeting.

1. First Call Resolution Rate (FCR)

What it measures: The percentage of calls resolved during the initial conversation without requiring a callback, escalation, or follow-up.

Why it matters: This is the single most important metric. If the AI resolves the issue and the customer never has to call back, that's a win. If the customer calls back tomorrow with the same problem, the first call was a failure regardless of what the notes say.

Benchmarks:

  • Week 1: 40-50% (the AI is learning, you're tuning the decision trees)
  • Month 1: 55-65%
  • Month 3: 65-75%
  • Mature deployment: 70-80%

How to improve it: Analyze calls that required callbacks. Why wasn't it resolved the first time? Missing diagnostic step? Incomplete knowledge base? Fix the gap.

Red flag: FCR below 50% after 60 days. Your decision trees need significant revision.
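Computing FCR is simple arithmetic once each call record knows whether the first contact closed the issue. A minimal sketch — the `Call` record and its `resolved_first_contact` flag are illustrative, not a real ProxiAgent schema:

```python
from dataclasses import dataclass

@dataclass
class Call:
    call_id: str
    resolved_first_contact: bool  # no callback, escalation, or follow-up

def first_call_resolution(calls: list[Call]) -> float:
    """Percentage of calls resolved during the initial conversation."""
    if not calls:
        return 0.0
    resolved = sum(c.resolved_first_contact for c in calls)
    return 100.0 * resolved / len(calls)

# Hypothetical week of calls: 3 of 4 resolved on first contact
calls = [Call("a", True), Call("b", True), Call("c", False), Call("d", True)]
print(f"FCR: {first_call_resolution(calls):.0f}%")  # FCR: 75%
```

The hard part isn't the math — it's making sure the `resolved_first_contact` flag is set by what actually happened (no callback within your window), not by the ticket's close status.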

2. Escalation Quality Score

What it measures: When the AI escalates to Tier 2, how useful is the handoff? Rate on a 1-5 scale based on Tier 2 engineer feedback.

Why it matters: A high escalation rate with great handoffs is better than a low escalation rate with terrible ones. If the AI resolves 60% of calls but gives Tier 2 perfect handoffs on the other 40%, your overall support quality is excellent. If it resolves 80% but the remaining 20% are garbage escalations, you have a problem.

  • Score 1-2 (useless/minimal): Tier 2 starts from scratch. Target: under 5% of escalations.
  • Score 3 (adequate): Some useful info but incomplete. Target: under 20%.
  • Score 4-5 (good/excellent): Tier 2 can act immediately. Target: 75%+ of escalations.

How to measure: Weekly survey to Tier 2 engineers. Simple 5-question form. Takes 2 minutes per ticket. The data is gold.

How to improve it: Review low-scored escalations. What was missing? Add those data points to the AI's diagnostic collection.
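Summarizing the weekly survey against the bucket targets above takes only a few lines. A sketch, with hypothetical scores and illustrative bucket labels:

```python
from collections import Counter

def escalation_quality(scores: list[int]) -> dict:
    """Average Tier 2 handoff rating (1-5) plus the bucket distribution."""
    n = len(scores)
    buckets = Counter()
    for s in scores:
        if s <= 2:
            buckets["useless (1-2)"] += 1
        elif s == 3:
            buckets["adequate (3)"] += 1
        else:
            buckets["good (4-5)"] += 1
    return {
        "avg": sum(scores) / n,
        "pct": {k: 100.0 * v / n for k, v in buckets.items()},
    }

# One week of Tier 2 ratings (made-up numbers)
summary = escalation_quality([5, 4, 4, 3, 5, 2, 4, 5, 4, 4])
print(summary["avg"])                # 4.0
print(summary["pct"]["good (4-5)"])  # 80.0
```

Here 80% of escalations land in the good/excellent bucket, clearing the 75% target, with one useless handoff to review.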


3. Mean Time to Resolution (MTTR)

What it measures: Total elapsed time from when the customer first calls to when the issue is confirmed resolved.

Why it matters: MTTR is the customer's experience. They don't care if the AI answered in 2 seconds if the issue took 3 days to resolve because of a bad escalation.

Benchmarks:

  • AI-resolved calls: 4-8 minutes (most should be in this range)
  • Escalated calls (with good handoff): 2-6 hours (depending on Tier 2 availability)
  • Escalated calls (with bad handoff): 12-48 hours (re-diagnosis required)

The ratio that matters: Good-handoff escalations should outnumber bad-handoff escalations 9:1 by month 3.

How to improve it: Track MTTR separately for AI-resolved and escalated calls. If escalated MTTR is high, the problem is handoff quality, not Tier 2 speed.
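Splitting MTTR by resolution path is a small computation once each ticket records how it was handled. A sketch — the `(path, opened_at, resolved_at)` tuple layout is an assumption for illustration:

```python
from datetime import datetime, timedelta
from statistics import mean

def mttr(tickets):
    """Mean time to resolution, grouped by resolution path.

    Each ticket is (path, opened_at, resolved_at), where path is
    'ai' or 'escalated'.
    """
    by_path = {}
    for path, opened, resolved in tickets:
        by_path.setdefault(path, []).append((resolved - opened).total_seconds())
    return {p: timedelta(seconds=mean(secs)) for p, secs in by_path.items()}

# Hypothetical tickets: two AI-resolved, one escalated
t0 = datetime(2024, 1, 1, 9, 0)
tickets = [
    ("ai", t0, t0 + timedelta(minutes=6)),
    ("ai", t0, t0 + timedelta(minutes=8)),
    ("escalated", t0, t0 + timedelta(hours=4)),
]
print(mttr(tickets))  # {'ai': 7 min, 'escalated': 4 hrs}
```

If the 'escalated' number creeps toward the 12-48 hour band, look at handoff quality before blaming Tier 2 throughput.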

4. Tier 2 Deflection Rate

What it measures: What percentage of calls that would have gone to Tier 2 (in the old system) are now resolved by AI at Tier 1?

Why it matters: This is the ROI metric. Every deflected Tier 2 ticket saves $25-50 in engineering time. Multiply by volume and you get your actual dollar savings.

Benchmarks:

  • Month 1: 30-40% deflection
  • Month 3: 50-60% deflection
  • Month 6: 60-70% deflection

How to measure: Compare current Tier 2 volume to your pre-AI baseline. Adjust for any changes in total call volume.

Red flag: Deflection rate plateaus below 40%. Your decision trees aren't covering the right issues.
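The volume adjustment works like this sketch (all numbers are made up): scale the old escalation ratio to today's call volume, then compare against actual Tier 2 tickets.

```python
def tier2_deflection(baseline_t2, baseline_total, current_t2, current_total):
    """Deflection rate vs. the pre-AI baseline, adjusted for volume shifts."""
    # Expected Tier 2 volume if the old escalation ratio still held
    expected_t2 = (baseline_t2 / baseline_total) * current_total
    return 100.0 * (expected_t2 - current_t2) / expected_t2

# Pre-AI: 400 of 1,000 calls hit Tier 2. Now: 180 of 1,100 calls do.
print(f"{tier2_deflection(400, 1000, 180, 1100):.0f}% deflected")  # 59% deflected
```

Without the volume adjustment, a busy month would make deflection look worse than it is, and a quiet month would flatter it.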

5. Repeat Call Rate

What it measures: How often does the same customer call back about the same issue within 7 days?

Why it matters: If the AI "resolves" a call but the customer calls back, it wasn't actually resolved. An inflated FCR can hide behind technically-closed tickets. Repeat call rate catches false resolutions.

Thresholds:

  • Target: under 8% repeat rate
  • Warning: 8-15%
  • Critical: above 15%

How to improve it: Identify repeat call patterns. Is the AI suggesting fixes that don't stick? Is it missing the root cause and treating symptoms?
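A minimal way to compute it, assuming each call record carries a customer ID, an issue label, and a timestamp (all three field choices are illustrative):

```python
from datetime import datetime, timedelta

def repeat_call_rate(calls, window_days=7):
    """Share of calls followed by another call from the same customer
    about the same issue within the window.

    Each call is (customer_id, issue, timestamp).
    """
    window = timedelta(days=window_days)
    calls = sorted(calls, key=lambda c: c[2])
    repeats = 0
    for i, (cust, issue, ts) in enumerate(calls):
        # O(n^2) scan is fine for a weekly batch of call records
        for cust2, issue2, ts2 in calls[i + 1:]:
            if cust2 == cust and issue2 == issue and ts2 - ts <= window:
                repeats += 1
                break
    return 100.0 * repeats / len(calls)

t0 = datetime(2024, 1, 1)
calls = [
    ("c1", "wifi", t0),
    ("c1", "wifi", t0 + timedelta(days=2)),  # repeat within the window
    ("c2", "billing", t0),
    ("c3", "wifi", t0 + timedelta(days=1)),
]
print(repeat_call_rate(calls))  # 25.0
```

Matching "same issue" is the fuzzy part in practice; an issue category from the ticket is usually close enough to start.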

6. Customer Effort Score (CES)

What it measures: How easy was it for the customer to get their problem resolved? Survey after the call: "On a scale of 1-7, how easy was it to resolve your issue?"

Why it matters: CSAT tells you if the customer is happy. CES tells you if the process was frictionless. You can have a satisfied customer who still found the process annoying (they'll tolerate it once but won't be a promoter). Low effort = high retention.

Targets:

  • AI-resolved calls: Target CES of 6+ (out of 7)
  • Escalated calls: Target CES of 5+

How to improve it: Listen to low-CES calls. Where did the friction happen? Was the AI asking too many questions? Was the escalation process clunky?

What NOT to Measure

Average Handle Time (AHT): This is a Tier 1 human metric. For AI, a 12-minute call that resolves the issue is better than a 3-minute call that escalates. Don't optimize for short calls — optimize for resolved calls.

Total calls answered: A vanity metric. The AI will answer 100% of calls. What matters is what happens after it answers.

AI "accuracy": This is unmeasurable in a meaningful way for conversational AI. Focus on outcomes (did the customer's problem get fixed?) not process (did the AI say the right thing?).

The Dashboard You Should Build

Track these weekly. Share with both the support team and engineering leadership:

| Metric                          | This Week | Last Week | Target  |
|---------------------------------|-----------|-----------|---------|
| First Call Resolution           | %         | %         | 70%     |
| Escalation Quality (avg score)  | x/5       | x/5       | 4.0+    |
| MTTR (AI-resolved)              | min       | min       | <8 min  |
| MTTR (escalated)                | hrs       | hrs       | <6 hrs  |
| Tier 2 Deflection Rate          | %         | %         | 60%     |
| Repeat Call Rate                | %         | %         | <8%     |
| Customer Effort Score           | x/7       | x/7       | 6+      |

FAQ

How soon should we start measuring? Day 1. But don't judge the AI for 30 days. The first month is calibration — you're tuning decision trees and the metrics will improve significantly.

Who should own these metrics? Ideally a Support Operations lead or someone who spans Tier 1 and Tier 2. They need visibility into both sides of the escalation boundary.

What if our metrics look bad after 90 days? It's almost always a knowledge base problem, not an AI problem. The decision trees need refinement, or there's a gap in the troubleshooting logic. Review the lowest-performing call categories and rebuild those trees.

How do we benchmark against our previous human-only setup? Run AI and human Tier 1 in parallel for 30 days. Route similar call types to both. Compare FCR, MTTR, and escalation quality side by side.

Measure What Matters, Fix What's Broken

Metrics don't improve support by themselves. They show you where to focus. Track these six metrics weekly, act on the signals, and your AI support will outperform most human Tier 1 teams within 90 days.

ProxiAgent includes built-in analytics for all of these metrics. See the dashboard at proxicall.ai/agent.