14 April 2026

Human-in-the-Loop AI Customer Support: Why It Is Not a Compromise

Teams that treat human review as a failure state in AI customer support are measuring the wrong thing. Human-in-the-loop is not a fallback — it is how accurate AI support automation is built.

The metric most AI customer support vendors optimise for is automation rate — the percentage of queries resolved without human involvement. The higher the number, the better the product looks on a slide. The problem with this framing is that it treats human review as a failure state: something to be minimised, worked around, and eventually eliminated. That mischaracterises what human review is actually for in a governed AI support system.

The false choice between automation and human review

The standard AI customer support vendor pitch positions human review as a stepping stone — something you do while the AI is learning, before it is ready to automate fully. The end state, in this framing, is full automation. Human review is a temporary inconvenience on the way there.

This framing is wrong in two ways. First, full automation in high-stakes support categories is not a realistic or desirable goal for most enterprise teams. Billing disputes, account changes, and complex product queries will always benefit from human judgement in edge cases. The goal is not to eliminate human review — it is to reserve human review for the queries where it adds the most value.

Second, it misses the structural function of human review in a governed system. Human review is not a fallback — it is the primary mechanism through which the AI's accuracy improves. Corrections from human reviewers are the training signal that moves FortiAgent's Trust Score from below the automation threshold to above it. Remove human review and you remove the improvement loop.

How accuracy-threshold-based human review works

In a governed AI customer support system, human review is not triggered by the AI's uncertainty about a specific response. It is triggered by the governance layer's assessment of the AI's accuracy in the relevant category.

FortiVault's automation gating works as follows: FortiAgent generates a response. FortiVault checks the current AI Trust Score for the query category — billing, returns, login, technical support — against the threshold configured for that category. If the Trust Score is at or above the threshold, the response is sent automatically. If it is below, the response enters the human review queue before the customer sees it.
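A minimal sketch of that gating decision, in Python. The names (route_response, THRESHOLDS) and the threshold values are illustrative assumptions, not FortiVault's actual API:

    # Hypothetical gating sketch; thresholds are examples, not FortiVault defaults.
    THRESHOLDS = {"billing": 0.90, "returns": 0.85, "login": 0.80, "technical": 0.85}

    def route_response(category: str, trust_scores: dict[str, float]) -> str:
        """Auto-send only when the category's Trust Score meets its configured threshold."""
        score = trust_scores.get(category, 0.0)     # no track record yet -> review
        threshold = THRESHOLDS.get(category, 1.0)   # unconfigured category -> always review
        if score >= threshold:
            return "send_automatically"
        return "human_review_queue"

    # Example: billing at 0.87 stays gated; returns at 0.91 goes out automatically.
    print(route_response("billing", {"billing": 0.87}))   # human_review_queue
    print(route_response("returns", {"returns": 0.91}))   # send_automatically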

This is a materially different mechanism from AI confidence scoring. Confidence scoring asks: how sure is the AI about this specific response? Trust Score-based gating asks: how accurate has the AI been in this category recently, and does that accuracy meet the standard we have set? The second question is more useful because it is based on observed outcomes, not the AI's self-assessment.

An AI that is confidently wrong is more dangerous than an AI that knows it is uncertain. Trust Score-based gating catches confidently wrong responses because it is based on what actually happened — not what the AI thinks about its own output.
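The contrast is easy to state in code. In this illustrative sketch, confidence_gate relies on the model's self-assessment while trust_gate relies on logged review outcomes; both function names are hypothetical:

    # Confidence gating asks the model about itself; trust gating asks the record.
    def confidence_gate(model_confidence: float) -> bool:
        # Self-assessment: a confidently wrong model sails through this check.
        return model_confidence >= 0.90

    def trust_gate(recent_outcomes: list[bool], threshold: float = 0.90) -> bool:
        # Observed outcomes: each entry records whether a reviewed response in
        # this category was approved (True) or corrected (False).
        if not recent_outcomes:
            return False  # no track record yet -> review everything
        accuracy = sum(recent_outcomes) / len(recent_outcomes)
        return accuracy >= threshold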

Which support categories benefit most from structured human review

Billing and refunds

The highest-stakes category in most B2C and SaaS support operations. AI errors in billing responses have direct financial consequences — incorrect refund amounts, wrong payment terms, disputed transaction outcomes. Human review in this category is not a sign of weak AI — it is the appropriate policy for a category where the cost of an error outweighs the cost of the review.

The automation gate for billing categories should be set high — typically 90% or above. Until FortiAgent has demonstrated 90%+ accuracy specifically on billing queries in your environment, every billing response gets reviewed. The time spent in review is not wasted — it is generating the correction data that moves the Trust Score toward the threshold.
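To make the 90% bar concrete, here is a small worked sketch. The 200-response rolling window is an assumption for illustration, not documented FortiVault behaviour:

    # Hypothetical rolling window of the last 200 reviewed billing responses.
    WINDOW = 200
    THRESHOLD = 0.90

    def billing_gate_open(corrections_in_window: int) -> bool:
        accuracy = (WINDOW - corrections_in_window) / WINDOW
        return accuracy >= THRESHOLD

    # With a 90% threshold, 21 or more corrections in the window keeps the
    # gate closed; 20 or fewer opens it.
    assert billing_gate_open(20) is True
    assert billing_gate_open(21) is False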

Account changes

Subscription cancellations, email changes, account deletions, permission updates. These are write operations — and write operations that go wrong may not be easily reversible. A human reviewer who catches an incorrect account cancellation before it executes saves far more effort than the recovery work required after the fact.

Technically complex queries

Categories where the query surface area is large, the correct answer is not always in the knowledge base, and the cost of an incorrect technical recommendation is meaningful. Human review here is not about AI inaccuracy — it is about the inherent complexity of the category and the value a human brings to edge cases.

Human review as a training signal, not just a safety net

The most important function of human review in a governed AI system is often under-discussed: it is the mechanism that makes AI accuracy improve over time in a structured, measurable way.

In FortiVault, when a support agent reviews a FortiAgent response and makes a correction, that correction is logged with full context: what the AI said, what was changed, and the category. This data feeds the Trust Score calculation for that category. A high correction rate signals declining accuracy and may drop the Trust Score below the automation threshold. A low correction rate — fewer corrections, or corrections of lower significance — signals improving accuracy and may push the Trust Score above the threshold.
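A sketch of what such a correction record might contain. The field names are illustrative; they are an assumption about the shape of the data, not FortiVault's actual schema:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class CorrectionRecord:
        """Hypothetical shape of a logged review outcome."""
        category: str                    # e.g. "billing"
        ai_response: str                 # what FortiAgent said
        corrected_response: str | None   # None if the reviewer approved unchanged
        reviewer_id: str
        logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

        @property
        def was_corrected(self) -> bool:
            return self.corrected_response is not None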

This creates a self-reinforcing loop: human review generates the signal that improves accuracy; improving accuracy reduces the volume of responses that require review; and that frees reviewer capacity for the categories that still need it. The human review queue is not static — it shrinks as the AI earns its way toward automation category by category, as the sketch after this list illustrates.

  • High correction rate on billing queries → Trust Score drops → more responses enter review → more correction data generated
  • Consistent approval on FAQ queries → Trust Score rises → fewer responses enter review → reviewer time freed
  • New query pattern emerges → Trust Score dips → gating re-triggers → reviewed until accuracy re-established
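A minimal sketch of the scoring side of that loop, assuming a simple rolling-window model. FortiVault's actual Trust Score calculation is not described here, so treat this as illustration only:

    from collections import deque

    class CategoryTrust:
        """Illustrative rolling Trust Score: accuracy over the last N review outcomes."""
        def __init__(self, window: int = 200):
            self.outcomes: deque[bool] = deque(maxlen=window)  # True = approved as-is

        def record(self, approved: bool) -> None:
            self.outcomes.append(approved)

        @property
        def score(self) -> float:
            if not self.outcomes:
                return 0.0  # no data -> everything reviewed
            return sum(self.outcomes) / len(self.outcomes)

    # A burst of corrections (approved=False) drags the score down, re-closing
    # the automation gate; a run of approvals pushes it back up.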

Building toward full automation without skipping oversight

The practical path to high AI automation rates in customer support is through human review, not around it. Teams that try to bypass the review phase — by deploying into automation mode prematurely, by setting very low accuracy thresholds, or by using tools that do not surface accuracy data by category — typically discover the problem in customer complaints rather than in their own dashboards.

The governed path looks like this: deploy FortiAgent into full human review mode initially, collecting accuracy data across all categories. Identify the categories where accuracy reaches the threshold first — typically low-stakes informational categories like shipping status and FAQ answers. Enable automation in those categories. Continue collecting data in review-mode categories. Enable automation category by category as accuracy is proven. The automation rate climbs, but it does so on the back of demonstrated accuracy — not configuration optimism.
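One way to encode that category-by-category promotion rule, sketched with an assumed stability requirement. The 14-day sustain check is an illustration, not a documented FortiVault rule:

    def ready_to_automate(daily_scores: list[float], threshold: float,
                          sustain_days: int = 14) -> bool:
        """Promote a category only after its Trust Score holds the threshold for a sustained run."""
        if len(daily_scores) < sustain_days:
            return False
        return all(s >= threshold for s in daily_scores[-sustain_days:])

    # Example: shipping-status FAQs clear a 0.85 bar after two stable weeks;
    # billing stays in review mode until it sustains 0.90.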

The end state is not zero human review. It is human review reserved for the categories where it adds genuine value — the high-stakes, complex, and edge-case queries where a human reviewer is worth the operational cost. That is not a compromise. It is the right design.

Try FortiVault

See the governance layer in action

FortiVault's AI Trust Score, automation gating, and full audit trail — applied to your support categories.