Voice AI Analytics Dashboards: What an Indian VP of Ops Should Demand from a Vendor in 2026

It is a Thursday steering committee at a 900-seat contact center in Gurgaon. Sneha Bhasin, VP of Operations, is in her third quarter of hearing the same thing from the same vendor: "the dashboard is coming next sprint." On the screen is a slide with four tiles: total calls, connected calls, average duration, "AI handled." Her MIS lead, sitting two chairs down, has already pulled the raw CDR from the telephony partner into Excel and is silently doing the reconciliation the dashboard was supposed to do. The "AI handled" number on the slide is 38% higher than the number of calls the CDR shows as completed without a human transfer.
She does not say anything in the meeting. She schedules a one-on-one with the vendor's CSM for Monday and asks her team to pull together a list of every metric the vendor's dashboard cannot answer. The list is two pages long by Friday. None of the questions are exotic. They are the ones any operator running 80,000 outbound calls a day needs to answer before lunch — and the dashboard, as shipped, answers approximately none of them.
This is the most common failure mode of voice AI buying in India in 2026. The pilot looked clean. The production dashboard is a slide.
The thesis
A voice AI analytics dashboard is not a marketing artifact. It is the operating console an Indian VP of Ops uses to run a campaign, reconcile cost with the telephony partner, defend numbers to finance, and prove compliance to the regulator. Most vendor dashboards in 2026 are still demo-grade — pretty tiles, no joins to CDR, no DLT log surface, no per-language WER, no cost-per-recovered-call. This post is what to actually demand in an RFP, the specific metrics that matter for an Indian deployment, the comparison between what vendors show and what you should make non-negotiable, and a phased rollout plan that gets you from "the dashboard is coming next sprint" to a console your MIS team trusts.
Why this matters now
Three things changed in the last twelve months that make dashboard quality a procurement decision, not a UX preference.
First, voice AI volumes crossed the threshold where finance started asking questions an operator cannot answer from a vendor's default view. When you were running 5,000 calls a week as a pilot, you could explain variance with anecdotes. At 80,000 a day across an NBFC collections book or a D2C cart-recovery program, you cannot. The CFO wants to know unit economics per BIN bucket, per circle, per attempt number — and "AI handled 67%" does not survive the question "what does that mean in rupees recovered per dialed minute?"
Second, TRAI DLT enforcement and the DPDP Act 2023 consent regime have made the audit trail itself a deliverable. Regulators do not care that your voice AI is clever. They care that you can produce, on demand, the consent timestamp, the DLT header that played, the scrubbing decision at dial time, and the recording for every call where a financial commitment was made. If your dashboard cannot export that as a per-call CSV, you are one notice away from a problem.
Third, RBI Fair Practices updates and the IRDAI recording mandate for sales calls have pushed compliance from "annual audit" to "weekly evidence." Banks and insurers running outbound voice AI now ask vendors for monthly compliance reports that include disclosure rate, consent-confirmation rate, and recording-retention proof. A dashboard that cannot generate these without a Python script and three calendar days of work is not enterprise-ready, regardless of what the sales deck says.
The vendor incentive is to ship a dashboard that demos well. The operator incentive is a dashboard that reconciles. Those are different products. Knowing the difference, and putting it in the RFP, is the buyer's job in 2026.
The mechanism: what an honest voice AI dashboard actually contains
A working voice AI analytics dashboard has four layers, and most vendor demos stop at layer one.
Layer 1: Volume and outcome counters
These are the tiles that show up in every demo. They are necessary and insufficient. The honest version names exactly what each number is measured against.
- Total dialed — distinct attempts, not distinct numbers.
- Connect rate — calls answered by a human within N rings, as a percentage of dialed. Specify what "answered" means at the SIP layer: some telephony partners count a voicemail pickup as answered, which can inflate connect rate by two to four percentage points.
- Average Handle Time (AHT) — measured from first ring to disposition write. Vendors who only count "talk time" are hiding the silence and the post-call processing.
- Completion rate — calls that reached the designed terminal state (e.g. promise-to-pay captured, address confirmed, OTP read back), not calls where the audio finished playing.
- Right-Party-Contact (RPC) rate — the bot correctly identified that it was speaking to the intended person. For collections and KYC, this is the only outcome counter that matters before the intent metrics.
If a dashboard cannot break each of these by attempt number (first, second, third), it is not useful for outbound. Cumulative RPC across attempts is the actual operator number.
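A minimal sketch of the per-attempt breakdown and cumulative RPC, in Python. The flattened call log, its field names (`account`, `attempt`, `connected`, `rpc`) and the function names are all hypothetical; in practice the denominator for cumulative RPC should be the full allocated account base, not just whatever appears in the log.

```python
from collections import defaultdict

# Hypothetical flattened call log: one row per dial attempt.
# Field names are illustrative, not any vendor's export schema.
calls = [
    {"account": "A1", "attempt": 1, "connected": False, "rpc": False},
    {"account": "A1", "attempt": 2, "connected": True, "rpc": True},
    {"account": "A2", "attempt": 1, "connected": True, "rpc": False},
    {"account": "A2", "attempt": 2, "connected": True, "rpc": False},
    {"account": "A2", "attempt": 3, "connected": True, "rpc": True},
    {"account": "A3", "attempt": 1, "connected": True, "rpc": True},
    {"account": "A4", "attempt": 1, "connected": False, "rpc": False},
]


def per_attempt_rates(rows):
    """(connect rate, RPC rate) per attempt number, over dials made at that attempt."""
    dialed, connected, rpc = defaultdict(int), defaultdict(int), defaultdict(int)
    for r in rows:
        a = r["attempt"]
        dialed[a] += 1
        connected[a] += int(r["connected"])
        rpc[a] += int(r["rpc"])
    return {a: (connected[a] / dialed[a], rpc[a] / dialed[a]) for a in sorted(dialed)}


def cumulative_rpc(rows, max_attempt):
    """Share of distinct accounts reached (RPC) on any attempt <= max_attempt."""
    base = {r["account"] for r in rows}
    reached = {r["account"] for r in rows if r["rpc"] and r["attempt"] <= max_attempt}
    return len(reached) / len(base)


print(per_attempt_rates(calls))  # {1: (0.5, 0.25), 2: (1.0, 0.5), 3: (1.0, 1.0)}
print([cumulative_rpc(calls, k) for k in (1, 2, 3)])  # [0.25, 0.5, 0.75]
```

The per-attempt RPC rate on later attempts looks high because only unresolved accounts are redialed; the cumulative view is the one that answers "how much of the book did we actually reach."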
Layer 2: Conversation quality and intent
This is the layer that separates a CRM screen from an analytics console.
- Intent capture % — of completed calls, the fraction where the bot logged a structured intent (PTP date, address, callback request, escalation reason). The denominator matters: dividing by calls reached instead of calls completed produces a very different headline.
- Sentiment distribution — positive / neutral / negative / agitated, per call, with timestamps. The aggregate number is the least useful version. The useful version is sentiment by utterance, so you can find the line in the script that is irritating people.
- Drop-off by utterance — at which line of the script the human hung up or went silent. This is the single most valuable view for script optimization, and the one almost no demo dashboard exposes.
- Hot transfer SLA — for calls escalated to a human, the median and P95 wait time before the agent picked up. If your hot transfer SLA is 22 seconds at P95, you are losing the call.
- Callback queue depth — how many promised callbacks are outstanding, by SLA bucket. A bot that promised "we will call you back in 30 minutes" has a clock running. Most dashboards do not surface that clock until it is breached.
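To make the callback clock visible before it is breached, a dashboard can bucket open promises by time remaining. This sketch assumes each promise is recorded as (promised time, SLA in minutes, completed flag); the tuple shape and the bucket edges are illustrative assumptions, not a standard.

```python
from datetime import datetime, timedelta


def callback_queue_depth(callbacks, now):
    """Count outstanding promised callbacks by time left on their SLA clock.

    Each callback is (promised_at, sla_minutes, completed). Bucket edges are
    illustrative; align them with the promises your script actually makes.
    """
    buckets = {"breached": 0, "due_in_<15m": 0, "due_in_15-60m": 0, "due_in_>60m": 0}
    for promised_at, sla_minutes, completed in callbacks:
        if completed:
            continue  # only open promises have a running clock
        due_at = promised_at + timedelta(minutes=sla_minutes)
        minutes_left = (due_at - now).total_seconds() / 60
        if minutes_left < 0:
            buckets["breached"] += 1
        elif minutes_left < 15:
            buckets["due_in_<15m"] += 1
        elif minutes_left <= 60:
            buckets["due_in_15-60m"] += 1
        else:
            buckets["due_in_>60m"] += 1
    return buckets


now = datetime(2026, 3, 5, 12, 0)
callbacks = [
    (datetime(2026, 3, 5, 11, 0), 30, False),    # due 11:30, already breached
    (datetime(2026, 3, 5, 11, 40), 30, False),   # due 12:10
    (datetime(2026, 3, 5, 11, 50), 30, False),   # due 12:20
    (datetime(2026, 3, 5, 11, 55), 120, False),  # due 13:55
    (datetime(2026, 3, 5, 10, 0), 30, True),     # completed, ignored
]
print(callback_queue_depth(callbacks, now))
```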
Layer 3: Engineering and infra honesty
This is the layer ops and engineering jointly read. Most vendors hide it.
- WER (Word Error Rate) by language and dialect cohort — Delhi Hindi, Patna Hindi, Tamil, Telugu, Marathi, Bengali — each as its own line. A blended WER is worse than useless; it hides the cohort where your bot is failing. We have seen production WER on Bhojpuri-influenced Hindi sit at 22–28% while the demo WER on Delhi Hindi was 7%. If your dashboard reports one number, you cannot make a fix.
- P50 and P95 latency — bot response time from end-of-user-utterance to start-of-bot-audio. Above 800ms at P95, the conversation starts to feel broken; above 1.5s, people hang up. The metric matters more than ASR accuracy on hold-time calls.
- DLT header pass rate — of dialed templates, the fraction that played the registered header successfully. A miss here is a TRAI exposure, not a UX issue.
- Consent confirmation rate — for flows that require explicit consent (recording disclosure, financial commitment, eKYC start), the rate at which the bot got an affirmative response on the first ask. Below 80%, your script is wrong; below 60%, you have a compliance hole.
- Cost per recovered call — telephony cost plus voice AI cost divided by completed-and-converted calls. For collections, "cost per rupee recovered." For lead-qual, "cost per qualified handoff." Without this number, every other metric is theater.
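The latency and consent thresholds above translate directly into banding rules. This is a sketch, assuming you already have per-call latencies in milliseconds for one locale and first-ask consent counts for one flow; run it once per locale and per flow. Nearest-rank is only one of several percentile conventions, so agree on the convention with the vendor before comparing numbers.

```python
def nearest_rank(values, p):
    """Nearest-rank percentile for integer p in 1..100 of a non-empty list."""
    s = sorted(values)
    k = max(1, (p * len(s) + 99) // 100)  # ceil(p * n / 100) in integer math
    return s[k - 1]


def latency_band(latencies_ms):
    """P50/P95 bot response latency and a band using the thresholds in the text."""
    p50, p95 = nearest_rank(latencies_ms, 50), nearest_rank(latencies_ms, 95)
    if p95 > 1500:
        band = "hang-up risk"
    elif p95 > 800:
        band = "feels broken"
    else:
        band = "ok"
    return p50, p95, band


def consent_band(affirmed_on_first_ask, asked):
    """First-ask consent confirmation rate, banded at 80% and 60%."""
    rate = affirmed_on_first_ask / asked
    if rate < 0.60:
        return rate, "compliance hole"
    if rate < 0.80:
        return rate, "script problem"
    return rate, "ok"


# ms from end of user utterance to start of bot audio, one locale
sample = [450] * 10 + [600] * 8 + [900, 1700]
print(latency_band(sample))   # (450, 900, 'feels broken')
print(consent_band(71, 100))  # (0.71, 'script problem')
```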
Layer 4: Reconciliation with telephony and CRM
The fourth layer is the one MIS teams build themselves when the vendor refuses to. It should not be their job.
- CDR reconciliation — daily join between the voice AI platform's call log and the telephony partner's (Exotel, Knowlarity, Plivo, Servetel, Ozonetel) CDR, with variance flags for calls present in one and not the other. Variance above 2% means somebody is miscounting and someone is overcharging.
- DLT scrubbing log — every number screened against the DND/preference database at dial time, with timestamp, decision, and template ID. Required for audit.
- CRM write-back integrity — fraction of completed calls whose disposition, recording URL, and structured intent landed in the CRM as designed, with a queue for retries. Silent CRM write failures are the most expensive form of broken dashboard, because they look fine until finance reconciles.
- Recording retention proof — for IRDAI and RBI-regulated calls, evidence that the recording is stored for the mandated period in the mandated geography, with an integrity hash.
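The daily CDR join reduces to a set comparison once both sides share a call identifier. This sketch assumes such a shared ID exists (often it does not, and you have to match on number plus timestamp instead), and the variance formula, mismatched calls over the larger side's count, is an illustrative choice rather than an industry definition.

```python
def reconcile(platform_ids, cdr_ids, threshold=0.02):
    """Daily set-join of the voice AI call log against the telephony CDR.

    Assumes both sides share a call identifier. Variance here is mismatched
    calls divided by the larger side's count; agree on your own definition
    with the vendor in writing.
    """
    platform, cdr = set(platform_ids), set(cdr_ids)
    only_platform = sorted(platform - cdr)
    only_cdr = sorted(cdr - platform)
    base = max(len(platform), len(cdr))
    variance = (len(only_platform) + len(only_cdr)) / base if base else 0.0
    return {"only_platform": only_platform, "only_cdr": only_cdr,
            "variance": variance, "flag": variance > threshold}


platform = [f"c{i:03d}" for i in range(1, 101)]           # 100 calls logged by the platform
cdr = [f"c{i:03d}" for i in range(1, 98)] + ["x1", "x2"]  # 99 calls billed by telephony
result = reconcile(platform, cdr)
print(result["only_platform"], result["only_cdr"], result["variance"], result["flag"])
# ['c098', 'c099', 'c100'] ['x1', 'x2'] 0.05 True
```

Listing the unmatched IDs, not just the variance number, is what lets the MIS team open a ticket with either side.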
If the vendor cannot show you layers three and four in the sales process, assume they do not exist.
What goes wrong: the seven failure modes
These are the patterns we see when an operator inherits a vendor dashboard and tries to actually run a campaign from it.
The connect-rate definition trap. The vendor's connect rate counts SIP 200s, which includes voicemail systems answering. The telephony partner's CDR counts something different. Finance gets a third number from the BPO's MIS. Three numbers, no source of truth. Fix: define connect at the buyer's preferred SIP event, and make the vendor join to the telephony CDR daily.
The "AI handled" inflation. Vendors love a tile that says "AI handled 78%." Drill into it and you find it counts calls where the AI did not transfer, including the ones where the human just hung up in the first ten seconds. Fix: replace "AI handled" with "completion rate to terminal state" and "intent captured," each independently defined.
Blended WER. A single WER number hides the Patna-Hindi disaster. Fix: per-language, per-circle WER reports, with sample size and a confidence band. If the vendor cannot do this, they cannot diagnose their own model.
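Per-cohort WER with a sample size and a confidence band can be computed from reference transcripts and ASR output. This sketch uses word-level edit distance and a percentile bootstrap over utterances for the band; the bootstrap is our choice for illustration, not a method any particular vendor is known to use, and the cohort label is hypothetical.

```python
import random


def word_errors(ref, hyp):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[-1]


def cohort_wer(pairs, n_boot=1000, seed=7):
    """Corpus WER for one cohort, its utterance count, and a 95% bootstrap band.

    `pairs` is a list of (reference transcript, ASR hypothesis); references
    must be non-empty. Resampling is over utterances, with replacement.
    """
    stats = [(word_errors(ref, hyp), len(ref.split())) for ref, hyp in pairs]

    def wer(sample):
        return sum(e for e, _ in sample) / sum(n for _, n in sample)

    rng = random.Random(seed)
    boots = sorted(wer(rng.choices(stats, k=len(stats))) for _ in range(n_boot))
    tail = n_boot // 40  # 2.5% in each tail
    return wer(stats), len(stats), (boots[tail], boots[n_boot - tail - 1])


cohorts = {
    "patna_hindi": [("kal tak paise bhej dunga", "kal paise bhej dunga"),
                    ("aap ka bhugtan kab hoga", "aap ka bhugtan kab hoga")],
}
for name, pairs in cohorts.items():
    print(name, cohort_wer(pairs))  # n=2 is far too small to act on; the band shows it
```

Reporting the utterance count next to each cohort is what stops a 7% WER on forty clips from being read as equivalent to 7% on four thousand.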
No drop-off-by-utterance view. Without this, every script change is a guess. Most dashboards show "average duration" and call it done. Fix: utterance-indexed drop-off heatmap. Even a CSV export of (call_id, utterance_index, hangup_flag) is enough to build it.
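Assuming the vendor exports exactly the (call_id, utterance_index, hangup_flag) rows described above, the drop-off view is a few lines of counting. Running it separately per script version or per circle and stacking the results gives the heatmap; the sample rows are made up.

```python
from collections import Counter


def dropoff_by_utterance(rows):
    """Drop-off rate per script line from (call_id, utterance_index, hangup_flag) rows.

    A call 'reaches' every utterance index it has a row for. Drop-off at index i
    is hangups at i divided by calls that reached i. Duplicate rows are ignored.
    """
    seen = set()
    reached, hangups = Counter(), Counter()
    for call_id, idx, hangup in rows:
        if (call_id, idx) in seen:
            continue
        seen.add((call_id, idx))
        reached[idx] += 1
        if hangup:
            hangups[idx] += 1
    return {i: hangups[i] / reached[i] for i in sorted(reached)}


rows = [
    ("c1", 0, 0), ("c1", 1, 0), ("c1", 2, 1),
    ("c2", 0, 0), ("c2", 1, 1),
    ("c3", 0, 1),
    ("c4", 0, 0), ("c4", 1, 0), ("c4", 2, 0), ("c4", 3, 0),
]
for idx, rate in dropoff_by_utterance(rows).items():
    print(f"utterance {idx}: {rate:.0%} hung up here")
```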
Hot transfer SLA buried in a sub-screen. When a call is escalated, the customer is on hold and irritated. If the vendor surfaces transfer SLA only in a weekly export, you discover the breach after the customer has churned. Fix: a live tile showing P95 transfer wait over the current 15-minute window.
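A live transfer tile needs a rolling window, not a daily aggregate. A sketch, assuming escalation timestamps and wait durations are available in near real time; the input shape and function name are hypothetical.

```python
from datetime import datetime, timedelta


def p95_transfer_wait(transfers, now, window_minutes=15):
    """Nearest-rank P95 hot-transfer wait (seconds) for escalations in the window.

    `transfers` is a list of (escalated_at, wait_seconds). For calls still on
    hold, pass now - escalated_at as the wait so open holds count toward the tile.
    Returns None when the window is empty.
    """
    start = now - timedelta(minutes=window_minutes)
    waits = sorted(w for t, w in transfers if start <= t <= now)
    if not waits:
        return None
    k = (95 * len(waits) + 99) // 100  # ceil(0.95 * n) without float rounding
    return waits[k - 1]


now = datetime(2026, 3, 5, 12, 0)
transfers = [(now - timedelta(minutes=m), w) for m, w in
             [(2, 6), (5, 9), (7, 12), (10, 25), (14, 8), (20, 40)]]
print(p95_transfer_wait(transfers, now))  # 25; the 40s wait is outside the window
```

Counting callers who are still on hold matters: a tile that only includes completed transfers stays green while the queue is backing up.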
Cost reconciliation lag. The vendor's bill arrives on the 7th. The telephony bill arrives on the 12th. The finance reconciliation happens on the 20th. By then any anomaly is three weeks old. Fix: daily cost tile, broken out by voice AI usage and telephony minutes, with the prior-day variance flagged.
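A daily cost tile with the AI and telephony split and a prior-day variance flag might look like this. The record fields and the 15% flag threshold are assumptions for illustration; the point is that the split shows which side of the bill moved.

```python
def daily_cost_tile(days, variance_threshold=0.15):
    """Daily cost split (AI vs telephony) with prior-day variance on cost per recovered call.

    `days` is a date-ordered list of dicts; amounts are in rupees. The 15%
    threshold is an illustrative default, not a recommendation.
    """
    tiles, prev = [], None
    for d in days:
        total = d["ai_cost"] + d["telephony_cost"]
        cprc = total / d["recovered_calls"] if d["recovered_calls"] else None
        variance = (cprc - prev) / prev if prev and cprc is not None else None
        tiles.append({
            "date": d["date"],
            "ai_cost": d["ai_cost"],
            "telephony_cost": d["telephony_cost"],
            "cost_per_recovered_call": cprc,
            "variance_vs_prior_day": variance,
            "flag": variance is not None and abs(variance) > variance_threshold,
        })
        prev = cprc
    return tiles


days = [
    {"date": "2026-03-01", "ai_cost": 42_000, "telephony_cost": 18_000, "recovered_calls": 6_000},
    {"date": "2026-03-02", "ai_cost": 43_000, "telephony_cost": 19_000, "recovered_calls": 6_200},
    {"date": "2026-03-03", "ai_cost": 44_000, "telephony_cost": 31_000, "recovered_calls": 6_000},
]
for t in daily_cost_tile(days):
    print(t["date"], t["cost_per_recovered_call"], t["variance_vs_prior_day"], t["flag"])
# 2026-03-01 10.0 None False
# 2026-03-02 10.0 0.0 False
# 2026-03-03 12.5 0.25 True
```

In the sample data the jump on 3 March comes almost entirely from telephony cost, which is exactly what the split is meant to surface the next morning rather than three weeks later.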
Compliance evidence on demand. A regulator asks for proof of consent on a specific call from 11 weeks ago. The dashboard cannot answer. Fix: a per-call evidence pack — call ID, recording, DLT header, consent timestamp, scrubbing log, retention proof — exportable in under thirty seconds.
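An evidence pack is mostly a join across systems plus an explicit list of what is missing. This sketch assumes each system can be queried by call ID; every field name, the example header value and the storage path are hypothetical placeholders.

```python
import hashlib
import json

# Field names are hypothetical; map them to whatever each system actually stores.
REQUIRED = ("recording_uri", "recording_sha256", "dlt_header", "consent_ts",
            "scrub_decision", "retention_until")


def evidence_pack(call_id, sources):
    """Merge one call's records from every system and list any missing fields."""
    pack = {"call_id": call_id}
    for records in sources.values():
        pack.update(records.get(call_id, {}))
    pack["missing"] = [f for f in REQUIRED if f not in pack]
    return pack


def recording_intact(pack, recording_bytes):
    """Check the stored integrity hash against the recording actually retrieved."""
    return pack.get("recording_sha256") == hashlib.sha256(recording_bytes).hexdigest()


audio = b"placeholder audio bytes"
sources = {
    "telephony": {"c42": {"recording_uri": "recordings/c42.wav",
                          "recording_sha256": hashlib.sha256(audio).hexdigest()}},
    "dlt": {"c42": {"dlt_header": "HDR-EXAMPLE", "scrub_decision": "allowed"}},
    "consent": {"c42": {"consent_ts": "2026-01-12T10:41:07+05:30"}},
}
pack = evidence_pack("c42", sources)
print(json.dumps(pack, indent=2))     # "missing" lists retention_until
print(recording_intact(pack, audio))  # True
```

Surfacing gaps in a `missing` field, instead of silently omitting them, is what makes the pack useful in a drill: the gap shows up in week 8, not in front of the regulator.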
The numbers: what good looks like in 2026
These are operator-grade ranges across Indian deployments we have seen in NBFC, insurance, D2C, and BPO contexts. They are not best-case demo numbers.
| Metric | Realistic range | "Vendor demo" range you should distrust |
|---|---|---|
| Connect rate (outbound, fresh base) | 42–58% | "80%+ connect rate" |
| Connect rate (aged base, > 60 days) | 18–30% | "60% on aged data" |
| Completion to terminal state | 55–72% of connected | "95% completion" |
| Intent capture % (of completed) | 70–86% | "100% intent capture" |
| RPC rate, cumulative across 3 attempts | 38–55% | "85% RPC" |
| WER, Delhi Hindi (clean audio) | 6–10% | "Under 5%" |
| WER, Patna/Bhojpuri Hindi | 14–26% | Vendor refuses to break out |
| P50 bot latency | 400–700ms | "Sub-300ms always" |
| P95 bot latency | 900–1,500ms | Not reported |
| DLT header pass rate | 98.5–99.9% | "100%" — usually means not measured |
| Consent confirmation rate (financial) | 78–92% | "Always 100%" |
| Hot transfer SLA, P95 | 8–20 seconds | Not measured |
| Cost per recovered call (NBFC collections, ₹5k–₹50k buckets) | ₹6–₹14 per rupee recovered, depending on DPD bucket and BIN | "₹1 per ₹100 recovered" |
The connect-rate gap between the Hindi belt and South India is real and persistent. North India tier-2/3 borrowers pick up later in the day (most answered calls 11:30am–1pm and 6–8pm IST), and number churn is higher; South India circles see higher first-attempt connect but lower retry lift. Your dashboard should let you slice this by circle and time-of-day in two clicks. If it cannot, you cannot optimize the call window — which is the single highest-leverage knob on outbound (covered in detail in A/B testing voice AI campaigns).
For collections specifically, the dashboard must support BIN-bucket reporting: cost-per-rupee-recovered, PTP rate, and PTP-kept rate, split by ticket size (₹1k–₹5k, ₹5k–₹25k, ₹25k–₹1L, above ₹1L) and by DPD bucket (1–30, 31–60, 61–90, 90+). A vendor dashboard that reports collections performance without those splits cannot tell you which segment is paying for the program and which is bleeding it.
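The ticket-size and DPD splits are simple bucketing followed by per-segment ratios. In this sketch the bucket edges follow the text, but treating ticket ranges as lower-inclusive and reading "90+" as 91 days and above are assumptions, as are the field names and the denominators (all accounts in the segment for PTP rate, PTPs for PTP-kept rate).

```python
from collections import defaultdict

# Bucket edges from the text; lower-inclusive ticket ranges and "90+" = 91 days
# and above are assumptions.
TICKET_BUCKETS = [(1_000, 5_000, "1k-5k"), (5_000, 25_000, "5k-25k"),
                  (25_000, 100_000, "25k-1L"), (100_000, float("inf"), ">1L")]
DPD_BUCKETS = [(1, 30, "1-30"), (31, 60, "31-60"), (61, 90, "61-90"), (91, float("inf"), "90+")]


def ticket_bucket(amount):
    for lo, hi, label in TICKET_BUCKETS:
        if lo <= amount < hi:
            return label
    return "<1k"


def dpd_bucket(dpd):
    for lo, hi, label in DPD_BUCKETS:
        if lo <= dpd <= hi:
            return label
    return "current"


def segment_report(accounts):
    """PTP rate, PTP-kept rate and cost per rupee recovered per (ticket, DPD) segment."""
    seg = defaultdict(lambda: {"n": 0, "ptp": 0, "kept": 0, "cost": 0.0, "recovered": 0.0})
    for a in accounts:
        s = seg[(ticket_bucket(a["outstanding"]), dpd_bucket(a["dpd"]))]
        s["n"] += 1
        s["ptp"] += int(a["ptp"])
        s["kept"] += int(a["ptp_kept"])
        s["cost"] += a["cost"]
        s["recovered"] += a["recovered"]
    return {
        key: {
            "ptp_rate": s["ptp"] / s["n"],
            "ptp_kept_rate": s["kept"] / s["ptp"] if s["ptp"] else None,
            "cost_per_rupee_recovered": s["cost"] / s["recovered"] if s["recovered"] else None,
        }
        for key, s in seg.items()
    }


accounts = [
    {"outstanding": 12_000, "dpd": 45, "cost": 20.0, "ptp": True, "ptp_kept": True, "recovered": 4_000},
    {"outstanding": 18_000, "dpd": 50, "cost": 20.0, "ptp": True, "ptp_kept": False, "recovered": 0},
    {"outstanding": 3_000, "dpd": 95, "cost": 15.0, "ptp": False, "ptp_kept": False, "recovered": 0},
]
for segment, metrics in segment_report(accounts).items():
    print(segment, metrics)
```

A segment that spends money and recovers nothing reports `None` rather than zero or infinity, so it stands out instead of quietly averaging away.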
Vendor / build / buy framing
Almost no Indian operator should build a voice AI analytics dashboard from scratch in 2026. The tooling is mature, the integrations are non-trivial, and the regulatory deltas are not where you want your engineering team to spend cycles. The real choice is between (a) accepting a vendor's default dashboard, (b) demanding raw event streams and building the operator console on your existing BI stack (Metabase, Looker, Power BI, Superset), or (c) hybrid — vendor surfaces the live ops tiles, you build the reconciliation and finance views internally.
The right answer for a 200–2,000 seat shop is almost always (c). The vendor's live console runs the daily ops; your BI team owns the cost and compliance views, joined to your CRM and the telephony CDR. For that to work, the contract must include raw event export, not just a UI.
What vendors show vs what you should demand
| What vendors show by default | What you should demand in writing |
|---|---|
| Total calls, connected calls, AI handled | Per-attempt funnel: dialed → connected → RPC → completed → intent captured → CRM written |
| Blended completion % | Per-language, per-circle, per-BIN-bucket completion |
| "Average duration" | AHT defined from first ring to disposition, plus silence and post-call time as separate |
| Sentiment as a single number | Sentiment per utterance, with timestamps and exportable CSV |
| "AI handled X%" | Hot transfer rate, transfer reason, and transfer SLA P50/P95 |
| Aggregate WER (often hidden) | WER per language cohort and per circle, with sample size |
| "Latency" or no latency | P50, P95, P99 end-to-end response latency per locale |
| Vague compliance assurance | DLT header pass rate, consent confirmation rate, per-call evidence pack export |
| Monthly cost line | Daily cost tile, telephony vs AI split, prior-day variance |
| No CDR reconciliation | Daily join to telephony partner's CDR with variance flags |
| Dashboard only in vendor UI | Raw event stream (webhook, S3 drop, or BigQuery share) included in the contract |
If a vendor cannot agree to the right-hand column in writing, the rest of their pitch is decoration. This is the section of your RFP that decides whether you buy a console or a slide. The full vendor-selection rubric is laid out in voice AI vendor RFP scoring rubric for India 2026, which pairs with this post — analytics is one of seven scoring categories there.
Compliance and regulatory considerations
Three regulators shape what a voice AI dashboard in India must surface.
TRAI is the one most buyers know. DLT registration governs every promotional and transactional template you send via SMS or voice. For voice AI specifically, you need the registered header to play, the entity to be whitelisted, and DND scrubbing to happen at dial time. The dashboard surface is: DLT header pass rate per template, scrubbing log per dialed number with timestamp and decision, and a per-template approval status that can be exported during a TRAI audit. The DND ecosystem moved to a blockchain-anchored consent registry in late 2024; your vendor needs to talk to it and log the result. (See TRAI's Commercial Communications Customer Preference Regulations, 2018, and subsequent amendments, on trai.gov.in.)
RBI matters for any voice AI in NBFC or banking collections. The Fair Practices Code requires that recovery calls happen between 8am and 7pm, that the borrower's dignity is preserved, and that all interactions are documented. The dashboard implication: call-time-window compliance per call, escalation reason logging when the bot couldn't handle abuse or distress, and recording retention. The Digital Lending Guidelines (2022, updated 2024) further require that the borrower can request a transcript of any automated call. If your dashboard cannot produce that transcript in under a minute, you are not ready for the regulator.
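Call-time-window compliance can be checked per call from the dial timestamp. This sketch checks only the call start against 08:00 to 19:00 IST, inclusive at both ends; whether the end of a call must also fall inside the window is a question for your compliance team, not something this code settles.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))
WINDOW_START, WINDOW_END = time(8, 0), time(19, 0)


def within_recovery_window(call_start):
    """True if the call start, in IST, falls between 08:00 and 19:00 inclusive."""
    if call_start.tzinfo is None:
        raise ValueError("call_start must be timezone-aware")
    local = call_start.astimezone(IST).time()
    return WINDOW_START <= local <= WINDOW_END


calls = {
    "c1": datetime(2026, 3, 5, 2, 30, tzinfo=timezone.utc),   # 08:00 IST
    "c2": datetime(2026, 3, 5, 1, 45, tzinfo=timezone.utc),   # 07:15 IST
    "c3": datetime(2026, 3, 5, 13, 40, tzinfo=timezone.utc),  # 19:10 IST
}
violations = [cid for cid, ts in calls.items() if not within_recovery_window(ts)]
print(violations)  # ['c2', 'c3']
```

Rejecting naive timestamps is deliberate: a server clock in UTC read as local time shifts every call by five and a half hours.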
IRDAI affects insurance sales and renewal voice AI. The recording mandate for sales calls is non-negotiable: every sales conversation must be recorded, retained, and produced on demand. For renewal nudges, the consent disclosure at call start must be captured as a specific event. The dashboard needs disclosure-rate-per-call as a first-class metric, not a derived report.
DPDP 2023 sits over all three. Purpose-bound consent must be logged per dialed campaign, retention windows must be enforced automatically, and a data principal rights request (e.g. erasure) must be operable from the dashboard. A vendor whose "compliance" view is a static PDF generated monthly cannot meet DPDP timelines.
The implementation playbook
Getting from "the dashboard is coming next sprint" to a console your MIS team trusts takes 8–10 weeks. The shape is the same whether you are an NBFC, an insurer, or a BPO.
- Week 1 — Inventory. List every metric you actually need to run the program. Sit with collections / sales / ops leads and have them rank: must-have-daily, must-have-weekly, nice-to-have. Most lists land at 25–35 metrics. Cull to the 12 that genuinely change a decision.
- Week 2 — Define each metric precisely. Write the SQL-equivalent definition: numerator, denominator, time window, segmentation. "Connect rate" is not a definition; "answered SIP 200 within 25 seconds divided by distinct first-attempt dials per day per circle" is. This document becomes the RFP appendix.
- Weeks 3–4 — Vendor interrogation. Walk each shortlisted vendor through the metric list. Ask them to show, live, where each one is in their dashboard. The ones they cannot show, ask for the raw event so you can compute it. Score the gap. This is also covered in the voice AI vendor RFP scoring rubric.
- Week 5 — Pilot wiring. Wire the vendor's webhook or event stream into your warehouse the same week the pilot starts. Do not wait for "production" to integrate analytics; the pilot is where you find the gaps. The 30-day pilot structure is in the voice AI pilot 30-day playbook.
- Weeks 6–7 — Reconciliation builds. Stand up the CDR join with your telephony partner (telephony integration) and the CRM write-back integrity report against your CRM (CRM integration). These two views, more than any vendor UI, prove the system is honest.
- Week 8 — Compliance pack. Build the per-call evidence export. Test it end-to-end with a fake regulator request: pick 5 random calls, produce the pack in under 30 seconds each. If you cannot, fix the gap before you scale.
- Weeks 9–10 — Finance handshake. Show the daily cost tile to your CFO. Get sign-off on the unit-economic definition. The dashboard is not done until finance trusts the cost-per-recovered-call number.
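The Week 2 example definition of connect rate can be written as executable code, which removes ambiguity before it reaches the RFP appendix. This is a sketch under assumptions: the dial record fields are hypothetical, and "distinct first-attempt dials" is read as distinct phone numbers on their first attempt.

```python
from collections import defaultdict


def connect_rate(dials, answer_window_s=25):
    """Connect rate per (day, circle), following the written definition:
    first-attempt numbers answered with SIP 200 within `answer_window_s`
    seconds, divided by distinct first-attempt numbers dialed.
    """
    denom, numer = defaultdict(set), defaultdict(set)
    for d in dials:
        if d["attempt"] != 1:
            continue
        key = (d["day"], d["circle"])
        denom[key].add(d["number"])
        answered = d["sip_final"] == 200 and d["answer_after_s"] is not None
        if answered and d["answer_after_s"] <= answer_window_s:
            numer[key].add(d["number"])
    return {k: len(numer[k]) / len(denom[k]) for k in sorted(denom)}


dials = [
    {"day": "2026-03-05", "circle": "UP-East", "number": "n1", "attempt": 1, "sip_final": 200, "answer_after_s": 12},
    {"day": "2026-03-05", "circle": "UP-East", "number": "n2", "attempt": 1, "sip_final": 200, "answer_after_s": 31},
    {"day": "2026-03-05", "circle": "UP-East", "number": "n3", "attempt": 1, "sip_final": 486, "answer_after_s": None},
    {"day": "2026-03-05", "circle": "UP-East", "number": "n1", "attempt": 2, "sip_final": 200, "answer_after_s": 5},
    {"day": "2026-03-05", "circle": "Tamil Nadu", "number": "n4", "attempt": 1, "sip_final": 200, "answer_after_s": 8},
]
print(connect_rate(dials))  # UP-East 1/3 (n2 answered too late), Tamil Nadu 1.0
```

A SIP 200 on its own still counts voicemail pickups, so if your telephony partner exposes an answering-machine flag, excluding those calls from the numerator is a natural next refinement.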
The mistake to avoid: treating analytics as a post-launch deliverable. Every operator we have seen run that play has rebuilt the dashboard within six months under regulator or finance pressure.
What changes in the next 12 months
Three shifts are worth pricing into procurement now.
First, real-time conversational quality scoring will move from quarterly QA sampling to live tiles. Vendors are already shipping live "agent quality" scores per call based on script adherence and sentiment trajectory; expect that to become a default expectation by end of 2026.
Second, regulators will keep tightening the audit-evidence surface. The DPDP rules' consent manager framework and the RBI's push toward digital-lending transcript-on-demand both raise the bar on per-call exportability. Buy a platform that already does this, not one promising to ship it.
Third, AI Overviews and answer-engine surfaces are starting to cite compliance and operating numbers from public buyer guides. Vendors who publish honest dashboards and metric definitions will win citation share; vendors who don't will look opaque to procurement teams who now research with LLMs before scheduling demos.
Bottom line
A voice AI analytics dashboard is the contract between the vendor's claims and the operator's reality. Most dashboards in market in 2026 are still demo-grade: pretty tiles, blended numbers, no joins to CDR, no per-language WER, no per-call evidence. An Indian VP of Ops cannot run a program from that, and finance cannot defend it. The buyer's job is to put the metric list, the definitions, the reconciliation requirements, and the raw event export into the RFP — not to hope the vendor delivers them later. Done right, the dashboard stops being a slide that arrives next sprint and becomes the console you actually run the business from. Done wrong, your MIS lead is still in Excel two quarters in.
For broader operator playbooks across regulated verticals where dashboards are doing the heavy lifting, see our work with NBFC voice AI deployments and insurance voice AI.