Voice AI Pricing in India: The 7 Contract Clauses That Decide Whether ₹3/Minute Is Actually Cheaper Than ₹9/Minute

Summary: The ₹3 vs ₹9 per-minute debate is a distraction. The real price of voice AI in India is decided by seven clauses buried in the contract: how a "connected minute" is defined, the inbound/outbound differential, retry and answer-machine handling, setup and integration fees, minimum commits and credit expiration, TTS voice and language surcharges, and CRM write-back and seat fees. This post walks through each clause, shows how a ₹3/min headline rate becomes ₹11.40/min in production, and gives you a 10-question RFP block you can paste straight into your next vendor evaluation.
Every NBFC, hospital chain, and ecommerce brand we have talked to in the last year has the same story. The procurement team runs a voice AI bake-off. Three vendors pitch. One quotes ₹3 per minute, one quotes ₹7, one quotes ₹12. The procurement lead picks the ₹3 vendor on cost, gets them through legal, deploys them — and six months later the CFO asks why the monthly invoice is twice what the ₹7 vendor would have been, for worse outcomes.
The instinct is to blame the salesperson. That is wrong. The salesperson told the truth: the rate is ₹3 per minute. What they did not tell you is that "minute" means something different in their contract than in yours, and that six other clauses in the same contract each add 10–40% to the real monthly spend in ways that do not show up until month three.
This post is not another argument that per-minute pricing is broken. That argument has been made, well, by others — and it is correct but not actionable. What buyers actually need is a contract-clause checklist: a specific list of the seven line items that decide what "₹3 per minute" actually means when the monthly invoice arrives. Use it as a procurement filter, an RFP question block, and a post-deployment audit tool. The numbers below are drawn from real deployments we have seen in Indian BFSI, healthcare, and ecommerce during 2024–2026.
The ₹3 vs ₹9 question is the wrong question
Before the checklist, a brief detour. The industry has spent two years arguing about whether per-minute, per-credit, or outcome-based pricing is the "fair" model. The honest answer is that all three models can be fair or abusive depending on who writes the contract. A ₹3/min vendor with transparent clauses can be cheaper than a ₹9/min vendor with predatory ones, and vice versa. Obsessing over the headline rate is how procurement teams lose the negotiation before it starts.
The more useful question is the one a CFO asks at month six: "what is our actual cost per recovered rupee, per booked appointment, per qualified lead?" That number is almost never the headline rate. It is the headline rate multiplied by a stack of clause-specific inflators, each one small, all of them compounding. In the worked example at the end of this post, a contract with a ₹3/min headline rate produces a ₹11.40/min effective rate, while a ₹9/min headline with clean clauses stays at ₹9.20/min. The ₹3 vendor turns out to be roughly 24% more expensive per billed minute, and the gap is wider still per outcome.
The seven clauses below are where that compounding happens. If you read only one section of this post, read the connected-minute definition — it is the single biggest source of invoice shock in Indian voice AI deployments in 2026.
The 7 contract clauses that decide what you actually pay
Clause 1: The connected-minute definition — "connected" does not mean what you think
Every voice AI contract charges for "connected minutes." Most procurement teams assume this means the minutes during which the borrower or customer is actually speaking to the bot. It almost never does.
Read the contract. You will typically find one of four definitions, each more aggressive than the last:
- Definition A (rare, transparent): billed time starts when the borrower says their first word and ends when the call is terminated. Silence is not billed beyond a defined threshold.
- Definition B (common): billed time starts when the carrier signals answer supervision — meaning the moment the phone is picked up, before anyone has said anything. This typically adds 3–7 seconds per call, which is 5–12% of a 60-second call.
- Definition C (common, aggressive): billed time starts when the call is placed, including ring time. This adds 8–15 seconds per call on Indian telecom — you are paying for ringing, even on calls that are never answered.
- Definition D (predatory, and we have seen it in real Indian contracts): billed time includes carrier-side voicemail detection, answering-machine navigation, silence up to 30 seconds, and the entire hang-up tail. A 45-second human conversation becomes 75 seconds of billed time.
The practical test: ask the vendor to show you, in writing, exactly which Unix timestamp the billing clock starts on and which one it stops on. If the answer is vague, the answer is Definition D. A ₹3/min rate under Definition D is effectively ₹4.50–₹5 under Definition A — a 50–67% inflator before any other clause kicks in.
There is a specific sub-trap here: silence billing. Some contracts charge for the entire call duration even when the bot is silent because the borrower is reading a document, talking to someone else in the room, or thinking. Borrowers in tier-2 and tier-3 India often take longer to respond than latency assumptions tuned on US traffic allow for, which means silence billing hits Indian NBFC deployments harder than US ones. Ask: "is silence billed, and if so above what threshold and at what rate?"
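The connected-minute arithmetic above can be sketched in a few lines. This is a minimal illustration, not any vendor's billing logic: the function name and parameters are hypothetical, and the only inputs taken from the text are the ₹3 headline rate and the 45-second conversation billed as 75 seconds.

```python
def effective_rate(headline_rate: float, talk_seconds: float, billed_seconds: float) -> float:
    """Rupees per minute of real conversation, given how many seconds the contract bills."""
    if talk_seconds <= 0:
        raise ValueError("talk_seconds must be positive")
    # The vendor bills billed_seconds at the headline rate; we spread that
    # charge over the seconds in which a human was actually on the line.
    return headline_rate * billed_seconds / talk_seconds


# Definition D example from the text: 45 s of conversation billed as 75 s.
rate_d = effective_rate(3.0, talk_seconds=45, billed_seconds=75)
print(f"Definition D: INR {rate_d:.2f} per conversation minute")

# Definition B, assuming 7 s of answer-supervision lead time on a 60 s call.
rate_b = effective_rate(3.0, talk_seconds=60, billed_seconds=67)
print(f"Definition B: INR {rate_b:.2f} per conversation minute")
```

Plugging in the start and stop timestamps a vendor commits to in writing gives the same comparison for your own call recordings.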
Clause 2: The inbound/outbound differential
Most voice AI platforms have very different cost structures for inbound and outbound traffic, and the contract often obscures the difference by quoting a blended rate. But every enterprise deployment we have seen is at least 70% outbound, because that is where the collections, reminders, and outreach workflows live.
The gotcha: the blended rate is calculated on a typical mix the vendor has across all customers, which skews toward 50/50. Your real mix is 80/20 outbound, and outbound is where the per-minute cost is higher — because outbound calls carry the telephony termination fee, the DLT-registered template delivery fee, and often a dialler-per-minute premium that inbound does not.
The practical test: ask for the unblended inbound and outbound rates separately and compute your own weighted average based on your real mix. A ₹3 blended that is actually ₹2 inbound and ₹4 outbound becomes ₹3.60 per real-mix minute — a 20% inflator even before the connected-minute definition hits.
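As a rough sketch of that recomputation, the weighted average looks like this. The function name is hypothetical; the ₹2 inbound and ₹4 outbound rates and the 50/50 and 80/20 mixes are the ones used in the example above.

```python
def weighted_rate(inbound_rate: float, outbound_rate: float, outbound_share: float) -> float:
    """Per-minute rate for a given traffic mix (outbound_share is a fraction, 0 to 1)."""
    if not 0.0 <= outbound_share <= 1.0:
        raise ValueError("outbound_share must be between 0 and 1")
    return outbound_share * outbound_rate + (1.0 - outbound_share) * inbound_rate


vendor_blend = weighted_rate(2.0, 4.0, outbound_share=0.5)  # the mix behind the quote
your_mix = weighted_rate(2.0, 4.0, outbound_share=0.8)      # an outbound-heavy deployment
inflator = your_mix / vendor_blend - 1.0

print(f"Quoted blend: INR {vendor_blend:.2f}, real mix: INR {your_mix:.2f}, inflator: {inflator:.0%}")
```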
Clause 3: Retries, redials, and the answer-machine penalty
In Indian collections and reminder workflows, first-call answer rates rarely exceed 35–45%. That means for every borrower you actually reach, the dialler placed 2–3 calls. The question is: who pays for the unanswered attempts?
Three common contract patterns:
- Pattern 1 (rare, transparent): unanswered attempts are free. You pay only for connected calls.
- Pattern 2 (common): the first two unanswered attempts per borrower per day are free, subsequent retries are billed at a reduced rate (typically 25–40% of the connected rate).
- Pattern 3 (aggressive): every attempt — answered or not — is billed as a full connected minute once it reaches carrier-side ring supervision. This is almost always the case under the Definition C or D connected-minute rule.
Combined with a typical 35% answer rate, Pattern 3 roughly triples the real cost per connected conversation. A buyer on a ₹3/min contract under Pattern 3 is paying for about three dials to get to one conversation, which brings the effective rate per conversation to about ₹9/min without the bot ever saying a word differently.
And there is a subtler trap: answer-machine navigation. When a call hits a voicemail, some vendors' diallers attempt to navigate the prompts to leave a message. This can add 20–45 seconds per voicemail hit, all billed. Ask: "what does the dialler do when it detects an answering machine, and is that time billed?"
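The Pattern 3 arithmetic can be checked with a short sketch. It assumes, as Pattern 3 describes, that each unanswered attempt is billed as one full minute and that the conversation itself lasts one minute; the function and parameter names are illustrative only.

```python
def pattern3_rate_per_conversation_minute(
    rate: float,
    answer_rate: float,
    billed_minutes_per_unanswered_dial: float = 1.0,
    conversation_minutes: float = 1.0,
) -> float:
    """Effective rupees per conversation minute when every dial attempt is billed."""
    if not 0.0 < answer_rate <= 1.0:
        raise ValueError("answer_rate must be in (0, 1]")
    dials_per_conversation = 1.0 / answer_rate
    unanswered_dials = dials_per_conversation - 1.0
    billed_minutes = conversation_minutes + unanswered_dials * billed_minutes_per_unanswered_dial
    return rate * billed_minutes / conversation_minutes


# One conversation per three dials gives the "triples" figure.
print(pattern3_rate_per_conversation_minute(3.0, answer_rate=1 / 3))
# A 35% answer rate lands slightly below that.
print(round(pattern3_rate_per_conversation_minute(3.0, answer_rate=0.35), 2))
```

Setting `billed_minutes_per_unanswered_dial` to 0 models Pattern 1, and a fractional value approximates the reduced retry rate in Pattern 2.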
Clause 4: Setup, onboarding, and integration fees
Headline per-minute rates never include the one-time costs, and the one-time costs are where vendors concentrate margin they could not get into the monthly run rate. In Indian enterprise deployments we have seen:
- Platform setup fees: ₹1–5 lakh, often framed as a "configuration charge" or "tenant provisioning fee."
- Voice model fine-tuning: ₹2–8 lakh, framed as "language customisation" or "vertical training."
- CRM integration: ₹3–10 lakh per CRM connector, even for standard connectors to Salesforce, Zoho, Kapture, or LeadSquared that the vendor has built once and resells.
- Telephony provisioning and DLT template registration: ₹25,000–1 lakh per template, plus ongoing template-change fees.
- Go-live support: 4–12 weeks of "launch engineering," billed at professional-services rates that are often ₹15,000–25,000 per day.
Total first-year one-time cost: typically ₹10–25 lakh for a mid-sized NBFC deployment. Amortised over 12 months of typical volume, that adds ₹0.80–₹2.10 per minute to the effective rate. A ₹3 headline becomes ₹3.80–₹5.10 purely from one-time amortisation.
The practical test: demand a total-cost-of-deployment quote, not just a per-minute quote. Ask for every one-time fee in a single line-itemised document, and ask whether any of them can be waived or amortised into the monthly rate.
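To see how one-time fees translate into a per-minute figure, here is a minimal sketch. The monthly volume of 100,000 minutes is an assumption made for illustration (the text only says "typical volume"); the ₹10–25 lakh range comes from the paragraph above.

```python
LAKH = 100_000  # 1 lakh rupees


def amortised_per_minute(one_time_cost: float, monthly_minutes: float, months: int = 12) -> float:
    """Spread a one-time cost (in rupees) evenly across every billed minute in the period."""
    if monthly_minutes <= 0 or months <= 0:
        raise ValueError("monthly_minutes and months must be positive")
    return one_time_cost / (monthly_minutes * months)


ASSUMED_MONTHLY_MINUTES = 100_000  # assumption, not a figure from any contract

low = amortised_per_minute(10 * LAKH, ASSUMED_MONTHLY_MINUTES)
high = amortised_per_minute(25 * LAKH, ASSUMED_MONTHLY_MINUTES)
print(f"Adds INR {low:.2f} to INR {high:.2f} per minute in year one")
```

Because the result is inversely proportional to volume, the same fees hurt a small pilot far more than a large rollout.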
Clause 5: Minimum commits, rollover, and credit expiration
The moment a contract includes a minimum monthly commit, the per-minute rate stops being the real rate. What you are actually paying is the higher of (a) your usage × rate and (b) the commit. In seasonal businesses like Indian ecommerce, where October–December volumes are 3–5× January–March, the commit almost always binds in the low months, meaning you are paying for minutes you never used.
Three flavours to watch:
- Hard commit, no rollover: unused minutes vanish at month end. Common in smaller vendors. A ₹3/min rate on a 100,000-minute monthly commit, used at 60% in March, is effectively ₹5/min for that month's real traffic.
- Credit expiration: you pre-pay for a pool of minutes (or "credits") that expire after 6–12 months. Any vendor quoting a "₹2/min credit bundle" almost always has this clause, and the real cost depends on how many credits you lose at the expiration date.
- Committed growth rate: the contract specifies a volume growth curve you must hit, with penalties if you fall below. This is mostly seen in very large enterprise deals, but it has started appearing in mid-market Indian contracts as vendors try to hedge against flat-line customers.
The practical test: demand quarterly true-ups rather than monthly, demand rollover of unused minutes for at least one quarter, and get the expiration clause deleted or extended to at least 18 months. These three changes alone can reduce the effective rate by 15–25% for a typical mid-market deployment.
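The hard-commit case is simple enough to sketch directly. The function below is illustrative and assumes no rollover; the 100,000-minute commit, 60% utilisation, and ₹3 rate are the figures from the example above.

```python
def effective_rate_with_commit(rate: float, commit_minutes: float, used_minutes: float) -> float:
    """Rupees per minute actually used under a hard monthly commit with no rollover."""
    if used_minutes <= 0:
        raise ValueError("used_minutes must be positive")
    # You pay for whichever is larger: real usage or the committed volume.
    billed_minutes = max(used_minutes, commit_minutes)
    return rate * billed_minutes / used_minutes


low_month = effective_rate_with_commit(3.0, commit_minutes=100_000, used_minutes=60_000)
peak_month = effective_rate_with_commit(3.0, commit_minutes=100_000, used_minutes=120_000)
print(f"Low month: INR {low_month:.2f}/min, peak month: INR {peak_month:.2f}/min")
```

Running this across twelve months of your own seasonal volumes shows how often the commit binds.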
Clause 6: TTS voice premium and language surcharges
Hindi voice output is not cheap — and good Hindi voice output is priced like it. Most voice AI platforms offer a baseline English TTS voice at the headline rate, then charge progressively more for regional languages, premium voices, and code-switching capability.
Typical surcharges we have seen in Indian contracts in 2026:
- Hindi (baseline voice): ₹0.50–₹1.50 per minute over headline.
- Hindi (premium voice with prosody): ₹1.50–₹3.00 over headline.
- Tamil, Telugu, Marathi, Bengali, Kannada, Gujarati, Punjabi, Malayalam: ₹1.00–₹2.50 over headline each.
- Code-switching (Hindi-English, Tamil-English): ₹1.00–₹2.00 additional over the regional premium.
- Custom brand voice: ₹5–15 lakh one-time plus ₹1.00–₹2.50 ongoing per minute.
The aggregation effect is brutal. An NBFC running collection calls in Hindi, Tamil, and Telugu with code-switching — a normal pan-India deployment — can find itself paying a regional-language premium on 85% of its real traffic, which makes the headline English rate almost irrelevant. A ₹3/min headline with a ₹2 regional premium is a ₹5 effective rate on 85% of traffic, or ₹4.70 weighted.
The practical test: ask the vendor what their language mix assumption is when they quote the blended rate, and recompute using your real mix. Any vendor who quotes ₹3/min without asking what languages you need is quoting you a rate they never expect to bill at.
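A language-mix recomputation might look like the sketch below. The data structure and names are hypothetical; the 85% regional share, ₹2 premium, and ₹3 headline are the figures from the example above.

```python
def language_weighted_rate(headline_rate: float, mix: dict[str, tuple[float, float]]) -> float:
    """Weighted per-minute rate; mix maps language -> (traffic share, surcharge over headline)."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * (headline_rate + surcharge) for share, surcharge in mix.values())


mix = {
    "regional": (0.85, 2.0),  # Hindi, Tamil, Telugu traffic at a Rs 2 premium
    "english": (0.15, 0.0),   # baseline voice at the headline rate
}
print(f"Weighted rate: INR {language_weighted_rate(3.0, mix):.2f}/min")
```

Splitting "regional" into one entry per language lets you apply each vendor's actual surcharge table.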
Clause 7: CRM write-back, seat fees, API quotas, and dashboard access
The final category is the "platform fees around the rate" — the things that are not per-minute but show up on the invoice anyway. Watch for:
- CRM write-back API calls: some vendors charge per API call for writing call outcomes, transcripts, and borrower-state updates back into your CRM. A 60-second call that produces 8 CRM writes at ₹0.50 per write adds ₹4 to every minute. This is the single most surprising line item on the first invoice because the sales conversation did not mention it.
- Dashboard seats: per-user monthly fees for access to the reporting and campaign dashboard. Common quotes: ₹2,000–5,000 per seat per month. An NBFC collections team of 40 users adds ₹80,000–₹2,00,000 per month to the real cost.
- API rate limits and burst pricing: some vendors cap your API throughput at a level that forces you to buy "burst capacity" during campaign launches.
- Transcript storage and retention: long-term storage of call recordings and transcripts can be billed separately at ₹0.50–₹2.00 per minute per month of retention — which, for a regulatory-required 7-year retention in BFSI, can be 80× the original call cost over the retention period.
- Sandbox and staging environments: often billed as additional tenants at a fraction of production pricing, but still meaningful.
The practical test: demand a flat platform fee that bundles dashboards, API calls, transcript storage for the full regulatory retention period, and sandbox access — or explicitly break all of these out line-by-line in the RFP so you can compare like-for-like across vendors.
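To compare these non-per-minute fees like-for-like, it helps to convert them into a per-minute equivalent. The sketch below covers only CRM writes and dashboard seats, uses the ₹0.50-per-write and ₹2,000-per-seat figures from the list above, and assumes an illustrative volume of 300,000 minutes a month; the names are hypothetical.

```python
def platform_fees_per_minute(
    monthly_minutes: float,
    crm_writes_per_minute: float,
    fee_per_write: float,
    seats: int,
    fee_per_seat: float,
) -> float:
    """Per-minute equivalent of CRM write-back charges plus dashboard seat fees."""
    if monthly_minutes <= 0:
        raise ValueError("monthly_minutes must be positive")
    crm_per_minute = crm_writes_per_minute * fee_per_write
    seats_per_minute = seats * fee_per_seat / monthly_minutes
    return crm_per_minute + seats_per_minute


extra = platform_fees_per_minute(
    monthly_minutes=300_000,   # assumed volume
    crm_writes_per_minute=8,   # 8 writes on a 60-second call
    fee_per_write=0.50,
    seats=40,
    fee_per_seat=2_000,
)
print(f"Platform fees add roughly INR {extra:.2f} per minute")
```

Per-write charges dominate here, which is why the write count per call is worth confirming before signing.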
Worked example: the ₹3 contract that becomes ₹11.40
Let us make this concrete. Vendor A quotes ₹3/min blended. Vendor B quotes ₹9/min blended. Your workload: 300,000 connected minutes per month, 80% outbound, 85% in Hindi with some Tamil and Telugu, 40% call answer rate, with a standard CRM write-back requirement and a 40-seat dashboard team.
Vendor A at a ₹3 headline, with every clause stacked against you:
| Clause | Effect | Effective rate |
|---|---|---|
| Connected-minute Definition C | +30% | ₹3.90 |
| Outbound-heavy mix vs blended | +20% | ₹4.68 |
| Retry billing (Pattern 3, 40% answer) | +100% over recorded conversations | ₹9.36 |
| Setup + integration amortisation | +₹1.20/min | ₹10.56 |
| Hindi + regional language premium | +₹0.80/min weighted | ₹11.36 |
| CRM write-back fees (per-write API charges) | ~₹0.04/min equivalent | ~₹11.40 |
Effective rate: ₹11.40/minute. Monthly spend on 300,000 connected minutes: ~₹34.2 lakh.
Vendor B at a ₹9 headline with clean clauses:
| Clause | Effect | Effective rate |
|---|---|---|
| Connected-minute Definition A (transparent) | 0% | ₹9.00 |
| Unblended outbound rate at ₹9 | 0% | ₹9.00 |
| Retries free up to 3/day | ~0% | ₹9.00 |
| Setup included in first-year ramp | 0% | ₹9.00 |
| Hindi included at headline | 0% | ₹9.00 |
| Bundled platform fee of ₹60k/month | +₹0.20/min on 300k minutes | ₹9.20 |
Effective rate: ₹9.20/minute. Monthly spend: ~₹27.6 lakh.
The ₹3 vendor is ~24% more expensive in rupees per month and, because their answer-to-connected inflator is structural, far more expensive per actually recovered outcome. This is not a hypothetical — it is within 10% of three real deployments we have seen migrated off "cheap" vendors in the last 14 months.
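The two tables can be reproduced by applying each adjustment in order. This is a small sketch of that stacking, with percentage steps multiplying the running rate and flat steps adding to it; the step values are copied from the tables, and the function name is illustrative.

```python
def stack_clauses(headline_rate: float, steps: list[tuple[str, float]]) -> float:
    """Apply clause adjustments in order: ('pct', 0.30) multiplies, ('add', 1.20) adds rupees."""
    rate = headline_rate
    for kind, value in steps:
        if kind == "pct":
            rate *= 1.0 + value
        elif kind == "add":
            rate += value
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return rate


vendor_a = stack_clauses(3.0, [
    ("pct", 0.30),  # connected-minute Definition C
    ("pct", 0.20),  # outbound-heavy mix vs blended
    ("pct", 1.00),  # retry billing, Pattern 3
    ("add", 1.20),  # setup and integration amortisation
    ("add", 0.80),  # language premium, weighted
    ("add", 0.04),  # CRM write-back equivalent
])
vendor_b = stack_clauses(9.0, [("add", 0.20)])  # bundled platform fee only

MONTHLY_MINUTES = 300_000
for name, rate in (("A", vendor_a), ("B", vendor_b)):
    lakh_per_month = rate * MONTHLY_MINUTES / 100_000
    print(f"Vendor {name}: INR {rate:.2f}/min, about {lakh_per_month:.1f} lakh per month")
print(f"Vendor A premium over B: {vendor_a / vendor_b - 1.0:.0%}")
```

Note that order matters: the percentage inflators compound on each other before the flat per-minute additions are applied, which is how the tables are laid out.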
The RFP question block: paste this into your next vendor evaluation
Send this verbatim to every voice AI vendor in your shortlist and demand written answers before the first demo:
1. Define "connected minute" with the exact Unix-timestamp trigger for start and stop. Is silence billed? Above what threshold?
2. Are inbound and outbound minutes billed at the same rate? If blended, what mix is the blend based on, and what are the unblended rates?
3. What is the billing treatment of unanswered dials, voicemail detections, and answer-machine navigation? Show the specific clause.
4. Itemise every one-time fee: platform setup, model fine-tuning, per-CRM integration, DLT registration, launch engineering days. Include total-cost-of-deployment, not just per-minute.
5. What is the minimum monthly commit, the rollover policy, and the credit expiration term? Can unused minutes roll over for at least one quarter?
6. What are the per-language surcharges for Hindi and each regional language, and what is the surcharge for code-switching? Quote against our language mix, not your blend.
7. Are CRM write-back API calls, dashboard seats, transcript storage, and sandbox access included in the headline rate, or billed separately? Quote each line item.
8. What is the regulatory retention period for call recordings and transcripts, and what is the storage cost over that period?
9. Will you accept a total-cost-of-ownership cap in the contract that includes every clause above?
10. Will you provide a worked example of a real customer's first 90 days of invoicing, with every line item, redacted to anonymise the customer?
Any vendor who refuses to answer any of questions 1, 4, 7, or 10 in writing is disqualified. Any vendor who accepts a TCO cap in question 9 should move to the top of your shortlist, because that is the only clean signal of alignment between their pricing and your real cost.
The negotiation move that beats the headline-rate framing
The single most effective thing you can do in a voice AI RFP in India in 2026 is to refuse to negotiate on per-minute rate. Instead, demand a per-successful-outcome price cap — priced in rupees per booked appointment, per promise-to-pay captured, per qualified lead, depending on your workflow. This does three things at once:
- It aligns the vendor's incentive with yours. They make more money only when you do.
- It exposes the real cost structure, because any vendor who cannot commit to an outcome price knows their headline rate is hiding inflators.
- It makes comparison across vendors trivial. You are no longer comparing ₹3 vs ₹9 — you are comparing ₹150 vs ₹180 per booked appointment, which is a number your CFO can actually use.
The objection you will hear: "we cannot commit to outcome pricing because outcomes depend on your data quality, your list, your brand." That objection is fair. The counter-move is a shared-risk contract — a floor per-minute rate plus an outcome bonus, with a cap on total spend. Most mid-market vendors in India will agree to this if pushed, and the ones who will not are the ones whose economics do not work at the outcomes they are pitching.
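One way to picture the shared-risk structure is as a capped sum of a floor charge and an outcome bonus. The sketch below is purely illustrative: every number in it (floor rate, bonus, cap, volumes) is a made-up placeholder, not a market figure or a term from any real contract.

```python
def shared_risk_invoice(
    minutes: float,
    floor_rate: float,
    outcomes: int,
    bonus_per_outcome: float,
    spend_cap: float,
) -> float:
    """Monthly invoice: floor per-minute charge plus outcome bonus, never above the cap."""
    uncapped = minutes * floor_rate + outcomes * bonus_per_outcome
    return min(uncapped, spend_cap)


# Hypothetical terms: Rs 4/min floor, Rs 150 per booked appointment, Rs 25 lakh cap.
normal_month = shared_risk_invoice(300_000, 4.0, outcomes=5_000, bonus_per_outcome=150, spend_cap=2_500_000)
strong_month = shared_risk_invoice(300_000, 4.0, outcomes=10_000, bonus_per_outcome=150, spend_cap=2_500_000)
print(normal_month, strong_month)
```

Dividing the invoice by the number of outcomes gives the per-outcome figure to compare across vendors.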
Where Caller Digital fits
We built Caller Digital's commercial model to survive this checklist. That means Definition A connected-minute billing with silence thresholds explicitly capped, inbound and outbound rates listed separately in every contract, retry billing at zero cost for unanswered dials up to a defined threshold, Hindi and regional language voices included at the headline rate for most verticals, CRM write-back bundled into a flat platform fee, transcript retention priced per real-world regulatory requirement rather than per-minute-per-month, and a willingness to commit to outcome-linked pricing for deployments where we can validate list quality.
We are not the cheapest headline rate in the Indian voice AI market, and we do not try to be. We are the vendor whose total-cost-of-ownership over 12 months is the lowest for buyers who actually run the clause-by-clause arithmetic. If the arithmetic favours someone else for your specific workload, we will tell you — because an NBFC that deploys us on the wrong use case at the wrong volume is an NBFC that will churn in month eight, which is worse for everyone.
If you are evaluating voice AI and want to pressure-test your shortlist against this checklist, the fastest path is to book a free custom demo. We will walk through each of the seven clauses live, share our standard contract template with every clause highlighted, and give you a TCO calculator you can use against every other vendor in your RFP.
For deeper reading, see our RBI and DPDP compliance checklist, our DPD-bucket playbook for NBFC collections, and our Voice AI vs IVR for Indian Banks: A ₹47 Lakh/Year Decision. For a quick numerical sanity-check on your own workload, plug real numbers into the EMI Collections ROI Calculator.
The bottom line
The ₹3 vs ₹9 argument is a distraction. Every voice AI contract in India has seven clauses that decide what the headline rate actually means in production, and in most cases the cheaper-looking vendor is more expensive by month three. Do not negotiate on per-minute rate alone. Demand Definition A connected-minute billing, unblended inbound/outbound rates, free retries, itemised one-time fees, quarterly true-ups, language-inclusive headlines, bundled platform fees, and — if you can get it — an outcome-linked price cap. The vendors who accept are the ones whose economics actually work. The vendors who refuse are showing you, for free, why their headline rate was too good to be true.

