India’s AI systems must steer clear of a Silicon Valley mindset, said Apurv Agrawal, CEO of SquadStack, in an exclusive interview with Outlook Business. Agrawal emphasised that AI in India needs to be context-aware, multilingual, cost-efficient, and empathetic to the country’s complex human dynamics.
“India is uniquely positioned to lead the world in agentic AI not just as a consumer market, but as a builder ecosystem. Our linguistic diversity, customer scale, and affordability constraints force us to solve real-world AI problems at extreme edges,” he stated.
How does SquadStack define agentic AI, and what are the core differences, both in capabilities and outcomes, between a traditional voice bot or IVR and your “Humanoid AI Agent”?
Agentic AI, as we define it at SquadStack, is a system that doesn’t just follow instructions but takes ownership of outcomes. It can independently perceive, decide, act, and improve across a workflow rather than just executing scripted commands.
Most traditional voice bots and IVRs follow static logic trees. They’re linear, brittle, and largely reactive. If the conversation veers off-script, the system breaks down or escalates.
In contrast, our Humanoid AI Agent is designed to function more like a skilled teammate. It can hold multi-turn conversations, adapt to live inputs, pick up on user intent even when phrased imperfectly, and take actions based on context. It’s trained on real-world data, including behavioral nuances and intent patterns, which makes it capable of managing ambiguity, not just answering FAQs.
From a capabilities standpoint, the shift is from a rule-following assistant to an outcome-driving agent, which is what defines agentic AI for us.
For example, a voice bot might remind a customer of a due payment. Our agent, by contrast, can explain the implications, offer resolution options, and handle rebuttals in natural language; it also knows when to pause, escalate, or re-engage later based on the user’s intent, tone, or even hesitation. This agent acts with autonomy within guardrails, continuously learning from feedback and aiming for resolution, not just interaction.
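To make that distinction concrete, here is a minimal sketch of the perceive-decide-act-improve loop; the intent rules and actions are toy placeholders, not our production logic:

```python
# Illustrative only: a minimal agentic loop, in contrast to a static IVR
# script. The intent rules and actions are toy placeholders.

def infer_intent(utterance: str) -> str:
    """Placeholder NLU; a real system uses fine-tuned models."""
    if "can't pay" in utterance:
        return "hardship"
    if "supervisor" in utterance:
        return "escalate"
    return "acknowledge"

def choose_action(intent: str, context: dict) -> str:
    """Decide the next-best action from intent plus running context."""
    if intent == "escalate" or len(context["history"]) > 20:
        return "handoff"
    return {"hardship": "offer_payment_plan"}.get(intent, "continue")

def run_call(utterances: list[str]) -> dict:
    context = {"history": [], "outcome": None}
    for utterance in utterances:                 # perceive
        intent = infer_intent(utterance)
        context["history"].append((utterance, intent))
        action = choose_action(intent, context)  # decide
        if action == "handoff":                  # knows its own limits
            context["outcome"] = "escalated_to_human"
            break
        # act: speak, offer options, update the CRM (omitted here)
    context["outcome"] = context["outcome"] or "resolved"
    return context                               # improve: logged for retraining

print(run_call(["hello", "I can't pay this month"]))
```

A scripted bot hard-codes the reply to each prompt; here, the same utterance can lead to different actions depending on the accumulated context, which is the essence of the outcome-driven loop.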
SquadStack’s AI-Verse mentions training on over 10 million minutes of human interactions. Could you walk us through the end-to-end technology stack (data ingestion, model training, inference, and deployment) behind your Humanoid AI Agent? Which open-source or proprietary components, architectures (transformer-based LLMs, speech-to-text/text-to-speech pipelines), and cloud/edge infrastructure do you leverage?
What we’ve built isn’t a repackaged voice bot; it’s a layered system where every component, from ingestion to inference, is designed to handle the messiness of real-world conversations at scale. Our technology stack is built for production-grade scale, not just demos.
We start with data ingestion, where over 10 million minutes of real human conversations are collected, transcribed, and annotated. This includes intent, sentiment, objection handling, and even acoustic cues like hesitation or urgency. Our labeling pipeline uses both human-in-the-loop workflows and semi-supervised learning to refine annotations continuously.
For speech processing, we use a modular pipeline (a simplified code sketch follows the list):
● Speech-to-text (STT): We’ve evaluated multiple STT engines for Indian accents and dialects, ultimately optimizing for latency and accuracy using customized models fine-tuned on Indian phonetic variations.
● Natural language understanding (NLU): This is powered by transformer-based models, some open-source LLMs and others proprietary, all fine-tuned on our domain-specific datasets. Our internal orchestration layer routes inputs to the most suitable model variant based on use case complexity, language, and call stage.
● Decision engine: Here, we combine retrieval-augmented generation (RAG) techniques with business logic and real-time context to drive next-best actions. This allows our agents to handle dynamic scenarios like payment reminders, eligibility checks, or escalation triaging.
● Text-to-speech (TTS): We use ultra-low-latency TTS systems capable of injecting emotion, emphasis, and pauses to make speech sound fluid and natural. We’ve partnered with select providers but have built proprietary layering on top to manage tone calibration and voice persona consistency.
● Inference and orchestration: All components are containerized and run on an auto-scaling Kubernetes infrastructure on the cloud. For latency-sensitive use cases, we’re increasingly moving inference to edge nodes closer to telecom gateways to reduce round-trip times.
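Putting the pieces together, a simplified wiring of such a pipeline could look like the sketch below; the component interfaces are assumptions for illustration, not our actual internals:

```python
# A simplified sketch of a modular voice pipeline
# (STT -> NLU routing -> decision engine -> TTS).
# The interfaces are hypothetical stand-ins, not SquadStack's real ones.

class VoicePipeline:
    def __init__(self, stt, nlu_models, decision_engine, tts):
        self.stt = stt
        self.nlu_models = nlu_models        # model variants keyed by complexity
        self.decide = decision_engine       # RAG context + business rules
        self.tts = tts

    def route(self, call_state):
        """Orchestration: pick the cheapest model variant that can cope."""
        key = "complex" if call_state.get("ambiguous") else "light"
        return self.nlu_models[key]

    def handle_turn(self, audio_chunk, call_state):
        text = self.stt.transcribe(audio_chunk)          # accent-tuned STT
        intent = self.route(call_state).parse(text)      # transformer-based NLU
        action, reply = self.decide(intent, call_state)  # next-best action
        return action, self.tts.speak(reply)             # low-latency, emotive TTS
```

The point of the structure is that each stage is swappable: a better STT engine or a new LLM variant slots in behind the same interface without touching the rest of the call flow.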
Deployment-wise, we treat each conversation as a flow of intents rather than a script. That requires real-time observability, fail-safes, and continuous retraining pipelines. Our CI/CD setup ensures new learnings from calls flow back into the model training cycle with minimal lag.
What makes this stack effective isn’t just the technology, but the way it’s been tuned for India's reality of low-bandwidth environments, high linguistic diversity, and complex regulatory constraints. We didn’t build a general-purpose assistant. We built a telecaller that can scale with accountability.
How do you design feedback loops to ensure that the Humanoid Agent learns from both successful and failed interactions? What human-in-the-loop checkpoints or gating mechanisms exist to prevent drift or the propagation of sub-optimal behaviors?
Our feedback loops are core to what makes the system "agentic." Each call is analyzed in three dimensions: intent resolution, customer emotion, and compliance outcome.
We’ve built real-time call audits using proprietary evaluators as well as LLM-assisted scoring. Successful resolutions (e.g., payment made, lead converted) feed positive reinforcement into the intent-response mapping. Failed or incomplete outcomes trigger a supervised review pipeline.
Here’s where the human-in-the-loop (HITL) comes in. Our quality experts, many of whom are ex-BPO trainers, annotate high-risk or novel conversations. These are fed back to retrain intent classifiers and refine response selection models.
We also employ gating layers: before any new response path or change in logic is deployed, it’s tested across sandbox simulations and A/B-tested live with synthetic customers to avoid model drift or hallucinations creeping into production.
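To illustrate the gating concept, here’s a minimal sketch, with an assumed pass-rate threshold and a placeholder evaluator standing in for our real simulation suite:

```python
# A sketch of the gating idea: a new response path must clear sandbox
# simulations before it is eligible for a live A/B test. The threshold
# and the evaluator are illustrative assumptions.

def simulate(candidate_reply: str, case: dict) -> bool:
    """Placeholder evaluator: did the reply resolve this synthetic case?"""
    return case["expected_phrase"] in candidate_reply

def gate_new_response_path(candidate_reply: str, cases: list[dict],
                           min_pass_rate: float = 0.95) -> bool:
    passed = sum(simulate(candidate_reply, c) for c in cases)
    if passed / len(cases) < min_pass_rate:
        return False   # held back for human review, never deployed
    return True        # next stop: a live A/B test with synthetic customers

sandbox = [{"expected_phrase": "payment plan"} for _ in range(20)]
print(gate_new_response_path("we can set up a payment plan today", sandbox))
```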
Our agents improve every week, not just by learning new tasks but by unlearning what didn’t work.
In India, regulations around personal data and voice recordings (e.g., the PDPB in progress, sectoral guidelines by RBI, IRDAI, and telecommunications laws) are still evolving. How does your start-up ensure compliance with current data-privacy norms, especially when handling sensitive customer information across sectors like BFSI and healthcare?
We follow a privacy-by-design approach.
First, we segregate PII (personally identifiable information) from conversational metadata at the ingestion layer. Encryption at rest and in transit is non-negotiable. We’re fully aligned with India's evolving data privacy frameworks, including anticipated PDPB provisions.
For BFSI clients, we adhere to RBI’s cloud outsourcing and data localization guidelines. Similarly, for insurance or health clients, we ensure data residency, anonymization, and role-based access across our tools.
Critically, no model is trained on raw customer data. Our training datasets are de-identified and aggregated. All human audits are done in secure VDI environments, and call recordings are only retained as per the client’s compliance window.
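As a simplified illustration of that de-identification step, regex patterns like these can scrub obvious PII before transcripts reach a training corpus; our production pipeline pairs this with far more robust NER-based detection, and the patterns here are only stand-ins:

```python
# Illustrative only: a regex-based de-identification pass over transcripts.
# Real pipelines combine this with NER-based PII detection and audits.
import re

PII_PATTERNS = {
    "PHONE": re.compile(r"(?:\+91[-\s]?)?\b[6-9]\d{9}\b"),  # Indian mobile numbers
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PAN":   re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),         # Indian PAN format
}

def deidentify(transcript: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        transcript = pattern.sub(f"<{label}>", transcript)
    return transcript

print(deidentify("Call me at 9876543210 or mail a@b.com, PAN is ABCDE1234F"))
```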
We’ve invested early in these safeguards because trust isn’t a feature, it’s the foundation for us.
India’s linguistic diversity presents both opportunity and complexity. Your Humanoid AI Agent supports “10+ Indian languages.” How do you handle dialectal variations, code-mixing (Hindi-English or Tamil-English), and cultural nuances (honorifics, regional idioms) to ensure each conversation feels “human-like” and contextually appropriate?
This is one of our proudest achievements. India isn’t just multilingual, it’s multi-dialectal, emotionally nuanced, and code-mixed by default.
We use three strategies:
● Dialect adaptation models: Our STT engines are fine-tuned on regional audio datasets, from Bhojpuri-accented Hindi to Kongu Tamil, to ensure accurate recognition.
● Code-mix handling: We treat Hinglish, Tanglish, etc., as native inputs. Our models are trained on millions of naturally occurring code-mixed sentences, making transitions between languages smooth.
● Cultural memory: We embed region-specific etiquette, honorifics (e.g., “ji”, “amma”), and even local metaphors into response generation. This is governed by the customer profile and geography.
In short, the agent doesn’t just speak the language, it speaks the context.
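To make those strategies tangible, here is a toy sketch of token-level language tagging and region-driven honorifics; the lexicons and mappings are made up for illustration:

```python
# A toy illustration, not our implementation: token-level language tags
# let models treat code-mixed input as native, and a region profile
# drives honorifics. Lexicons and mappings here are made up.
HONORIFICS = {"north_india": "ji", "tamil_nadu": "amma"}   # assumed mapping
HINDI_HINTS = {"nahi", "haan", "kal", "paisa", "jayega"}   # tiny toy lexicon

def tag_tokens(utterance: str) -> list[tuple[str, str]]:
    """Token-level language ID keeps Hinglish intact for downstream models."""
    return [(tok, "hi" if tok.lower() in HINDI_HINTS else "en")
            for tok in utterance.split()]

def greet(name: str, region: str) -> str:
    honorific = HONORIFICS.get(region, "")
    return f"Namaste {name} {honorific}".strip()

print(tag_tokens("payment kal ho jayega"))  # mixed hi/en tags, no error
print(greet("Sharma", "north_india"))       # "Namaste Sharma ji"
```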
Demonstrating “unmatched scalability” is one thing; actually scaling from handling hundreds to hundreds of thousands of calls is another. What are the biggest architectural or operational bottlenecks you’ve faced in scaling the Humanoid Agent, be it latency constraints, model-serving costs, or network reliability, and how have you overcome them?
The biggest challenge has been balancing response latency, inference cost, and voice quality at scale.
Running LLMs in real time for 100k+ concurrent calls is a cost and latency risk. We solved this through hierarchical orchestration, using fast, lightweight classifiers for known intents and reserving LLMs for ambiguous or novel queries.
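In rough pseudocode, that hierarchical routing works like this; the models and thresholds below are illustrative stand-ins, not our production values:

```python
# A minimal sketch of hierarchical orchestration: a cheap, fast classifier
# handles known intents; only low-confidence or novel inputs fall through
# to an expensive LLM. Names and thresholds are assumptions.

def fast_classifier(text: str) -> tuple[str, float]:
    """Placeholder lightweight intent model returning (intent, confidence)."""
    known = {"pay": ("payment_intent", 0.97), "balance": ("balance_query", 0.95)}
    for keyword, result in known.items():
        if keyword in text.lower():
            return result
    return ("unknown", 0.20)

def call_llm(text: str) -> str:
    """Placeholder for the expensive LLM path (ambiguous/novel queries)."""
    return "llm_resolved_intent"

def route(text: str, threshold: float = 0.85) -> str:
    intent, confidence = fast_classifier(text)
    if confidence >= threshold:
        return intent          # fast path: milliseconds, near-zero cost
    return call_llm(text)      # slow path: reserved for the hard cases

print(route("I want to pay my bill"))   # fast path
print(route("umm, about that thing"))   # falls through to the LLM
```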
Network jitter and call drops, especially in Tier-3 regions, also posed a challenge. We’ve since implemented regional failovers and elastic routing with telco partners to ensure uptime.
On ops, training agents in 10+ languages while maintaining consistency was non-trivial. We built an internal simulation platform where agents are stress-tested across language, tone, and emotion edge cases before deployment.
SquadStack promotes a “humans + tech” approach. How do you orchestrate seamless handoffs between AI agents and human specialists, especially when a call needs escalation? What training protocols or UI/UX systems do your human agents follow to collaborate effectively with the AI in real time?
We’ve designed this as a tag-team relay, not a bailout.
Our AI agents monitor over 15 escalation triggers in real time, such as escalation intent, customer frustration, and unexpected rebuttals. If flagged, the call is routed live to a human agent with the full conversation history, call context, and sentiment markers.
Our human specialists use a co-pilot interface that shows suggested next actions, call objectives, and past interactions. This allows them to continue the conversation naturally, not restart it.
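Conceptually, the handoff looks like the sketch below, with an assumed trigger set and packet schema rather than our actual ones:

```python
# A hedged sketch of the handoff packet idea: when a trigger fires, the
# human agent receives the full context rather than a cold transfer.
# Trigger names and the packet schema are illustrative assumptions.
from dataclasses import dataclass, field

TRIGGERS = {"explicit_escalation", "frustration", "repeated_rebuttal"}

@dataclass
class HandoffPacket:
    transcript: list[str]
    sentiment: str
    objective: str
    suggested_next_actions: list[str] = field(default_factory=list)

def maybe_escalate(signal: str, transcript: list[str]) -> HandoffPacket | None:
    if signal not in TRIGGERS:
        return None   # the AI agent keeps handling the call
    return HandoffPacket(
        transcript=transcript,
        sentiment="negative",
        objective="resolve overdue payment",
        suggested_next_actions=["acknowledge frustration", "offer callback"],
    )

packet = maybe_escalate("frustration", ["AI: Your EMI is due.", "User: Stop calling!"])
print(packet)
```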
One of SquadStack’s mission pillars has been to democratize access to quality jobs: enabling stay-at-home mothers and talent in Tier-2/3 cities to earn by plugging in remotely. How has AI been instrumental in creating a frictionless hiring, assessment, and training pipeline for these workers, and what challenges do you see in upskilling them for more complex AI-augmented roles?
AI has helped us reinvent workforce enablement. We’ve built a fully remote, AI-augmented work platform where anyone, from stay-at-home mothers in Indore to graduates in Siliguri, can become a certified telecalling specialist.
Our AI handles assessment, training simulations, and feedback. Every caller gets call-by-call coaching powered by our audit engine.
This has allowed us to scale to thousands of distributed remote agents with near-zero physical infrastructure.
The next frontier is upskilling them to work alongside AI: reviewing calls, training new flows, even annotating data for future models. That is not without challenges. Many agents are new to digital interfaces and advanced technologies, so building digital literacy is a foundational step.
By providing personalized learning journeys that adjust to each agent’s pace and learning style, we accelerate onboarding and make skill-building more accessible. Agents receive real-time feedback powered by AI analytics, helping them improve continuously.
As agentic AI becomes more autonomous, handling sensitive queries or making real-time decisions, ethical concerns around transparency, explainability, and potential biases arise. How does SquadStack address issues like model interpretability and ensure your solutions adhere to fair, transparent, and ethical AI principles?
When we talk about agentic AI handling sensitive customer interactions and making decisions autonomously, the stakes for transparency, fairness, and ethics are very high. At SquadStack, we recognize that these are not just compliance checkboxes but foundational principles that shape trust in AI.
From a technical standpoint, our approach starts with building explainability directly into the AI system. While many of our models rely on complex transformer-based architectures for natural language understanding and generation, we layer on interpretability methods like attention heatmaps and feature attribution techniques (such as SHAP) to provide insight into why the AI makes a particular recommendation or response. This is critical not just internally but for client-facing reporting, where business users need to understand the AI’s behavior to trust it fully.
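As a generic example of the SHAP technique, applied here to a toy escalation classifier rather than our production models, feature attribution looks like this:

```python
# A generic SHAP feature-attribution example (not SquadStack's production
# setup): explaining why a toy classifier flagged a call for escalation.
# The engineered feature names are illustrative assumptions.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical per-call features: [silence_ratio, sentiment_score, rebuttals]
X = rng.random((300, 3))
y = (X[:, 1] < 0.3).astype(int)   # toy label: low sentiment -> escalate

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])  # per-feature contributions
print(shap_values)                          # which feature drove this call's score
```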
Bias mitigation is another cornerstone. India’s linguistic and cultural diversity means training data must be carefully curated and continuously audited to prevent systemic bias, whether it’s related to dialects, gender, or socioeconomic factors. We maintain rigorous data governance practices, continually analyzing performance metrics segmented by demographics and linguistic groups to identify and correct disparities. Our AI models are retrained with updated data sets to reduce bias over time rather than relying on static models.
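In principle, that segmented audit is as simple as the sketch below; the data, columns, and disparity threshold are made up for illustration:

```python
# A minimal sketch of the segmented-audit idea: tracking a quality metric
# per language group to surface disparities. Data and threshold are made up.
import pandas as pd

calls = pd.DataFrame({
    "language": ["hindi", "hindi", "tamil", "tamil", "bengali", "bengali"],
    "resolved": [1, 1, 1, 0, 0, 0],
})

by_group = calls.groupby("language")["resolved"].mean()
print(by_group)                      # per-language resolution rate
gap = by_group.max() - by_group.min()
if gap > 0.10:                       # assumed disparity threshold
    print(f"Disparity of {gap:.0%} flagged for data curation and retraining")
```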
Importantly, we never let the AI act unchecked. Our platform includes human-in-the-loop mechanisms: any interaction flagged as ambiguous, sensitive, or outside the AI’s confidence threshold is escalated immediately to human agents. This prevents drift into suboptimal or unethical behavior, ensuring that human judgment guides the most critical decisions. Feedback from these human interventions is looped back into model training for continuous improvement.
On the transparency front, we maintain open communication with our clients about the capabilities and limitations of our AI. This includes clear disclosures on when an interaction is AI-driven, what data is collected and stored, and how customer privacy is protected. We align closely with India’s evolving data privacy laws and emerging AI ethics frameworks, embedding those principles into our development and deployment lifecycle.
Finally, ethical AI is a journey, not a one-time implementation. We have dedicated teams monitoring model behavior in production, running periodic audits, and updating policies to address new challenges as they arise. This ongoing vigilance is essential to maintain trust with both our clients and their customers.
Looking ahead 12–18 months, what are SquadStack’s top priorities for advancing agentic AI? Are you planning deeper vertical integrations, expansion into global markets outside India, or research into next-gen capabilities? How do you envision SquadStack’s role in shaping India’s broader agentic AI narrative?
We’re entering a defining phase in the lifecycle of agentic AI, not just for SquadStack, but for the Indian AI ecosystem as a whole. Over the next 12–18 months, our focus will center on three core priorities: vertical depth, system autonomy, and infrastructure scale.
1. Going deeper, not just broader:
Rather than spreading our technology thin across dozens of sectors, we’d like to double down on high-stakes verticals like financial services, healthcare, and education. These are domains where conversations directly influence decisions (applying for a loan, navigating insurance, choosing a university) and where the stakes are high, both emotionally and financially. This means investing heavily in domain-specific reasoning layers, regulatory context awareness, and higher-fidelity intent recognition tuned to each use case. You can’t fake “agentic” in these categories; the AI must truly understand, act, and escalate responsibly.
2. From reactive bots to proactive agents:
We’re actively pushing our architecture toward greater proactivity and contextual memory. Most voice AI today reacts to customer queries. Our next-gen agents will take the initiative instead: reminding borrowers of due payments, nudging users based on behavior, or re-engaging cold leads with timing sensitivity. To do that, we’re evolving our systems to retain long-term conversational memory (while staying privacy-compliant), learn preferences, and adapt decision logic in real time. We’re also experimenting with reinforcement learning-based strategies for autonomous goal completion across multi-turn conversations.
3. Scaling the infrastructure spine:
As we scale from millions to hundreds of millions of daily conversations, we’re investing significantly in latency-optimized inference pipelines, modular microservices for speech tasks, and auto-scaling orchestration that adjusts based on campaign demand. We’re re-architecting core components of our speech stack (TTS, ASR, NLP) to work in parallel across languages and optimize for regional accents and noise conditions typical in Tier 2/3 geographies. This also lays the foundation for global expansion, where voice diversity is equally nuanced.
On shaping India’s agentic AI narrative:
India is uniquely positioned to lead the world in agentic AI not just as a consumer market, but as a builder ecosystem. Our linguistic diversity, customer scale, and affordability constraints force us to solve real-world AI problems at extreme edges. At SquadStack, we want to play a central role in that movement. That means open-sourcing select components of our infrastructure, contributing to India-specific benchmarks for conversational AI, and collaborating with academic and policy institutions to ensure innovation doesn’t outpace guardrails.
AI in India can’t be built with a Silicon Valley mindset. It needs to be context-aware, cost-effective, multilingual, and empathetic to real human needs. That’s the kind of AI we’re building, and the narrative we’re proud to lead.