
Why India Must Build Its Own Language Models

Contrary to what seems to be a widespread perception, building impactful LLMs today is not about throwing endless resources at the problem. It’s about innovation, an area in which, we believe, India is uniquely poised to lead.


That ‘India should not invest in building LLMs [large language models]’ is a statement making the rounds. Most recently, it was championed by Nandan Nilekani. While we deeply admire Nandan’s unparalleled contributions to shaping India’s software and digital infrastructure, we fundamentally disagree. This argument against India developing its own LLMs is rooted in misconceptions about both what building LLMs entails and what it can achieve.


To understand LLMs like ChatGPT and their ability to “converse as intelligently as a human”, it is important to begin with what an LLM actually is and how it works. At its core, an LLM performs one primary task: given a partial sentence, it predicts the next word. 

For example, when you type, “The sun is a…,” an LLM may predict “huge” or “star” because it has learned patterns from billions of sentences. It then takes the word it just generated, feeds it back into itself as part of the next prompt, and continues predicting. Internally, the sentence evolves into, “The sun is a huge…,” where the model might then predict “ball” and continue this process until it generates a coherent phrase like, “The sun is a huge ball of fire.” 

This step-by-step process is called auto-regression, and while the algorithm seems deceptively simple, the real challenge lies in achieving high-quality predictions. 
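To make the core loop concrete, here is a minimal sketch of greedy auto-regressive decoding using the open-source Hugging Face transformers library; the choice of GPT-2 and the generation length are illustrative assumptions, not part of the original argument.

```python
# A minimal sketch of auto-regressive (greedy) decoding with a small
# pre-trained model. Model choice and loop length are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The sun is a", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(8):                           # generate up to 8 more tokens
        logits = model(input_ids).logits         # scores over the whole vocabulary
        next_id = logits[0, -1].argmax()         # greedy: pick the most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Each pass through the loop feeds the model its own previous output, exactly the feedback process described above. The hard part, of course, is making each of those single-token predictions accurate.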

To accomplish this, three key components come into play: 

  • Accuracy of next-word prediction, which relies on advanced architectures like the Transformer and its well-known ‘attention mechanism’, which allows the model to focus on the most relevant words in a sequence. 

  • Training on vast amounts of high-quality data to recognise patterns and relationships between words. 

  • Computational power, as training these models requires immense processing resources to handle billions of calculations across massive datasets. 

In essence, while the task of predicting the next word is straightforward, it is the scale, sophistication and precision required to generate meaningful and contextually accurate text that make LLMs so impressive. 

The initial efforts behind the first releases of models like ChatGPT did indeed require vast computational resources, and this has led to a common misconception: that building any capable language model must demand similarly enormous resources. That is no longer true, especially with the advancements available today. 

The best models are not necessarily the largest; they are the ones that are more efficient and more focused, achieving the same (or better) performance with fewer resources. Let us break down the misconceptions that persist and explore why building India’s own LLMs isn’t just feasible, it is essential. 

Myth 1: ‘You Need Unlimited Compute to Build LLMs’ 

This is the most common misconception. It’s easy to assume that building LLMs requires massive data centres and an astronomical budget. But continued innovation in deep learning architectures and mathematical optimisation has dramatically reduced the resources needed. 

The primary reason LLMs demand so much computation lies in their foundation: deep learning. These models perform complex operations, such as attention mechanisms—which are critical for accuracy—across many hidden units and layers. Added to this is the inherent challenge of the ever-present optimisation problem: searching for the best (or near-best) solution within an exponentially large space. 

Consider some examples of ongoing innovation that mitigate these complexities. One is work on sparse attention mechanisms (Shen et al., 2023), which addresses the inefficiency of traditional Transformers, where every word in a sequence is compared to every other word at significant computational cost. 

Imagine reading an entire book word for word to locate a single paragraph of interest—this is the challenge of dense attention. Sparse attention techniques solve this problem by focusing only on the most relevant words, reducing the computational complexity by an order of magnitude. 
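To illustrate the idea, here is a minimal sketch of one simple sparse-attention variant—sliding-window (local) attention—in PyTorch. It is an illustrative simplification, not the specific method of Shen et al. (2023), and the window size is an assumption. For clarity it builds the full score matrix and masks it; a real implementation would compute only the banded entries, cutting the cost from quadratic in sequence length to roughly linear.

```python
import torch
import torch.nn.functional as F

def local_attention(q, k, v, w=4):
    """q, k, v: (seq_len, d). Each position attends to at most 2*w + 1 neighbours."""
    seq_len, d = q.shape
    scores = q @ k.T / d ** 0.5                        # dense (seq_len, seq_len) scores
    idx = torch.arange(seq_len)
    outside = (idx[:, None] - idx[None, :]).abs() > w  # True outside the local window
    scores = scores.masked_fill(outside, float("-inf"))
    return F.softmax(scores, dim=-1) @ v               # weighted mix of nearby tokens only

q = k = v = torch.randn(16, 8)
out = local_attention(q, k, v, w=2)  # each token mixes information only locally
```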

On the optimisation front, consider smarter algorithms such as the Lion optimiser, introduced by Chen et al. (2023). Lion uses a simpler and more efficient momentum mechanism than Adam (momentum being a well-established paradigm in gradient optimisation, and Adam a widely adopted technique). Lion achieves faster convergence and better generalisation while reducing computational overhead, making it particularly well-suited for modern training scenarios. 
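A minimal sketch of the Lion update rule, as described by Chen et al. (2023), is below. The key saving over Adam is a single momentum buffer and a sign-based update; the hyperparameter values are illustrative assumptions.

```python
import torch

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    # Update direction: the sign of an interpolation between momentum and gradient
    update = (beta1 * momentum + (1 - beta1) * grad).sign()
    param.add_(update + wd * param, alpha=-lr)        # step with decoupled weight decay
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)  # refresh the single momentum buffer

p, g, m = torch.randn(4), torch.randn(4), torch.zeros(4)
lion_step(p, g, m)  # one optimisation step, in place
```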

Another example is Adaptive Gradient Clipping (AGC), a technique designed to stabilise gradient updates and prevent training failures on noisy datasets. This is especially advantageous when dealing with multilingual and sparse Indian corpora, where data quality and consistency can vary significantly. 
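A minimal sketch of the idea behind AGC follows. This per-tensor version is a simplification of the published per-unit formulation, and the clipping factor is an illustrative assumption: gradients are clipped relative to the size of the weights they update, rather than against a fixed global threshold.

```python
import torch

def adaptive_grad_clip(param, grad, lam=0.01, eps=1e-3):
    p_norm = param.norm().clamp(min=eps)   # floor avoids zero norms for fresh weights
    g_norm = grad.norm()
    max_norm = lam * p_norm                # allowed gradient size scales with the weights
    if g_norm > max_norm:                  # rescale only overly large gradients
        grad = grad * (max_norm / g_norm)
    return grad
```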

The bottom line is clear: training effective LLMs does not require vast supercomputing farms. Instead, leveraging cutting-edge, continually improving optimisation techniques that tackle computational complexity head-on can produce accurate and efficient language models with modest computing resources. 

Myth 2: ‘Bigger Models Are Always Better’ 

The idea that ‘bigger is better’ has been deeply ingrained in our understanding of artificial intelligence (AI). In reality, however, smarter design, cleaner data and focused optimisation can consistently outperform the brute-force, size-oriented approach. 

We highlight some specific techniques as illustrative examples of this claim. 

Structured pruning, decluttering for efficiency: Structured pruning can be likened to organising a cluttered house—removing what isn’t necessary to keep only what adds value. In machine learning, this means systematically eliminating redundant components like attention heads, neurons or even entire layers while maintaining the model’s performance.  

Recent advancements, such as those by Xu et al. (2024), have shown that structured pruning can achieve significant reductions in size and computational cost without degrading accuracy. 

A leaner, pruned LLM, trained on clean, high-quality datasets in Indian languages like Tamil, Marathi or Assamese, can easily outperform bloated, generic models trained on noisy global data. With structured pruning, smaller models also run faster on edge devices, enabling deployment in rural or underserved areas. 
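Below is a minimal sketch of one form of structured pruning: dropping whole neurons (rows of a linear layer) with the smallest weight norms, so the layer itself shrinks. The keep ratio is an illustrative assumption, and a real pipeline would also adapt downstream layers and fine-tune afterwards.

```python
import torch
import torch.nn as nn

def prune_neurons(layer: nn.Linear, keep_ratio=0.5) -> nn.Linear:
    norms = layer.weight.norm(dim=1)                  # one importance score per output neuron
    k = max(1, int(layer.out_features * keep_ratio))
    keep = norms.topk(k).indices.sort().values        # indices of the strongest neurons
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned

layer = nn.Linear(512, 256)
smaller = prune_neurons(layer, keep_ratio=0.25)  # 256 output neurons -> 64
```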

Quantized Low-Rank Adaptation (Q-LoRA): Introduced by Zhang et al. (2024), it is a breakthrough for fine-tuning large pre-trained models with minimal computational resources. Instead of updating all parameters, Q-LoRA trains only small low-rank adapter components while the large pre-trained weights stay frozen and quantized to lower precision, for example 4-bit.  

This dramatically reduces memory usage and speeds up training, while retaining near-perfect accuracy. Fine-tuning becomes feasible even on resource-constrained hardware, such as modest graphics processing units (GPUs) or mobile devices.  

For India, this innovation means task-specified models—like AI assistants for farmers, educators or healthcare workers—can be fine-tuned quickly, affordably and efficiently. 
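A minimal sketch of a LoRA-style adapter layer appears below: the frozen base weight is augmented with a trainable low-rank update scaled by alpha/r. In Q-LoRA the frozen base would additionally be stored in 4-bit precision; plain float32 is used here for clarity, and the rank and scaling values are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)                   # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(512, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the small A and B matrices are trained
```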

Knowledge distillation, teaching smaller models to be smarter: Knowledge distillation is like mentoring—where a smaller ‘student’ model learns from the behaviour of a large, pre-trained ‘teacher’ model. Instead of training the student from scratch, the teacher’s outputs serve as guidance, enabling the student to replicate its performance while being orders of magnitude smaller. 

Techniques like progressive distillation and task-aware distillation further enhance this process, ensuring smaller models remain highly optimised for their specific use-cases. 

Distilled models consume less memory, process inputs faster and require significantly lower inference costs. This makes them ideal for real-world deployment in power-constrained environments, such as mobile phones in rural India. 
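A minimal sketch of a standard distillation loss follows: the student is trained to match the teacher’s softened output distribution alongside the true labels. The temperature and mixing weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: divergence between softened teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)  # the usual supervised loss
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 10), torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
loss = distillation_loss(s, t, y)
```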

Myth 3: ‘India Does Not Need Its Own LLMs’ 

This myth is not only misguided; it is perhaps the most harmful of the three. India’s diversity, with its 22 official languages, thousands of dialects and unique socio-economic challenges, demands AI that understands and adapts to these nuances.  

Imported AI—the foundational models that the naysayers contend India should rely on rather than build itself—will always fall short of comprehensively addressing the challenges and opportunities unique to India. We discuss a few of these opportunities. 

Breaking language barriers is one of the most profound opportunities. India’s linguistic diversity is both a strength and a challenge. Imagine an AI model that seamlessly translates between Kannada and Hindi or provides critical weather updates and farming advice in Telugu or Bengali. 

Farmers in remote villages could access life-saving information in their native tongues, enabling them to make informed decisions. Regional LLMs are not just desirable; they are necessary to bridge India’s communication divide. Unlike generic global models that often misinterpret local expressions and dialects, India-specific models, curated on high-quality datasets, can capture these subtle nuances with precision. 

Revolutionising health care is another critical area where India-specific LLMs can make a life-altering impact. Health care in India faces unique challenges: under-resourced hospitals, a vast rural population and the rising burden of chronic diseases.  

Here, AI models inspired by LLM architectures can analyse longitudinal patient data—predicting disease progression in conditions like Alzheimer’s, diabetes and cardiovascular diseases. Such models would allow early intervention and personalised treatment plans, even in resource-limited settings. 

Technologies like federated learning make this feasible without compromising patient privacy, enabling hospitals across the country to collaboratively train AI models on local data while keeping sensitive records secure. 
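As a sketch of how federated learning works in this setting, the snippet below shows the core of federated averaging: each hospital trains on its own data and shares only model weights, which a coordinator averages. The stand-in models and the single averaging step are illustrative assumptions; a real deployment would add secure aggregation and formal privacy guarantees.

```python
import torch
import torch.nn as nn

def federated_average(local_models):
    """Average the parameters of locally trained models with identical shapes."""
    avg = {k: torch.zeros_like(v) for k, v in local_models[0].state_dict().items()}
    for model in local_models:
        for k, v in model.state_dict().items():
            avg[k] += v / len(local_models)
    return avg

# Stand-ins for models trained locally at three hospitals; raw data never leaves a site
hospitals = [nn.Linear(10, 2) for _ in range(3)]
merged = federated_average(hospitals)
# Each site then loads the merged weights: model.load_state_dict(merged)
```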

Mobile-friendly AI for rural India represents the final—and perhaps most transformative—frontier. For much of rural India, the smartphone is the gateway to the digital world. Innovations like TinyML and lightweight Transformers are redefining the boundaries of efficiency, allowing sophisticated AI models to run directly on mobile devices. These tools can deliver AI-powered assistance to farmers, educators and healthcare workers in real-time—without relying on expensive servers or constant connectivity. 
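One concrete way to fit a model onto a modest device is post-training quantization. The sketch below uses PyTorch’s dynamic quantization to store linear-layer weights as 8-bit integers; the toy model is an illustrative assumption standing in for a lightweight Transformer.

```python
import torch
import torch.nn as nn

# A toy model standing in for a lightweight Transformer's linear layers
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64))

# Store Linear weights in 8-bit integers; they are dequantized on the fly
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

out = quantized(torch.randn(1, 256))  # same interface, much smaller footprint
```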

Whether it is a chatbot offering health advice in Assamese or an AI tutor explaining math concepts in Tamil, mobile-friendly models will democratise AI, making it accessible to every Indian, regardless of location or resources. 

Building India-specific LLMs isn’t just a technological endeavour—it’s about empowerment. It’s about breaking linguistic barriers, transforming healthcare, and ensuring that no one is left behind in the AI revolution. By creating models tailored to India’s unique needs, we can deliver solutions that are not only smarter but also more inclusive, accessible and impactful. 

LLM Efforts in India 

India’s journey toward developing indigenous LLMs is well underway. Several organisations and research groups are leading the charge, building models that are efficient, inclusive and tailored to India’s diverse linguistic and socio-economic needs. These initiatives highlight India’s capability to drive innovation on the global AI stage. 

India’s universities are central to the country’s AI future, leveraging talent and research to drive innovation in efficient, localised LLMs.  

India’s academic institutions—across IITs, IISc, IIITs and other leading research centres—are uniquely positioned to lead the next wave of AI advancements. With their strong foundation of expertise, committed faculty and talented students, these institutions can drive progress in areas critical to AI innovation, such as optimisation, model architectures, deep learning, energy-efficient computing, natural language processing and information retrieval. 

Advances in algorithmic research are essential to making AI models more efficient, scalable and sustainable. Indian institutions can focus on developing new approaches that improve model performance while reducing computational costs.  

Emphasis on energy-aware and low-cost solutions will ensure AI remains accessible and deployable in resource-constrained environments, such as rural areas and mobile devices. By fostering innovation across multiple domains, academia can play a central role in building AI systems that are both cutting-edge and practical. 

The creation of high-quality, curated datasets tailored to India’s linguistic, cultural and sector-specific needs will be crucial. Academic institutions can work collaboratively to address challenges in low-resource languages and domains where data remains scarce. 

Generating annotated corpora, exploring synthetic data generation techniques and building domain-specific datasets in areas such as health care, education and agriculture will ensure AI solutions are inclusive and relevant to India’s diverse population. 

India’s academic community can amplify its global impact by actively contributing to open-source AI initiatives. Sharing datasets, models and research breakthroughs with global platforms will position Indian research as a driving force in AI innovation. 

Establishing national-level research networks where institutions collaborate on shared models, infrastructure and benchmarks will further accelerate progress. Open-source tools and frameworks tailored to Indian needs can democratise AI, enabling businesses, start-ups and developers to build on a robust foundation of locally developed solutions. 

Establishing academia-industry partnerships will help build India’s innovation pipeline. In the United States, the success of AI ecosystems in hubs like Silicon Valley and New York has been largely fuelled by strong partnerships between academia and the tech industry. 

Universities such as Stanford, MIT and NYU serve as innovation engines, where groundbreaking research flows into the industry through collaborative projects, direct funding and technology incubators.  

Companies like Google, Microsoft and OpenAI routinely partner with top academic labs, ensuring that research in areas such as model optimisation, energy-efficient AI and data systems transition seamlessly into real-world applications.  

India can adopt and adapt this model by recognising the mutual value in such partnerships. Universities—across IITs, IISc, and IIITs—have immense untapped potential to serve as India’s own innovation hubs, provided stronger links with large technology companies are established.  

These partnerships should go beyond just hiring graduates or funding spin-off startups. Instead, India needs structured collaboration models where:  

  • Tech companies co-fund research labs within universities to tackle domain-specific challenges such as AI for agriculture, health care or multilingual NLP. 

  • Companies provide computational infrastructure (cloud credits, GPUs, AI accelerators) to remove barriers faced by academic researchers working on large-scale AI. 

  • Joint initiatives focus on productisation of academic research, enabling smoother transitions from experimental models to deployable, real-world solutions. 

This approach must, however, remain attuned to India’s realities. While Indian institutions may not yet have the end-to-end scale of a Stanford or MIT, the groundwork is strong—India boasts one of the largest pools of AI talent and vibrant industry ecosystems in Bengaluru, Hyderabad, and Pune. A model where tech giants, government-backed AI initiatives, and academic researchers collaborate can unleash innovation that aligns with India’s unique challenges.  

By nurturing a robust academic-industry pipeline, India can bridge the gap between research and deployment, fostering a cycle where technology companies benefit from cutting-edge discoveries while academia gains access to the resources needed to lead global AI breakthroughs. 

The Road Ahead 

The world is moving from bigger to smarter AI. The focus has shifted to efficient architecture, thoughtful optimisation and domain-specific applications. India does not need to compete with tech giants on size. It needs to innovate, using its unique strengths—world-class talent, diverse data and unparalleled linguistic and cultural richness.  

By building its own LLMs, India can democratise AI by making tools accessible to rural and underserved communities, drive health-care innovation by predicting diseases and improving outcomes and ensure data sovereignty by retaining control over sensitive national data. 

The future of AI belongs to those who innovate smarter, not bigger. India has everything it takes to lead this revolution. India should by all means build its own language models. 

Naveen Ashish is guest faculty at the department of computer science, University of California, Berkeley. Jaijit Bhattacharya is president, Centre for Digital Economy Policy Research.
