MoSPI is developing a Common Data Platform (CDP) to consolidate official economic statistics from hundreds of government data sources into a single repository.
The government plans to eventually build a dedicated AI language model that will allow users to access official economic data and insights using natural language prompts.
The platform aims to improve data quality, automate integration and enable faster analysis of GDP and other macroeconomic indicators for policymakers, researchers and businesses.
India is developing a unified digital platform to bring together official economic statistics from multiple government departments into a single repository, with plans to eventually build a dedicated artificial intelligence (AI) model trained on the country's official data.
According to a Business Standard report citing people familiar with the matter, the Ministry of Statistics and Programme Implementation (MoSPI) is building a Common Data Platform (CDP) that could later evolve into a specialised large language model (LLM) for official statistics.
One Platform for Government Data
The Common Data Platform aims to eliminate the fragmentation of official data currently spread across ministries, departments and agencies. At present, economic indicators are published in separate PDFs, spreadsheets and databases, requiring users to search multiple sources for information.
The proposed platform will consolidate these datasets into a central repository, enabling policymakers, researchers and businesses to access, compare and analyse official statistics through a single interface.
According to the report, India could become one of the first countries to develop a dedicated LLM specifically for official economic statistics. Most countries, including the UK, Canada, Singapore, Finland and the Netherlands, currently use AI tools that retrieve information from existing databases rather than training standalone statistical language models.
Focus on GDP Data First
The first phase of the project will focus on the National Accounts Division (NAD), which compiles India's GDP, national income, savings and capital formation estimates. The division currently relies on nearly 300 data sources from ministries, regulators, state governments and public-sector organisations.
"We currently release the data sources required for GDP calculations in the form of PDFs and Excel sheets. The aim of this initiative is to integrate these sources into a single platform, making GDP calculations easier," Business Standard quoted N K Santoshi, Director General (Central Statistics), MoSPI, as saying. He added that the platform could eventually evolve into an LLM.
Building the AI Foundation
Chennai-based IT firm Bahwan CyberTek (BCT) has been selected to develop the platform. According to the report, the initial 18-month phase will focus on data cleansing, standardisation and building a centralised repository using a modern data lakehouse architecture.
BCT said the platform will integrate data from NAD's 18 business units, automate data management, improve data quality and provide AI-powered analytics and reporting capabilities to support faster policymaking.
If completed as envisioned, the platform could significantly modernise India's statistical ecosystem by making official economic data more accessible while laying the foundation for AI-driven analysis and future data-sharing across government agencies.