
India’s Healthcare AI Start-ups Grapple with a Broken Data Ecosystem

Between fragmented hospital records and regulatory uncertainty under the DPDP Act, India’s healthtech start-ups face a severe data crunch. This, in turn, is pushing some to synthetic datasets and others to overseas partnerships.

AI Healthcare
Summary
  • Lacking real data, start-ups like Axone Health resort to creating synthetic case sheets to train their AI models

  • Others, such as Artelus, gather data through rural eye camps, pairing healthcare delivery with consent-based data collection

  • The scarcity of accessible, reliable health data slows innovation cycles and keeps India’s AI-healthcare sector small

As AI adoption in healthcare accelerates, the biggest roadblock for Indian start-ups in the sector remains access to quality data. Patient records are often unstructured, hard to read, or stuck in outdated systems, making it tough to build reliable AI models. Founders say they want to build models for India but lack the data to train them. The alternative, models pre-trained on Western datasets, is not credible because such models underperform on Indian populations.

“Even existing Electronic Health Records (EHR) systems capture very little, often just admission and discharge details. This is because hospitals hire typists or clerks who only enter partial data. Many still use Microsoft Word instead of proper EHRs,” says Arnab Chatterjee, founder of Axone Health, a start-up building a fully AI-powered EMR (Electronic Medical Record) solution for hospital staff. 

To add to it, India’s Digital Personal Data Protection (DPDP) Act, 2023, which lays down rules for how personal data can be collected, has made hospitals “sceptical” of sharing datasets, add experts. The Act mandates explicit patient consent for storing data. Consent must now be specific, clear, and unambiguous, explaining what will be done with the data, how it will be used and processed, and whether it will be shared with third parties.

Founders argue that the real hurdle is ambiguity. While the DPDP Act has been passed, its provisions will only apply once the government formally notifies their enforcement. Until then, start-ups are left in a haze of uncertainty, preparing for compliance without knowing when specific sections will actually come into force.  

“The Data Protection Board, which will serve as the regulator, is yet to be set up. Draft rules were expected and should be finalized soon, but until then, the Act remains in a sort of limbo,” says Ashima Obhan, Senior Partner at Obhan & Associates. 

Further, penalties under the Act are steep, with fines of up to ₹250 crore for violations. Crucially, they are not linked to turnover, meaning a small start-up and a large corporation face the same maximum cap. As a result, hospitals are generally wary about sharing data.

Even if personal identifiers such as names or dates of birth are redacted, hospitals worry the information could still be traced back to the original source.

“Hospitals worry that anonymized data can still be traced back to individual patients, especially in rare cases. For example, if a patient with a rare blood group in a small region has cancer, it isn’t difficult to identify them from the dataset,” says Suvrankar Datta, Group Lead, Centre for Responsible Autonomous Systems in Healthcare (CRASH Lab) and Faculty Fellow, Koita Centre for Digital Health, Ashoka University. Hospitals are also unsure that shared data will not be misused.
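The re-identification worry Datta describes can be made concrete with a small check. The sketch below is illustrative only and is not drawn from any hospital's or start-up's actual process. It assumes a handful of made-up records with names already removed, and a hypothetical helper, `risky_groups`, that flags any combination of indirect identifiers (here, district and blood group) shared by fewer than `k` patients. This is the idea behind what privacy researchers call k-anonymity.

```python
from collections import Counter

# Hypothetical, made-up records; names and dates of birth already removed.
records = [
    {"district": "A", "blood_group": "O+", "diagnosis": "cancer"},
    {"district": "A", "blood_group": "O+", "diagnosis": "diabetes"},
    {"district": "A", "blood_group": "O+", "diagnosis": "cancer"},
    {"district": "B", "blood_group": "AB-", "diagnosis": "cancer"},
]


def risky_groups(rows, quasi_identifiers, k=2):
    """Return quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Flag combinations that point to a single patient in this toy dataset.
flagged = risky_groups(records, ["district", "blood_group"], k=2)
print(flagged)
```

In this toy dataset, the lone patient in district B with blood group AB- is flagged even though no name appears anywhere, which is the kind of rare case hospitals fear. A real audit would cover far more fields and would need to account for outside data an attacker might hold.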

Experts say misuse is already happening: “There are data aggregators and ‘consultants’ who try to sell datasets, and outreach happens via Telegram and even LinkedIn. Sometimes individual doctors or diagnostic centre owners with large archives approach start-ups, proposing pay-per-scan models (₹100–₹200 per scan, for tens of thousands of scans),” a founder notes. 

As a result, start-ups say conversations often collapse before they begin, as administrators demand stricter compliance and exhaustive approvals. With DPDP in place, hospitals raise questions such as: Are you sure the data will not be used somewhere else? What is the guarantee that it will not be leaked?

Before DPDP, hospitals followed internal rules through ethics committees. Now, every request triggers multiple layers including clinical protocols, committee reviews, anonymization audits, and secure server setups, add experts. 

“Ethics committees meet monthly or every three months, and include a neurologist or radiologist, a lawyer, a medical director, an IT person, and an ethics lead. We present why we selected that hospital, why we want to perform the study, and how we will protect data so anonymized data doesn’t become open source, because the owner of the data is the patient,” says Anuj Chandalia, founder, Manentia AI, a start-up which focuses on AI-driven analysis of medical images, especially chest X-rays and CT scans.    

The deadlock is pushing some start-ups abroad. “I know companies that ended up partnering with US hospitals like Memorial Sloan Kettering because they just couldn’t get access to Indian datasets,” says one industry insider. “So instead of creating our own foundational healthcare models, we’re depending on imported ones.”   

Synthetic data   

In the absence of real data, many start-ups are forced to improvise. Chatterjee of Axone Health recalls how his team and a group of doctors spent days dictating hundreds of mock case sheets into their system. The 250 synthetic cases became the base for fine-tuning their model. “It was expensive. We spent $40 worth of API credits in just four days, but it gave us a starting point,” he says.

Others prefer to start from scratch. Artelus, a start-up led by Girish Somvanshi and developing AI systems for early detection of retinal disease, began collecting data through free rural eye camps, in partnership with NGOs and hospitals. Early on, he realised that simply approaching hospitals or institutions for datasets would not work. The scepticism was too high, and the stakes higher still.  So his team embedded itself in rural health camps. “We knew data is sensitive. That’s why we partnered with NGOs and hospitals, signed agreements, and made patient consent central to our process,” Girish recalls. “We were not just collecting information; we were providing free healthcare services alongside it. That’s how goodwill started building.” 

Weeks were spent travelling to remote districts, screening villagers with portable fundus cameras. What at first looked like a logistical hurdle — ensuring trained operators were available in difficult terrain — soon yielded an insight into AI training itself.  “We noticed that operator dependency was hurting our AI,” Girish explains. “Two people could take images of the same eye, and the results would vary so much that our models struggled to learn. That’s when we shifted to fully automatic cameras, cutting out variability and making camps faster and more effective.” 

The shift was costly. Automatic fundus cameras were expensive, and taking AI offline so that it could function in areas without internet demanded fresh engineering. Even so, the results have been impressive. Girish estimates that more than two lakh people have now been screened through such camps, generating over four lakh retinal images and identifying nearly 40,000 patients with abnormalities.

The sector is expanding fast. According to Grand View Horizon, India’s AI-in-healthcare market is expected to reach $8.7bn by 2030, growing at an annual rate of 41.8% between 2024 and 2030. Yet the industry still grapples with a fundamental constraint: a lack of accessible health data. Tracxn counts just eight AI-driven healthcare companies in India in 2024, up only modestly from two in 2011. That meagre growth underlines the critical role of reliable data in building a viable ecosystem.

Data is the bedrock for training AI models, developing drugs and creating predictive health solutions. Without structured, widely available datasets, Indian start-ups risk lagging behind global peers.

Innovation cycles are already slower. “For example, if I am doing research on cancer in Rajasthan, I can only collect data when enough patients come to the hospital. This creates a long gestation period, from gathering data to conducting research and finally developing a product. By then, we’ve already missed the bus on innovation,” says Rishabh Nag, co-founder of Humanli.AI.

It is not that the country is ignoring digitisation. The National Health Authority’s database, for instance, already holds a large volume of data under PMJAY. However, much of this comes from scanned documents uploaded for insurance claims. As a result, the data exists but is not truly digital—most of it is in the form of PDFs or images rather than structured, machine-readable records.

Further, experts point out that in many government hospitals, doctors often run OPDs late into the evening. Entering data digitally adds to their workload, and records are therefore filled in hastily, with little verification.

Data may be the new oil. Whether Indian healthcare can secure enough of it to fuel the next generation of models remains uncertain.
