Indian Start-Up Ziroh Labs Cuts AI Inferencing Costs by 50% Using CPUs

Inferencing is the process of a trained AI model making predictions or solving tasks using new or unseen data

Updated on: 27 April 2025 9:05 am

An AI platform developed by an Indian start-up and IIT Madras has cut inferencing costs by nearly 50% compared with global benchmarks by running large language models on CPUs instead of expensive GPUs. This CPU-based AI system is known as ‘Kompact AI’.

The platform, launched by Bengaluru-based deeptech start-up Ziroh Labs and IIT Madras, can tackle India’s compute accessibility gap by eliminating the country’s reliance on costly GPUs (graphics processing units). It allows foundation models with fewer than 50 billion parameters to be used for inference on CPUs. It also outperforms existing CPU-based systems by a factor of nearly three.

“The inferencing cost is reduced by almost 50%, and the benchmarked results will be out shortly. Currently, the solution is not optimised for training,” Prof. Madhusudhanan B, Principal Consultant at IITM Pravartak Technologies Foundation, told Outlook Business.


Inferencing is the process of a trained AI model making predictions or solving tasks using new or unseen data. According to Madhusudhanan, this process is important because it’s the phase where 99.99% of users interact with the AI model after it has been trained.
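To make the difference between training and inferencing concrete, here is a minimal Python sketch of the inference step for a toy classifier. The weights and bias are hypothetical values standing in for parameters learned during an earlier training run; at inference time they are only read, never updated, and the model simply runs a forward pass on a new input. This is a generic illustration of inferencing, not a description of how Kompact AI or any particular language model is implemented.

```python
import math

# Hypothetical parameters, assumed to have been learned earlier during training.
# During inference they are frozen: they are read, never updated.
TRAINED_WEIGHTS = [0.8, -1.2, 0.5]
TRAINED_BIAS = -0.1


def predict(features: list[float]) -> float:
    """Run inference: apply the frozen model to a new, unseen input.

    Returns the model's estimated probability that the input is in the positive class.
    """
    if len(features) != len(TRAINED_WEIGHTS):
        raise ValueError(f"expected {len(TRAINED_WEIGHTS)} features, got {len(features)}")
    # Forward pass only: weighted sum plus bias, squashed into (0, 1) by a sigmoid.
    z = sum(w * x for w, x in zip(TRAINED_WEIGHTS, features)) + TRAINED_BIAS
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    unseen_input = [1.0, 0.2, 3.0]  # data the model never saw during training
    print(f"P(positive) = {predict(unseen_input):.3f}")
```

A large language model follows the same pattern at a far larger scale: billions of frozen parameters and many more arithmetic operations per prediction, which is why the hardware used for inference (GPU or CPU) weighs so heavily on its cost.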

However, a major challenge for inferencing today is the shortage of GPUs, which are typically required to run these large models efficiently. A buyer may have to wait six months to get their hands on a good GPU.

Currently, the average cost of artificial intelligence computing in India is Rs 115.85 per GPU hour, compared with the global market rate of $2.5–$3 (approximately Rs 259) per GPU hour. Union Minister Ashwini Vaishnaw had earlier said that the bid cost for high-end AI computing under the IndiaAI Mission is approximately Rs 150 per GPU hour. The government has also promised a 40% compute subsidy on GPU usage costs for students, start-ups and researchers.


Rs per GPU hour refers to the cost, in Indian rupees (Rs), of using one graphics processing unit (GPU) for one hour. It is a standard pricing metric in cloud computing and AI/ML workloads, where companies or individuals rent GPUs from cloud providers such as AWS, Google Cloud and Azure.
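Because the metric is a simple rate, the cost of a job is the rate multiplied by the number of GPU hours (number of GPUs × hours of use). The Python sketch below works through that arithmetic with the per-hour figures quoted in this article. The `job_cost` helper and the 8-GPU, 10-hour workload are hypothetical, and the sketch assumes the government's 40% subsidy is a flat discount on the usage cost; actual IndiaAI billing terms may differ.

```python
# Per-GPU-hour figures (in Rs) quoted in the article.
INDIA_AVG_RATE = 115.85
GLOBAL_RATE_APPROX = 259.0
INDIAAI_BID_RATE = 150.0
SUBSIDY_FRACTION = 0.40  # assumption: 40% subsidy applied as a flat discount


def job_cost(rate_per_gpu_hour: float, gpus: int, hours: float, subsidy: float = 0.0) -> float:
    """Cost in Rs of renting `gpus` GPUs for `hours` hours at a per-GPU-hour rate.

    `subsidy` is the fraction of the bill covered by a third party (0.0 to 1.0).
    """
    if not 0.0 <= subsidy <= 1.0:
        raise ValueError("subsidy must be between 0 and 1")
    gpu_hours = gpus * hours  # the billable quantity
    return rate_per_gpu_hour * gpu_hours * (1.0 - subsidy)


if __name__ == "__main__":
    # Hypothetical workload: 8 GPUs running for 10 hours = 80 GPU hours.
    scenarios = [
        ("India average", INDIA_AVG_RATE, 0.0),
        ("Global market (approx.)", GLOBAL_RATE_APPROX, 0.0),
        ("IndiaAI bid, 40% subsidy", INDIAAI_BID_RATE, SUBSIDY_FRACTION),
    ]
    for label, rate, subsidy in scenarios:
        cost = job_cost(rate, gpus=8, hours=10, subsidy=subsidy)
        print(f"{label:28} Rs {cost:,.2f}")
```

For this hypothetical 80-GPU-hour job, the arithmetic gives Rs 9,268 at the Indian average rate and Rs 20,720 at the approximate global rate; at the Rs 150 bid rate, a flat 40% subsidy would bring the bill down from Rs 12,000 to Rs 7,200.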

Amid this race for GPUs, Ziroh Labs has optimised 17 AI models, including DeepSeek, Llama, Qwen, BERT and Phi, to run efficiently on CPUs. It is currently working on other models, such as Gemma, Moonshine, GoogLeNet, OpenPose and ShuffleNet.

Kompact AI opens up use cases wherever data privacy is critical, such as finance, law, healthcare and manufacturing. With Kompact AI, start-ups don’t need to send data to the cloud or rely on GPU-heavy setups.

Published At: 27 April 2025 9:05 am