India’s AI development faces challenges that call for MeitY’s attention


A balanced approach to data policy and regulation that prioritises AI innovation while protecting personal data is crucial for advancing India’s AI ecosystem through open data initiatives

As the dust settles on general elections and the allocation of Union ministries, attention will now turn to the upcoming regulations and policies stuck in the pipeline over the past year.

Notifying rules for the Digital Personal Data Protection Act (DPDP Act 2023), framing guidelines for artificial intelligence (AI), amending the IT Act, and drafting the framework under the Digital India initiative will be top priorities for the Ministry of Electronics and IT (MeitY). Even after MeitY’s success in establishing consensus on digital public infrastructure (DPI) during India’s G20 presidency, comprehensive policies and regulations to support digitalisation in India have fallen behind.

Digitalisation has enabled granular data collection across sectors via high-capacity computing and network effects. However, concerns arise over a few big tech players monopolising these datasets. In an AI-driven economy, digital platforms with significant proprietary data hold competitive advantages, aided by their resources for large-scale data procurement. Conversely, startups struggle to amass and refine data for AI applications.

Similarly, government agencies have started to establish data standards and publish datasets through the Open Government Data (OGD) Platform India, which provides access to open data from 165 government departments across 33 sectors. However, the OGD platform is hampered by poor data quality, disparate schemas, inconsistent metadata standards, and a lack of high-value data, making it costly for AI developers to prepare and engineer this data.

The spread of UPI-enabled digital payments, which have reached 330 million unique users, has also unlocked the potential for large-scale financial data collection in India. This is being developed further through the Account Aggregator (AA) framework, a consent-based data-sharing mechanism for the financial sector. Sesame, India’s first large language model (LLM) specifically designed for the BFSI sector, was recently unveiled, but it would require deeper UPI penetration and more users to consent to financial data sharing through the AA framework to effectively …

In this regard, creating avenues for open and accessible public data can be a game-changer for developing India’s AI ecosystem. With the rapid adoption of applications such as ChatGPT, developers building AI for India are now seeking data representation, i.e., ensuring India’s diverse population is accurately represented within the LLMs being developed for Indian use cases.

Considerable progress has been made in developing the open data ecosystem in India. To address language barriers and bias in foreign-developed AI models, ‘AI4Bharat’ at IIT Madras, Sarvam AI’s ‘OpenHathi series’, and MeitY’s ‘BhashaDaan initiative’ focus on developing open-source datasets, tools, models, and applications for Indian languages. Zomato’s ‘Weather Union’ and ‘Zomato Food Trends’, along with Namma Yatri’s real-time dashboard, signal private-sector intent to contribute to open data.

At the state level, Tamil Nadu and Punjab are developing state-specific data-sharing policies, while Odisha and Karnataka are promoting open data through policies. In Bengaluru, public transport agencies have unveiled plans to open real-time transit data for startups to leverage in building mobility-as-a-service applications. Nevertheless, a concerted effort is required at the national level to harness the potential of open data for responsible AI development in India.

For starters, the government needs to convene states, industry, academia, and other partners across the public data ecosystem to draft technical guidelines, quality standards, and curation methods for publishing AI-ready open data for a wide range of use cases. Setting up the National Data Management Office (NDMO) under the jurisdiction of MeitY, as proposed in the draft Digital India Bill, is a step in the right direction, but ecosystem-wide consultation needs to be emphasised to ensure multi-stakeholder participation in AI development.

To promote the availability of good-quality data, a good benchmark to aim for is the EU’s metadata quality dashboard, which helps data providers evaluate their metadata against indicators such as accessibility, interoperability, and reusability. The increasing penetration of DPI in India can also provide a means of implementing these indicators, with technology architecture advisories like the Center for DPI (CDPI) providing core knowledge on crafting open datasets: they advocate a federated design to avoid centralising data, and the use of open standards and APIs for interoperability. Foundational data infrastructure can be designed using these principles to enable stakeholders to contribute effectively to open data initiatives.

However, these measures would be ineffective without the necessary amendments to the DPDP Act 2023 and the IT Rules 2020 to counter consumer-related concerns arising from blatant data scraping by Big Tech. These amendments need to be prioritised to ensure the safety of a user’s personal data online, and designed to give users the agency to use digital platforms without being subjected to erroneous data collection practices and copyright breaches.

Finally, under the India AI programme, the government is also working on creating a platform for datasets in a public-private partnership model, to house the largest collection of anonymised data. However, it has been suggested that sharing of these datasets will be restricted to companies the government deems trustworthy, to limit misinformation, deepfakes, and AI bias. This is a highly ineffective method of preventing AI-related harm, as it is difficult to conduct large-scale compliance checks on AI applications, and it severely limits the potential of open datasets to be used fairly. Instead, setting up independent fact-checking organisations that can immediately report and flag misinformation or deepfakes on platforms would be a better route to take, minimising government intervention and avoiding new roadblocks to AI innovation.