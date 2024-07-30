Policymakers have been grappling with the rising complexity of Machine Learning (ML) systems, such as Large Language Models (LLMs) and other deep neural networks, that churn through huge swathes of data. This complexity has made it difficult for data fiduciaries to effectively “correct, complete, update and erase” sensitive data from computer systems. At the same time, we are witnessing a rise in AI (Artificial Intelligence) bias, misinformation, and privacy breaches, all of which intensify during events such as elections.

The antithesis of ML

One possible solution to this problem that has sparked interest among researchers and companies alike is Machine Unlearning (MUL). First mooted by Cao and Yang in ‘Towards Making Systems Forget with Machine Unlearning’ (2015), MUL asks how machines can be made to forget data that trained AI models have already learned from. It is the antithesis of ML: an algorithm is added to the AI model to identify and delete false, discriminatory, outdated, and sensitive information.

The concept responds to how hard it is to remove information from LLMs that constantly churn through data. Because the same data can be reused for multiple objectives, it becomes difficult to trace its data lineage, that is, where the data came from and how it has flowed through a complex web of algorithms. This opacity degrades data quality and leads to manipulation, adversarial outputs, and difficulty in locating and removing sensitive information. Moreover, since these models have no sandboxed process for choosing and processing data, attackers have been shown to be able to insert manipulated data to produce biased results, a practice known as data poisoning.

One might argue for simply removing the offending data from the training set (data pruning) and retraining the entire AI model. However, this inflates computational costs and causes undue delays for data fiduciaries, while also risking a substantial loss of accuracy. Consequently, MUL is gaining traction as a viable option among data fiduciaries such as IBM, where unlearning models are being tested for accuracy, intelligibility, speed, and cost efficiency.
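To make the contrast between retraining and unlearning concrete, the sketch below follows the spirit of the “summation form” idea proposed by Cao and Yang: if a model is computed only from running sums over its training data, a record can be forgotten by subtracting its contribution, which yields the same model that retraining without it would produce. The one-variable least-squares model, the class name `SummationRegressor`, and the sample data are hypothetical, chosen for illustration only; this is not the method used by IBM or any other company mentioned here.

```python
import math
from dataclasses import dataclass


@dataclass
class SummationRegressor:
    """Hypothetical 1-D least-squares model kept in 'summation form'.

    The fitted line depends only on five running sums, so learning or
    forgetting a record just updates those sums.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxy: float = 0.0
    sxx: float = 0.0

    def learn(self, x: float, y: float) -> None:
        # Add this record's contribution to each sum.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxy += x * y
        self.sxx += x * x

    def unlearn(self, x: float, y: float) -> None:
        # Subtract the record's contribution: constant work, no retraining.
        if self.n == 0:
            raise ValueError("nothing to unlearn")
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxy -= x * y
        self.sxx -= x * x

    def params(self) -> tuple[float, float]:
        # Closed-form slope and intercept computed from the sums alone.
        if self.n == 0:
            raise ValueError("model has no data")
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def retrain(records):
    # Baseline: rebuild the model from scratch, touching every record.
    model = SummationRegressor()
    for x, y in records:
        model.learn(x, y)
    return model


if __name__ == "__main__":
    # The last record stands in for an erroneous or poisoned data point.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0), (4.0, 8.0), (5.0, 30.0)]
    model = retrain(data)
    model.unlearn(5.0, 30.0)      # forget the bad record in place
    fresh = retrain(data[:-1])    # retrain without it, for comparison
    same = all(
        math.isclose(a, b, abs_tol=1e-9)
        for a, b in zip(model.params(), fresh.params())
    )
    print("unlearned model matches retrained model:", same)
```

The design choice is the point: forgetting costs a fixed handful of subtractions, whereas retraining revisits every remaining record. Deep neural networks and LLMs do not reduce to a few sums like this, which is why unlearning them remains an open research problem.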

Three approaches

The question, however, remains how MUL can be implemented so that data fiduciaries effectively fulfil their obligations. Based on their viability for on-ground implementation, there could be three approaches: private, public, and international. In the private approach, data fiduciaries will be primarily responsible for testing MUL algorithms, which can then be applied across their training models for efficient deletion based on specific requirements. This voluntary approach gives companies ample headroom to enhance their AI models and preserve users’ rights without undue government intervention. However, the expertise and cost needed to execute these models might discourage smaller companies from testing the solutions. This is the model currently being followed, albeit at a preliminary stage.

In the public approach, the government is responsible for preparing the statutory blueprint, through either soft-law or hard-law instruments, that requires data fiduciaries to fulfil their legal obligations. This approach must be read in the context of rising mentions of AI in legislative proceedings across 49 jurisdictions (from 1,247 in 2022 to 2,175 in 2023). These figures suggest a strong likelihood of government intervention in the near future, should a major breakthrough in MUL coincide with this regulatory momentum. The government can issue guidelines under the relevant data or AI protection regime mandating that data fiduciaries implement a plausible MUL model. For instance, the European Union’s AI Act includes a provision to tackle data poisoning. It treats data poisoning as a form of cyber attack and directs providers of high-risk AI systems to put in place security controls “to ensure a level of cybersecurity appropriate to the risks.”

Alternatively, the government can itself develop a MUL model as part of its Digital Public Infrastructure (DPI) for data fiduciaries to implement uniformly across platforms. This is especially useful in developing countries, where the state has a substantial stake in DPI as a driver of the country’s overall development. Moreover, it addresses the affordability and expertise gap facing smaller companies.

The international approach emphasises nation states coming together to prepare a framework to be adopted uniformly at the domestic level. The rationale flows from the idea that any innovation in AI has trans-boundary implications, so it is preferable to follow uniform standards across jurisdictions as a step towards global governance of AI. As the efficacy of this approach is unclear amid geopolitical frictions, the onus effectively shifts to international standard-setting organisations such as the International Electrotechnical Commission to develop MUL standards that can be applied across jurisdictions.

These approaches offer a formal blueprint for one of the solutions that can be used to curb the menace of Generative AI and preserve users’ right to be forgotten. MUL is still at a preliminary stage. Therefore, stakeholders must address technical and regulatory considerations to ensure its effective implementation in this evolving AI landscape.

