The AI question

An LLM cannot give us data that is not already present on the Web; in that sense, it is a sophisticated search engine

September 24, 2023 02:47 am | Updated 02:47 am IST

AI generation tools, just as every other tool, are created by collecting data — a large quantity of language data. | Photo Credit: Getty Images/iStockphoto

The year 2023 has witnessed increasingly polarised debates on artificial intelligence’s benefits and harms, with the worldwide use of AI tools becoming the metric by which we gauge whether the human race is progressing or heading towards extinction.

By cutting down employment opportunities for a huge workforce across the globe, and by its ability to influence human decisions, perform activities with calculated precision, and generate answers in a matter of seconds, AI can seem like a threat to society. It has sparked tremendous arguments, and much has been written on the subject, including on whether using AI tools such as ChatGPT and Bard compromises academic integrity and ethics; this has turned out to be a rather grey zone. As artificial intelligence becomes a rather permanent resident in our houses, I find it important to know how these tools are constructed and what we can do to utilise them without letting AI subdue our individual, human voices.

AI generation tools, just as every other tool, are created by collecting data — a large quantity of language data. This collection is known as a corpus, and it is an important object of study in both linguistics and computer science. Getting computers to generate language that closely mimics natural language has been a long-desired objective of the AI development community, facilitated to a great extent by natural language processing (NLP).
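The idea of a corpus can be illustrated with a toy sketch in Python. The sentences and names below are my own illustration, not any real training set: counting word frequencies across a small collection of texts yields the raw statistical material a language model learns from.

```python
from collections import Counter

# A toy corpus of two sentences. Real LLM training corpora
# contain hundreds of billions of words drawn from the Web.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# Tokenise each sentence into words and count how often
# each word occurs across the whole corpus.
tokens = [word for sentence in corpus for word in sentence.split()]
vocabulary = Counter(tokens)

print(vocabulary.most_common(3))
```

Real systems tokenise and model text in far more elaborate ways, but the starting point is the same: a large body of collected language, reduced to statistics.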

Setting aside the complexities of the process, it helps to know that how well such tools function depends on the amount of data they are trained on: broadly, the more data a model is fed, the better it becomes at generating appropriate results. This is why ChatGPT cannot give us data that is not already present on the World Wide Web.
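This limitation can be sketched with a hypothetical toy bigram model — not how ChatGPT actually works, which relies on vastly larger neural networks, but a miniature of the same principle: the model records which word follows which in its training text, so everything it generates is stitched together from words it has already seen.

```python
import random
from collections import defaultdict

# Toy training text; the example is illustrative only.
training_text = "the cat sat on the mat and the dog sat on the rug"

# Record, for each word, the words observed to follow it.
words = training_text.split()
followers = defaultdict(list)
for current, nxt in zip(words, words[1:]):
    followers[current].append(nxt)

def generate(start, length, seed=0):
    """Generate text by repeatedly sampling a recorded follower."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:  # no recorded continuation for this word
            break
        out.append(rng.choice(options))
    return " ".join(out)

# Every word the model can emit already appears in the training
# text: it recombines what it has seen, it does not invent data.
print(generate("the", 6))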

I like to think of these tools as much more advanced and sophisticated search engines whose output you can edit to your requirements. Rather than accepting or dismissing AI outright, then, it is important to learn how to optimise the use of these tools. Complete dependence on them, like outright rejection, would deprive us of the opportunity to learn, and in the current scenario of rapidly evolving data and information, we would not want that.
