Do you want help preparing for the bar examination, planning a birthday party, or even translating Ukrainian into Punjabi? A single artificial intelligence (AI) model can do it all. OpenAI, a U.S. company, has once again sent shock waves around the world, this time with GPT-4, its latest AI model. This large language model can understand and produce creative, meaningful language, and will power an advanced version of the company’s sensational chatbot, ChatGPT. Currently, GPT-4 can be tried through a premium subscription or by joining OpenAI’s waitlist.
GPT-4 and what it can do
GPT-4 is a remarkable improvement over its predecessor, GPT-3.5, which first powered ChatGPT. GPT-4 is more conversational and creative. Its biggest innovation is that it can accept text and image input simultaneously, and consider both while drafting a reply. For example, if given an image of ingredients and asked, “What can we make from these?”, GPT-4 gives a list of dish suggestions and recipes. The model can purportedly understand human emotions and humour, for instance by explaining what makes a picture funny. Its ability to describe images is already benefiting the visually impaired.
While GPT-3.5 could not handle long prompts well, GPT-4 can take up to 25,000 words of context into account, more than eight times as much. GPT-4 was evaluated on several exams designed for humans and performed well above average. For instance, in a simulated bar examination, it scored in the 90th percentile, whereas its predecessor scored in the bottom 10%. GPT-4 also sailed through advanced-level examinations in environmental science, statistics, art history, biology, and economics.
However, GPT-4 fared poorly in the advanced English language and English literature examinations, scoring 40% in both. Nevertheless, its language comprehension surpasses that of other high-performing language models in English and 25 other languages, including Punjabi, Marathi, Bengali, Urdu and Telugu. ChatGPT-generated text infiltrated school essays and college assignments almost immediately after its release; its prowess now threatens examination systems as well.
OpenAI has released preliminary data suggesting that GPT-4 can do a large share of white-collar work, especially programming and writing jobs, while leaving manufacturing and scientific jobs relatively untouched. The wider use of language models will therefore have consequences for economies and public policy.
The advent of GPT-4 shifts the question from what it can do to what it augurs. Researchers at Microsoft Research (Microsoft has invested in OpenAI) have reported observing “sparks” of artificial general intelligence in GPT-4, that is, of a system that excels at several types of tasks and can comprehend and combine concepts, such as writing code to create a painting or expressing a mathematical proof in the form of a Shakespearean play. If we define intelligence as “a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience”, GPT-4 already succeeds at four of these seven criteria. It is yet to master planning and learning, whether quickly or from experience.
GPT-4 is still prone to many of its predecessor’s flaws. Its output may not always be factually correct, a trait OpenAI has called “hallucination”. While it is much better at getting facts right than GPT-3.5, it may still subtly introduce fictitious information. Ironically, OpenAI has not been transparent about the inner workings of GPT-4. The GPT-4 technical report clearly states: “Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.”
While secrecy in the interest of safety sounds plausible, it also allows OpenAI to evade critical scrutiny of its model. GPT-4 has been trained on data scraped from the Internet, which contains several harmful biases and stereotypes. There is also an assumption that a large dataset is necessarily diverse and faithfully representative of the world at large.
This is not the case for the Internet, where young people, men, and people from economically developed countries are overrepresented. OpenAI’s approach to fixing these biases has so far been to create another model to moderate the responses, since it considers curating the training set infeasible. One weakness of this approach is that the moderator model may be trained to detect only the biases we are already aware of, and mostly in English. Such a model may be ignorant of stereotypes prevalent in non-Western cultures, such as those rooted in caste.
As its makers have shown, simply asking GPT-4 to pretend to be “AntiGPT” can make it ignore its moderation rules, effectively jailbreaking it. There is thus vast potential for GPT-4 to be misused as a propaganda and disinformation engine.
OpenAI has said that it has worked extensively to make GPT-4 safer to use, for example by having it refuse to produce obviously objectionable output, but whether these efforts will keep GPT-4 from becoming a student at ‘WhatsApp university’ remains to be seen. The larger question is where the decision not to do the wrong thing should originate: in the machine’s rules or in the human mind.
A ‘stochastic parrot’
In essence, GPT-4 is a machine that predicts the next word in an unfinished sentence, based on probabilities it learned while training on large corpora of text. This is why linguistics professor Emily Bender has described large language models such as GPT-4 as “stochastic parrots”, producing comprehensible phrases without understanding their meaning. Microsoft Research, however, maintains that GPT-4 does understand what it is saying, and that not all intelligence is a type of next-word prediction.
Professor Bender and her peers highlighted the potential harms of large language models two years ago, citing both ethical concerns and environmental costs. They also pointed to the opportunity cost of a race for ever bigger models trained on ever larger datasets, which distracts from smarter approaches that look for meaning and train on curated datasets. Their warnings have gone unheeded. Besides OpenAI, the AI company Anthropic has introduced a ChatGPT competitor named Claude. Google recently announced PaLM, a model trained with more degrees of freedom, or parameters, than GPT-3.
More broadly, efforts are under way worldwide to build a model with a trillion degrees of freedom. These will be truly colossal language models that raise questions about what they cannot do, but such questions would be red herrings, distracting us from whether we should be building models that simply test the limits of what is possible, to the exclusion of society’s concerns.
Jitesh Seth is a data scientist at DeepTek, researching the effectiveness of AI in radiology. Viraj Kulkarni is chief data scientist at DeepTek.