It takes a lot to make a splash in the world of science at a time when people are excited about the discovery of vaccines against SARS-CoV-2. The success of Artificial Intelligence (AI) in predicting protein structures reliably, consistently and with great accuracy is one such splash. AlphaFold2, an AI-based program from the company DeepMind, has stunned the world by accurately and quickly predicting the structures of proteins, starting from the sequences of amino acids that constitute them.
Proteins are ubiquitous in all organisms. By comparing and analysing protein structures, it is possible to get ideas about biological evolution, diseases, defence mechanisms, etc. This explains the human quest for finding the structures of proteins. In 1972, Christian B. Anfinsen won the Nobel Prize in Chemistry for his experiments that showed that a protein could fold into its structure based on the information contained in the sequence of amino acids. Since then, scientists around the world have been trying to computationally predict protein structures.
Only about 60 years ago, Max Perutz and others experimentally determined the first protein structures of myoglobin and haemoglobin. They did this through a method called X-ray crystallography that uses protein crystals and X-rays. Knowing the structure of haemoglobin helped people understand how it is able to perform its function of transporting oxygen from the lungs to the cells in the body. It also showed how changing a single amino acid can cause sickle cell anaemia. Just as knowing the shape of the human nose or the crow’s beak helps understand its function, knowing a protein structure helps recognise how it functions and how a defect may lead to malfunctioning.
Proteins, along with the nucleic acid sequences that make up a genome, form the basis of all organisms. Technology has advanced so much that it is now routine and inexpensive to sequence genomes. The sequences of amino acids that form the proteins are encoded in genes, which are part of the genome. Therefore, translating genes to get the sequences of proteins is easy. But getting the three-dimensional structures of proteins has so far been possible only experimentally, through the time-consuming and expensive techniques of X-ray crystallography, nuclear magnetic resonance and cryo-electron microscopy.
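To give a sense of how easy the translation step is, here is a minimal Python sketch that reads a gene three letters at a time and looks up each triplet in the standard genetic code. The function name `translate` and the short DNA string are illustrative assumptions: the string is a toy example, not a sequence copied from a database, and the sketch assumes a clean coding sequence containing only the letters A, C, G and T.

```python
from itertools import product

# Standard genetic code, with codons listed in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reads from the first base, stops at the first stop codon, and
    ignores any trailing bases that do not form a full codon.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequence for illustration only (hypothetical, not from a database).
toy_gene = "ATGGTGCATCTGACTCCTGAGGAGAAGTAA"
print(translate(toy_gene))  # MVHLTPEEK

# Changing a single DNA letter (GAG -> GTG) swaps one amino acid (E -> V).
toy_variant = toy_gene[:19] + "T" + toy_gene[20:]
print(translate(toy_variant))  # MVHLTPVEK
```

The lookup takes a few lines; nothing comparable exists for going from the resulting string of letters to a three-dimensional shape, which is why structure determination has remained an experimental task.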
India has had a legacy of being a top player in the field of protein structural work, both experimental and computational. The Ramachandran Plot devised nearly 60 years ago by G.N. Ramachandran and others from the University of Madras is used even today the world over to validate protein structures.
In 1994, John Moult and his colleagues started an exercise called Critical Assessment of Protein Structure Prediction (CASP) to bring fun and rigour into structure prediction. It has been conducted every two years since then. In CASP, scientists who have experimentally determined protein structures voluntarily hold back the structure from the public database but make the protein sequence available for a structure prediction challenge. That way, the predicted structures can be compared to the experimentally determined ones without any bias in prediction. CASP divides the targets into categories based on the level of difficulty. The effort also resulted in a quantitative measure called the Global Distance Test (GDT), which is 100 when the predicted and the experimental structures match perfectly and zero when there is no match at all.
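For readers who want to see how such a score can be computed, the following is a simplified Python sketch of the commonly used GDT_TS variant: the percentage of residues whose predicted position lies within 1, 2, 4 and 8 angstroms of the experimental position, averaged over the four cutoffs. The function name `gdt_ts` and the coordinates are illustrative assumptions. The sketch also assumes the two structures have already been superimposed and uses one point per residue; the real procedure searches over many superpositions, so this is not the scoring code used in CASP.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two structures that are already superimposed.

    Each structure is a list of (x, y, z) positions in angstroms,
    one per residue, in the same residue order.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")

    # Distance between each predicted residue and its experimental partner.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average over the cutoffs and express as a score out of 100.
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up coordinates for a four-residue toy example.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(experimental, experimental))  # 100.0 (perfect match)
print(gdt_ts(predicted, experimental))     # 56.25
```

In the toy example, the residues are off by 0.5, 1.5, 3 and 10 angstroms. One of four is within 1 angstrom, two within 2, and three within both 4 and 8, which averages to 56.25.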
In the early rounds, the median scores across all categories were around 20, with slow progress over the years. In 2018, a new player called AlphaFold from DeepMind joined the game, bringing ‘deep learning’ AI techniques into the prediction algorithm. It did well, with a modest score of around 60 across categories. Following the publication of that method, a large number of this year’s participants included deep learning procedures at different stages of their algorithms. But DeepMind’s AlphaFold team, led by Demis Hassabis and John Jumper, changed tack and switched to ‘attention-based’ deep learning, which has been successful in image and speech recognition. This method, used in AlphaFold2, is analogous to how people tend to solve jigsaw puzzles: paying attention to the pieces that fit locally while keeping the whole picture in mind.
It was a remarkable success. The median score was around 90, a level never reached before. This, according to John Moult, meant that the problem of structure prediction for compact single-module proteins is essentially solved. Once the details of the method are spelt out in a publication, more scientists will take it up to predict even the structures of protein complexes that form molecular machines in cells.
S. Krishnaswamy, a structural biologist and protein crystallographer, is visiting professor at The Institute of Mathematical Sciences, Chennai