An AI Created to Understand Biological Machines: Proteins
On 30 November 2020, Google's DeepMind announced a major breakthrough: its artificial intelligence program AlphaFold 2 can determine a protein's 3D shape from its amino-acid sequence.
Determining a protein's 3D structure is one of biology's oldest challenges. Proteins are essential biomolecules in every living organism. They perform a huge number of functions: catalyzing metabolic reactions, replicating DNA, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. A protein's shape determines its function in the organism.
In 1969, Cyrus Levinthal noted that it would take longer than the age of the known universe to enumerate all possible configurations of a typical protein by brute-force calculation; he estimated the number of configurations at around 10^300. Yet in nature proteins fold spontaneously, some within milliseconds. This contradiction is sometimes referred to as Levinthal’s paradox.
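To get a feel for the scale of Levinthal's estimate, here is a small back-of-the-envelope calculation. It is only a sketch: the figure of 10^300 configurations comes from the text, while the sampling rate of 10^13 configurations per second is an assumption chosen purely for illustration.

```python
import math

CONFIGURATIONS = 10 ** 300               # Levinthal's estimate quoted in the text
RATE_PER_SECOND = 10 ** 13               # ASSUMED sampling rate, for illustration only
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # about 3.16e7 seconds
AGE_OF_UNIVERSE_YEARS = 1.38e10          # roughly 13.8 billion years


def log10_years_to_enumerate(configurations: int, rate_per_second: int) -> float:
    """Return log10 of the years needed to visit every configuration once."""
    log10_seconds = math.log10(configurations) - math.log10(rate_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    exponent = log10_years_to_enumerate(CONFIGURATIONS, RATE_PER_SECOND)
    universe_exponent = math.log10(AGE_OF_UNIVERSE_YEARS)
    print(f"Brute-force search: about 10^{exponent:.0f} years")
    print(f"Age of the universe: about 10^{universe_exponent:.0f} years")
```

Working in powers of ten keeps the numbers readable. Even with a far more generous sampling rate, the gap between the two exponents stays enormous, which is the heart of the paradox.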
The ability to predict a protein’s structure could reveal a great deal about how it works in the organism, and could help us decipher the mechanisms of diseases, including those caused by the coronavirus.
"We have been stuck on this one problem – how do proteins fold up – for nearly 50 years. To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts, wondering if we’d ever get there, is a very special moment." — Professor John Moult
The Structure Problem
A protein is a linear chain of amino acid residues. Living organisms build their proteins from 20 standard types of amino acids, arranged in a specific order. Interactions among the amino acids in this sequence cause the chain to fold into a 3-dimensional structure, a process known as protein folding. More than 200 million protein sequences are known, and only a small fraction of them have experimentally determined structures.
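The idea of a protein as a string over a 20-letter alphabet can be sketched in a few lines of Python. The one-letter codes below are the standard ones for the 20 common amino acids; the function names and the example fragment are hypothetical and exist only for illustration.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_sequence(sequence: str) -> bool:
    """Check that a non-empty sequence uses only the 20 standard residues."""
    return len(sequence) > 0 and all(r in AMINO_ACIDS for r in sequence.upper())


def count_possible_sequences(length: int) -> int:
    """Number of distinct sequences of the given length: 20 ** length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return len(AMINO_ACIDS) ** length


if __name__ == "__main__":
    fragment = "MKTAYIAKQR"  # hypothetical 10-residue fragment
    print(is_valid_sequence(fragment))
    print(count_possible_sequences(len(fragment)))
```

The count grows as 20 to the power of the chain length, so even short chains have a vast number of possible sequences. Note that this counts sequences, not the 3D shapes a single sequence could adopt, which is the harder problem that structure prediction tackles.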
A number of experimental techniques have been developed to determine protein structures. In the 1950s the first complete protein structures were determined using X-ray crystallography, in which the diffraction pattern produced by an X-ray beam passing through a protein crystal is used to work out the protein’s atomic coordinates.
Professor Moult started the Critical Assessment of protein Structure Prediction (CASP) to bring more rigor to these efforts. The event challenges teams to predict the structures of proteins whose ground truth has been determined experimentally but not yet made public. Participants must predict the structures blind, and when the ground truth becomes available the predictions are compared against it.
For years, no technique could reliably reach an accuracy of 40-60%. In 2018, the first version of AlphaFold achieved an accuracy of 55-60%. After two more years of iterating on the AI system, the team reached a median score of 92.4 out of 100 on CASP's GDT (Global Distance Test) measure across all targets, and a median of 87.0 on the hardest free-modelling targets, which is remarkable.
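CASP's main accuracy measure is the Global Distance Test (GDT), scored from 0 to 100. Roughly, it reports what percentage of residues in a prediction lie within a threshold distance of their experimentally determined positions, averaged over cutoffs of 1, 2, 4 and 8 ångströms. The sketch below is a simplified illustration, not the official scoring code: it assumes the two structures are already superimposed and uses one coordinate per residue, whereas the real measure searches over many superpositions. The function name and the toy coordinates are hypothetical.

```python
import math

CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_like(predicted, experimental):
    """Simplified GDT-style score (0-100) for two pre-aligned structures.

    Each argument is a list of (x, y, z) coordinates, one per residue.
    ASSUMPTION: the structures are already superimposed; real GDT is not.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS_ANGSTROM
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS_ANGSTROM)


if __name__ == "__main__":
    # Toy data: four residues on a line, with prediction errors of
    # 0.5, 1.5, 3.0 and 10.0 angstroms along the x axis.
    experimental = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
    predicted = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (11.0, 0.0, 0.0), (22.0, 0.0, 0.0)]
    print(gdt_ts_like(predicted, experimental))  # 56.25
```

A perfect prediction scores 100, and a single badly placed residue lowers the score only slightly, which is why the measure is tolerant of small local errors.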
These exciting results open up the potential for biologists to use computational structure prediction as a core tool in scientific research rather than viewing it with skepticism. These AI methods have gained the trust of many biologists and researchers, who see them as an opportunity to explore the realm of molecular biology. "These methods have proven helpful for important classes of proteins, such as membrane proteins, that are very difficult to crystallize and therefore challenging to experimentally determine."
"This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research." Venkatraman Ramakrishnan
The system was trained on publicly available data consisting of ~170,000 protein structures from the Protein Data Bank, along with databases containing protein sequences of unknown structure. According to DeepMind, the algorithm "uses approximately 16 TPUv3s (which is 128 TPUv3 cores or roughly equivalent to ~100-200 GPUs) run over a few weeks, a relatively modest amount of computing in the context of most large state-of-the-art models used in machine learning today".
The architecture of the system allows AlphaFold to extract patterns and representations with precision and speed. Iterative training over this large body of data is what gives AlphaFold its ability to predict how proteins fold. According to DeepMind, after a few weeks of training the system can predict a protein’s underlying physical structure with near-atomic precision and can determine highly accurate structures in a matter of days.
Overview of the network. Source: DeepMind
With algorithms such as AlphaFold, we could better understand how protein structures work and begin to unravel why proteins fold the way they do.
Another major area where AlphaFold could be used is understanding how viruses such as the coronavirus work. With the help of powerful AI systems like this one, we could possibly see new techniques for developing vaccines far more quickly than before.
Lastly, open questions remain, such as how proteins interact with DNA, RNA, or small molecules, and how to determine the precise location of all amino acid side chains.
There’s so much to learn about the use of these scientific discoveries in the development of new medicines, biological research and understanding, and much more.
High Accuracy Protein Structure Prediction Using Deep Learning: John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Kathryn Tunyasuvunakool, Olaf Ronneberger, Russ Bates, Augustin Žídek, Alex Bridgland, Clemens Meyer, Simon A A Kohl, Anna Potapenko, Andrew J Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Martin Steinegger, Michalina Pacholska, David Silver, Oriol Vinyals, Andrew W Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis.