New AI-based software enables fast and reliable protein imaging
The more the better
“TomoTwin paves the way for the automatic identification and direct localization of proteins in their cellular environment, expanding the potential of cryo-ET,” said Gavin Rice, co-first author of the publication. Cryo-electron tomography (cryo-ET) has the potential to decipher how biomolecules work in cells and, by doing so, reveal the basis of life and the origins of disease.
In cryo-ET experiments, scientists use a transmission electron microscope to obtain 3D images, called tomograms, of cellular volumes containing complex biomolecules. To get a more detailed image of each different protein, they average as many copies of it as possible, much as a photographer takes the same photo at multiple exposures and then combines them into one perfectly exposed image. Crucially, the different proteins must be correctly identified and located in the tomograms before the average is calculated. “Scientists can achieve hundreds of tomograms per day, but we lack the tools to fully identify the molecules in them,” said Rice.
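The benefit of averaging many copies can be illustrated with a small numerical sketch. It is a toy model, not part of TomoTwin or any cryo-ET package: the one-dimensional "particle", the noise level, the number of copies, and all function names are assumptions chosen for illustration, and real subtomogram averaging must also align the 3D copies before combining them.

```python
import random

def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to each sample."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def average_copies(copies):
    """Average equally sized signals element by element."""
    n = len(copies)
    return [sum(values) / n for values in zip(*copies)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5

rng = random.Random(0)                      # fixed seed for reproducibility
true_signal = [0.0, 1.0, 0.0, -1.0] * 64    # hypothetical toy "particle", 256 samples
sigma = 2.0                                 # noise stronger than the signal itself

single = noisy_copy(true_signal, sigma, rng)
averaged = average_copies(
    [noisy_copy(true_signal, sigma, rng) for _ in range(400)]
)

error_single = rms_error(single, true_signal)
error_averaged = rms_error(averaged, true_signal)
print(f"one copy: {error_single:.2f}, average of 400 copies: {error_averaged:.2f}")
```

For independent noise, the error of the average shrinks roughly with the square root of the number of copies, which is why correctly finding more copies of the same protein leads to a sharper result.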
Until now, researchers have searched tomograms for matches using algorithms based on templates of known molecular structures, but these tend to be error-prone. Identifying molecules by hand is another option that ensures high-quality picking but takes days to weeks per data set.
Another possibility is supervised machine learning. Such tools can be highly accurate but are currently impractical, because they require thousands of manually labeled examples to train the software for each new protein, a nearly impossible task for small biological molecules in a dense cellular environment.
The newly developed TomoTwin software overcomes many of these obstacles. It learns to pick out molecules of similar shape in a tomogram and maps them to a geometric space; during training, the system is rewarded for placing similar proteins near one another and penalized otherwise. In the resulting map, researchers can isolate and accurately identify the different proteins, and then use that information to locate them in cells. “One of the strengths of TomoTwin is that we provide pre-trained capture models,” said Rice. Because the training step is removed, the software can even run on a local computer, where processing a tomogram typically takes 60 to 90 minutes; on the Raven MPI supercomputer, runtime is reduced to 15 minutes per tomogram.
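The idea of rewarding a system for placing similar items close together and penalizing it otherwise can be sketched with a triplet loss, one common objective in metric learning. The article does not state which loss TomoTwin uses, so the following is a minimal illustration under that assumption; the function names, the two-dimensional embeddings, and the margin value are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Loss is zero when the positive (same protein) is closer to the anchor
    than the negative (different protein) by at least `margin`; otherwise the
    loss grows with how badly that ordering is violated."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2D embeddings, for illustration only.
anchor = [0.0, 0.0]
same_protein = [0.1, 0.0]    # embedded close to the anchor
other_protein = [3.0, 4.0]   # embedded far from the anchor

# Well-arranged embedding: similar close, dissimilar far, so no penalty.
good = triplet_loss(anchor, same_protein, other_protein)

# Badly arranged embedding: roles swapped, so the penalty is large.
bad = triplet_loss(anchor, other_protein, same_protein)

print(good, bad)
```

Minimizing such a loss over many triplets pushes copies of the same protein into compact clusters in the embedding space, which is what makes it possible to isolate and identify them afterwards.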
TomoTwin enables researchers to pick particles in dozens of tomograms in the time it takes to pick one manually, increasing data throughput and the rate of averaging, and so yielding better images. The software can currently locate globular proteins or protein complexes larger than 150 kilodaltons in cells; in the future, Raunser’s group aims to include membrane proteins, filamentous proteins, and smaller proteins.
TomoTwin: general 3D macromolecule localization in cryo-electron tomograms by structural data mining