
Artificial intelligence systems predict the consequences of gene modification
SAN FRANCISCO, CA, May 31, 2023 - Researchers at Gladstone Institutes, the Broad Institute of MIT and Harvard, and the Dana-Farber Cancer Institute have turned to artificial intelligence (AI) to help them understand how the vast, interconnected networks of human genes control the function of cells, and how disruptions in those networks lead to disease.
Large language models, also known as foundation models, are AI systems that learn fundamental knowledge from massive amounts of general data and then apply that knowledge to new tasks, a process called transfer learning. These systems recently gained mainstream attention with the release of ChatGPT, a chatbot developed by OpenAI.
In a new study, published in the journal Nature, Gladstone Assistant Investigator Christina Theodoris, MD, PhD, developed a foundation model for understanding how genes interact. The new model, called Geneformer, learns from large amounts of data on gene interactions across a broad range of human tissues and transfers this knowledge to make predictions about how things might go wrong in disease.
Theodoris and her team used Geneformer to shed light on how heart cells go awry in heart disease. The method, however, can also be applied to many other cell types and diseases.
“Geneformer has broad applications across many areas of biology, including discovering possible drug targets for disease,” said Theodoris, who is also an assistant professor in the Department of Pediatrics at UC San Francisco. “This approach will greatly advance our ability to design network-correcting therapies in diseases where progress has been held back by limited data.”
Theodoris designed Geneformer during a postdoctoral fellowship with X. Shirley Liu, PhD, former director of the Center for Functional Cancer Epigenetics at the Dana-Farber Cancer Institute, and Patrick Ellinor, MD, PhD, director of the Cardiovascular Disease Initiative at the Broad Institute, both of whom are authors of the new study.
Network View
Many genes, when active, set off cascades of molecular activity that trigger other genes to dial their activity up or down. Some of those genes, in turn, influence other genes, or loop back and put the brakes on the first gene. So when a scientist sketches the connections between a few dozen related genes, the resulting network map often looks like a tangled spider’s web.
If mapping just a handful of genes in this way is messy, working out the relationships among all 20,000 genes in the human genome is a formidable challenge. But a map of such a large network would give researchers insight into how the entire gene network changes with disease, and how to reverse those changes.
“If a drug targets a gene that is peripheral in the network, it may have little impact on how a cell functions or only manage the symptoms of a disease,” said Theodoris. “But by restoring normal levels of genes that play a central role in the network, you can treat the underlying disease process and have a much greater impact.”
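The distinction between central and peripheral genes can be made concrete with a minimal sketch. The gene names and connections below are invented for illustration and do not come from the study, and counting direct connections (degree centrality) is only one simple stand-in for how central a gene is, not necessarily the measure the researchers used.

```python
from collections import defaultdict

# Hypothetical regulatory links (regulator, target); not real biology.
EDGES = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]


def degree_centrality(edges):
    """Return each gene's share of possible direct connections (0 to 1)."""
    neighbors = defaultdict(set)
    for regulator, target in edges:
        # Treat each link as undirected for the purpose of counting.
        neighbors[regulator].add(target)
        neighbors[target].add(regulator)
    n = len(neighbors)
    if n < 2:
        return {gene: 0.0 for gene in neighbors}
    return {gene: len(linked) / (n - 1) for gene, linked in neighbors.items()}


centrality = degree_centrality(EDGES)
# Most connected genes first; these would be the "central" candidates.
ranked = sorted(centrality, key=centrality.get, reverse=True)
for gene in ranked:
    print(f"{gene}: {centrality[gene]:.2f}")
```

In this toy network, GENE_A touches three of the four other genes and ranks first, while GENE_E has a single connection and ranks last.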
Artificial Intelligence “Transfer Learning”
Typically, to map gene networks, researchers rely on large datasets that include many similar cells. They use a subset of AI systems, called machine learning platforms, to work out patterns within the data. For example, a machine learning algorithm could be trained on a large number of samples from patients with and without heart disease, and then learn the gene network patterns that distinguish diseased samples from healthy ones.
However, standard machine learning models in biology are trained to complete only one task. For a model to accomplish a different task, it must be retrained from scratch on new data. So if the researchers from the first example now wanted to distinguish diseased kidney, lung, or brain cells from their healthy counterparts, they would need to start over and train a new algorithm with data from those tissues.
The problem is, for some diseases, there isn’t enough data to train these machine learning models.
In the new study, Theodoris, Ellinor, and their colleagues tackled this problem by leveraging a machine learning technique called “transfer learning” to train Geneformer as a foundation model whose core knowledge can be transferred to new tasks.
First, they “pretrained” Geneformer to have a fundamental understanding of how genes interact by feeding it data on gene activity levels in about 30 million cells from a broad range of human tissues.
To show that the transfer learning approach worked, the scientists then fine-tuned Geneformer to make predictions about the connections between genes, or about whether reducing the levels of certain genes would cause disease. Geneformer made these predictions with much higher accuracy than alternative approaches because of the fundamental knowledge it gained during the pretraining process.
In addition, Geneformer was able to make accurate predictions even when it was shown only a small number of examples of relevant data.
“This means Geneformer can be applied to make predictions in diseases where research progress has been slow because we don’t have access to sufficiently large datasets, such as rare diseases and those affecting tissues that are difficult to sample in the clinic,” said Theodoris.
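The two-stage pattern described above, pretraining on a large unlabeled corpus and then fine-tuning on a handful of labeled examples, can be sketched in a few lines. This is a toy under stated assumptions, not Geneformer or its code: the "cells" are synthetic four-gene profiles, the pretrained knowledge is a single axis of variation found by power iteration, and the labels "healthy" and "diseased" are invented for illustration.

```python
import random


def pretrain(cells, steps=50):
    """Stage 1: learn one reusable feature (the main axis of variation)
    from many unlabeled cells, via power iteration on the covariance."""
    n_genes = len(cells[0])
    means = [sum(c[g] for c in cells) / len(cells) for g in range(n_genes)]
    centered = [[c[g] - means[g] for g in range(n_genes)] for c in cells]
    axis = [1.0] * n_genes
    for _ in range(steps):
        # Apply the covariance to the axis without building the matrix.
        scores = [sum(x * a for x, a in zip(row, axis)) for row in centered]
        axis = [sum(s * row[g] for s, row in zip(scores, centered))
                for g in range(n_genes)]
        norm = sum(a * a for a in axis) ** 0.5
        axis = [a / norm for a in axis]
    return means, axis


def embed(model, cell):
    """Project a cell onto the pretrained axis (the frozen representation)."""
    means, axis = model
    return sum((x - m) * a for x, m, a in zip(cell, means, axis))


def fine_tune(model, labeled_cells):
    """Stage 2: fit a tiny task head (one centroid per label) on few examples."""
    sums, counts = {}, {}
    for cell, label in labeled_cells:
        sums[label] = sums.get(label, 0.0) + embed(model, cell)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, head, cell):
    """Assign the label whose centroid is closest in the embedding."""
    z = embed(model, cell)
    return min(head, key=lambda label: abs(head[label] - z))


# Synthetic unlabeled corpus: genes 0 and 1 vary together, genes 2 and 3 barely move.
rng = random.Random(0)
corpus = []
for _ in range(500):
    t = rng.gauss(0, 2)
    corpus.append([5 + t + rng.gauss(0, 0.1), 5 + t + rng.gauss(0, 0.1),
                   3 + rng.gauss(0, 0.1), 1 + rng.gauss(0, 0.1)])

model = pretrain(corpus)

# Only four labeled examples are available for the new task (labels are invented).
few_labeled = [
    ([7.0, 7.1, 3.0, 1.0], "diseased"),
    ([6.8, 7.2, 3.1, 0.9], "diseased"),
    ([3.1, 2.9, 3.0, 1.1], "healthy"),
    ([2.9, 3.2, 2.9, 1.0], "healthy"),
]
head = fine_tune(model, few_labeled)

print(predict(model, head, [6.5, 6.6, 3.0, 1.0]))
print(predict(model, head, [3.5, 3.4, 3.0, 1.0]))
```

The point of the design is that the expensive step, learning which genes move together, happens once on unlabeled data. A different task would reuse `model` and fit only a new `head`.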
Lessons for Heart Disease
Theodoris’s team next used transfer learning to advance discoveries in heart disease. They first asked Geneformer to predict which genes would have a detrimental effect on the development of cardiomyocytes, the muscle cells in the heart.
Among the top genes identified by the model, many have been linked to heart disease.
“The fact that the model predicts genes that we already know are critical for heart disease gives us the added confidence that it is capable of making accurate predictions,” said Theodoris.
However, other potentially important genes identified by Geneformer, such as the gene TEAD4, had not previously been linked to heart disease. And when the researchers removed TEAD4 from cardiomyocytes in the lab, the cells were no longer able to beat as vigorously as healthy cells.
Geneformer had therefore used transfer learning to reach a new conclusion: despite never having been given any information about cells lacking TEAD4, it correctly predicted the critical role that TEAD4 plays in cardiomyocyte function.
Finally, the group asked Geneformer to predict which genes should be targeted to make diseased cardiomyocytes resemble healthy cells at the gene network level. When the researchers tested two of the proposed targets in cells affected by cardiomyopathy (a disease of the heart muscle), they found that removing the predicted genes using CRISPR gene editing technology restored the beating ability of the diseased cardiomyocytes.
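The logic of that last experiment, simulating the removal of each gene and asking whether the cell moves closer to the healthy state, can be sketched as follows. Everything here is an assumption for illustration: the gene names and expression values are invented, and distance is measured directly on the toy expression values, whereas a trained model would supply its own representation of the cell.

```python
import math

# Hypothetical expression profiles; gene names and values are invented.
HEALTHY = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.5}
DISEASED = {"GENE_A": 1.1, "GENE_B": 2.1, "GENE_C": 4.0}


def distance(cell, reference):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((cell[g] - reference[g]) ** 2 for g in reference))


def rank_deletions(diseased, healthy):
    """Score each gene by how much deleting it moves the cell toward healthy.

    A positive score means the perturbed cell ends up closer to the healthy
    profile than the unperturbed diseased cell was.
    """
    baseline = distance(diseased, healthy)
    shifts = {}
    for gene in diseased:
        perturbed = {**diseased, gene: 0.0}  # simulate removing the gene
        shifts[gene] = baseline - distance(perturbed, healthy)
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)


ranking = rank_deletions(DISEASED, HEALTHY)
for gene, shift in ranking:
    print(f"{gene}: {shift:+.2f}")
```

In this toy data only GENE_C, which is far above its healthy level, gets a positive score. Deleting either of the other two genes moves the cell further from the healthy profile.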
“In learning what normal gene networks look like and what diseased gene networks look like, Geneformer was able to work out which features can be targeted to switch between the healthy and diseased states,” said Theodoris. “The transfer learning approach allowed us to overcome the challenge of limited patient data to efficiently identify possible proteins to target with drugs in diseased cells.”
“A benefit of using Geneformer is the ability to predict which genes could help switch cells between healthy and diseased states,” said Ellinor. “We were able to validate these predictions in cardiomyocytes in our laboratory at the Broad Institute.”
The researchers plan to expand the number and types of cells that Geneformer has analyzed in order to keep improving its ability to analyze gene networks. They have also made the model open-source so that other scientists can use it.
“With standard approaches, you have to retrain a model from scratch for every new application,” said Theodoris. “The really exciting thing about our approach is that Geneformer’s fundamental knowledge about gene networks can now be transferred to answer many biological questions, and we’re looking forward to seeing what other people do with it.”
###
About the Study
The paper “Transfer learning enables predictions in network biology” was published in the journal Nature on May 31, 2023.
Other authors are Ling Xiao, Mark Chaffin, Zeina Al Sayed, Matthew Hill and Helene Mantineo of the Broad Institute; Anant Chopra and Elizabeth Brydon of Bayer US LLC; and Zexian Zeng of the Dana-Farber Cancer Institute.
This work was supported by grants from the National Institutes of Health (1R01HL092577, 1R01HL157635, 5R01HL139731, T32GM007748), the American Heart Association (18SFRN34110082, 20CDA35260081), the European Union (MAESTRIA 965286), and a Helen Hay Whitney Foundation Postdoctoral Fellowship.
About Gladstone Institutes
Gladstone Institutes is an independent, non-profit life science research organization that uses visionary science and technology to tackle disease. Founded in 1979, it is located at the epicenter of biomedical and technological innovation, in the Mission Bay neighborhood of San Francisco. Gladstone has created a research model that disrupts the way science works, funds great ideas and attracts the brightest minds.
About the Broad Institute of MIT and Harvard
The Broad Institute of MIT and Harvard was launched in 2004 to empower this generation of creative scientists to transform medicine. The Broad Institute seeks to describe the molecular components of life and their connections; discover the molecular basis of major human diseases; develop effective new approaches to diagnostics and therapeutics; and disseminate discoveries, tools, methods, and data openly to the entire scientific community.
Founded by MIT, Harvard, Harvard-affiliated hospitals, and the visionary Los Angeles philanthropists Eli and Edythe L. Broad, the Broad Institute includes faculty, professional staff, and students from throughout the MIT and Harvard biomedical research communities and beyond, with collaborations spanning more than one hundred private and public institutions in more than 40 countries worldwide.
About the Dana-Farber Cancer Institute
Dana-Farber Cancer Institute is one of the world’s leading centers for cancer research and treatment. Dana-Farber’s mission is to reduce the burden of cancer through scientific investigation, clinical care, education, community engagement and advocacy. Dana-Farber is a federally designated Comprehensive Cancer Center and teaching affiliate of Harvard Medical School.
We provide the latest cancer treatments to adults through the Dana-Farber Brigham Cancer Center and to children through the Dana-Farber/Boston Children’s Cancer and Blood Disorders Center. Dana-Farber is the only hospital nationwide with a top 5 U.S. News & World Report Best Cancer Hospitals ranking in both adult and pediatric care.
As a global leader in oncology, Dana-Farber is dedicated to a unique and equal balance between cancer research and care, translating findings into new treatments for patients locally and worldwide, offering more than 1,100 clinical trials.
DOI
10.1038/s41586-023-06139-9
Article title
Transfer learning enables predictions in network biology
Article Publication Date
31-May-2023