The new ‘AI scientist’ combines theory and data to find scientific equations
(Nanowerk News) In 1918, the American chemist Irving Langmuir published a paper examining the behavior of gas molecules sticking to a solid surface. Guided by the results of careful experiments, as well as his theory that solids offer discrete sites for gas molecules to fill, he worked out a set of equations describing how much gas will stick to the surface at a given pressure.
Now, about a hundred years later, an “AI scientist” developed by researchers at IBM Research, Samsung AI, and the University of Maryland, Baltimore County (UMBC) has reproduced a key part of Langmuir’s Nobel Prize-winning work. The system, an artificial intelligence (AI) acting as a scientist, also rediscovered Kepler’s third law of planetary motion, which gives the time it takes one celestial object to orbit another given the distance separating them, and produced a good approximation of Einstein’s relativistic time-dilation law, which shows that time slows down for fast-moving objects.
A paper describing the results will be published in the journal Nature Communications (“Combining Data and Theory for Derivable Scientific Discovery with AI-Descartes”).
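For reference, the three laws named above are usually written in the textbook forms below. This is a sketch for orientation only: the symbols are standard conventions introduced here, not notation from the article, and the forms the system actually recovered may be written differently in the paper.

```latex
% Langmuir adsorption isotherm: fractional surface coverage \theta
% at gas pressure p, with K an equilibrium constant.
\[ \theta = \frac{K p}{1 + K p} \]

% Kepler's third law: orbital period T of two bodies with masses m_1 and m_2
% separated by distance d, with G the gravitational constant.
\[ T^2 = \frac{4 \pi^2 d^3}{G (m_1 + m_2)} \]

% Time dilation: interval \Delta t measured for a clock moving at speed v,
% given its proper-time interval \Delta t_0 and the speed of light c.
\[ \Delta t = \frac{\Delta t_0}{\sqrt{1 - v^2 / c^2}} \]
```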
Machine learning tools that reason
The new AI scientist — dubbed “AI-Descartes” by researchers — joins the likes of AI Feynman and other recently developed computing tools aimed at accelerating scientific discovery. At the heart of this system is a concept called symbolic regression, which finds equations to fit data. Given basic operators, such as addition, multiplication, and division, the system can generate hundreds to millions of candidate equations, looking for the one that most accurately describes the relationships in the data.
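The idea of symbolic regression can be illustrated with a minimal brute-force sketch. It is not the AI-Descartes implementation: the operator set, the search depth, the function names, and the approximate planetary values (semi-major axis in astronomical units, period in years) are all assumptions chosen for illustration.

```python
import itertools
import math

# Approximate textbook values (assumed here for illustration):
# semi-major axis in astronomical units, orbital period in Earth years.
PLANETS = [
    (0.387, 0.241),   # Mercury
    (0.723, 0.615),   # Venus
    (1.000, 1.000),   # Earth
    (1.524, 1.881),   # Mars
    (5.203, 11.862),  # Jupiter
    (9.537, 29.457),  # Saturn
]

BINARY_OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY_OPS = {"sqrt": math.sqrt}


def enumerate_expressions(depth):
    """Return (text, function) pairs for every expression up to `depth`."""
    if depth == 0:
        return [("x", lambda x: x)]
    smaller = enumerate_expressions(depth - 1)
    found = list(smaller)
    for name, op in UNARY_OPS.items():
        for text, f in smaller:
            found.append((f"{name}({text})", lambda x, f=f, op=op: op(f(x))))
    for name, op in BINARY_OPS.items():
        for (ta, fa), (tb, fb) in itertools.product(smaller, repeat=2):
            found.append((
                f"({ta} {name} {tb})",
                lambda x, fa=fa, fb=fb, op=op: op(fa(x), fb(x)),
            ))
    return found


def mean_squared_error(f, data):
    """Average squared gap between the candidate's prediction and the data."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)


def best_fit(data, depth=2):
    """Pick the candidate expression with the lowest error on the data."""
    candidates = enumerate_expressions(depth)
    return min(candidates, key=lambda c: mean_squared_error(c[1], data))


best_text, best_fn = best_fit(PLANETS)
print(best_text, mean_squared_error(best_fn, PLANETS))
```

With only 85 candidates at depth 2, the lowest-error expression on this data is equivalent to x times the square root of x, which is Kepler's third law in these units. Practical systems search far larger spaces and also fit numerical constants, which this sketch leaves out.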
AI-Descartes offers several advantages over other systems, but its most distinctive feature is its ability to reason logically, says Cristina Cornelio, a research scientist at Samsung AI in Cambridge, UK, who is the first author on the paper. If several candidate equations fit the data well, the system identifies which of them is most consistent with the background scientific theory. The ability to reason also sets the system apart from “generative AI” programs like ChatGPT, whose large language models have limited logical skills and sometimes get basic math wrong.
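The step of choosing among equally well-fitting candidates can be sketched roughly as follows. This is a loose numerical stand-in, not the logical reasoning the article describes: the three candidate formulas, their constants, and the three constraints (meant to mimic what a theory of a finite number of adsorption sites would imply) are hypothetical and exist only for this example.

```python
# Hypothetical candidates, assumed to fit some adsorption data about equally
# well. The constants are illustrative, not fitted to real measurements.
CANDIDATES = {
    "q = 2.0*P / (1 + 0.5*P)": lambda P: 2.0 * P / (1.0 + 0.5 * P),
    "q = 1.3*sqrt(P)": lambda P: 1.3 * P ** 0.5,
    "q = 0.4 + 0.3*P": lambda P: 0.4 + 0.3 * P,
}


def zero_at_zero_pressure(f):
    """No gas is adsorbed when there is no gas pressure."""
    return abs(f(0.0)) < 1e-9


def never_decreases(f):
    """More pressure should never reduce the adsorbed amount."""
    grid = [0.1 * i for i in range(200)]
    return all(f(b) >= f(a) for a, b in zip(grid, grid[1:]))


def saturates(f):
    """A finite number of sites means loading levels off at high pressure."""
    return abs(f(1e9) - f(1e6)) <= 0.01 * abs(f(1e6))


# Assumed stand-ins for background theory, checked numerically, not proved.
CONSTRAINTS = [zero_at_zero_pressure, never_decreases, saturates]


def violations(f):
    """Names of the constraints that candidate `f` fails."""
    return [check.__name__ for check in CONSTRAINTS if not check(f)]


report = {name: violations(f) for name, f in CANDIDATES.items()}
consistent = [name for name, failed in report.items() if not failed]

for name, failed in report.items():
    print(name, "->", failed or "consistent with all constraints")
```

Only the Langmuir-shaped candidate passes all three checks here. Spot checks on a grid can miss violations that a formal proof would catch, which is one reason to prefer logical deduction over sampling.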
“In our work, we combine a first principles approach, which has been used by scientists for centuries to derive new formulas from existing background theory, with a data-driven approach that is more common in the machine learning era,” said Cornelio. “This combination allows us to leverage both approaches and create more accurate and meaningful models for a wide range of applications.”
The name AI-Descartes refers to the 17th century mathematician and philosopher René Descartes, who argued that nature could be explained by some basic physical laws and that logical deduction played a key role in scientific discovery.
Suited to real-world data
The system works especially well with noisy real-world data, which can trip up traditional symbolic regression programs: in trying to find a formula that captures every errant zig and zag of the data, they may overlook the true signal. It also handles small data sets well, reliably finding equations when given as few as ten data points.
One factor that may slow the adoption of tools like AI-Descartes for frontier science is the need to identify and encode the relevant background theory for open scientific questions. The team is working to create new datasets that contain both real measurement data and the associated background theory, so they can refine their system and test it on new terrain.
They also want to train computers to read scientific papers and build their own background theory.
“In this work, we needed a human expert to write down, in formal computer-readable terms, what the axioms of the background theory were, and if a human missed one or made a mistake, the system would not work,” said co-author Tyler Josephson, assistant professor of Chemical, Biochemical, and Environmental Engineering at UMBC. “In the future,” he said, “we also want to automate this part of the work, so we can explore more areas of science and engineering.”
This goal motivated Josephson’s research on AI tools to advance chemical engineering.
Ultimately, the team hopes that AI-Descartes, like its namesake, can inspire productive new approaches to science. “One of the most exciting aspects of our work is the potential to make significant advances in scientific research,” said Cornelio.