Generative modeling allows computers trained to recognize patterns and statistical regularities in vast datasets to create new images, audio, and video. But generating this realistic-looking digital content requires significant computational power and resources, prompting researchers to explore ways to improve efficiency without compromising quality.
A team that includes Johns Hopkins mathematician Holden Lee has developed a theoretical analysis aimed at making a form of generative modeling called diffusion modeling faster—and less resource-heavy. The researchers presented their results at the NeurIPS conference in December.
“Diffusion models require immense computational resources, so practitioners use various tricks to speed them up. However, there’s limited theoretical understanding of why they work. We gave the first proof that a class of such methods can result in substantial acceleration, requiring a sublinear number of steps for generation,” said Lee, an assistant professor in the Department of Applied Mathematics and Statistics at the Whiting School of Engineering.
The study compared two mathematical tools used in generative modeling: stochastic differential equations (SDEs) and ordinary differential equations (ODEs). The researchers found that using ODEs for some computations in generative models could make generation substantially faster.
“We wanted to unravel the mystery of why replacing SDEs with ODEs in these models improved efficiency. This was somewhat surprising because SDEs, which incorporate randomness, are a more natural choice for generating a data distribution, whereas ODEs are deterministic. So why did they do better than SDEs, which are designed to handle random noise?” Lee asked. “We aimed to develop a theory that explains this observed phenomenon.”
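To make the contrast concrete, the sketch below compares the two kinds of update on a deliberately simple toy problem. Everything in it is an illustrative assumption rather than the paper's setting: the "data" distribution is a one-dimensional Gaussian, the forward noising is an Ornstein-Uhlenbeck process, and the score function is known in closed form where a real diffusion model would use a trained neural network. The point is only that the SDE-style step injects fresh randomness at every iteration, while the ODE (probability flow) step is deterministic.

```python
import numpy as np

# Toy setup (illustrative assumptions, not the paper's experiments):
# "data" ~ N(0, sigma0^2); forward noising is the Ornstein-Uhlenbeck
# process dx = -x dt + sqrt(2) dW, so the score of the noised
# distribution is available in closed form instead of a trained network.
sigma0 = 0.5            # standard deviation of the toy data distribution
T, n_steps = 3.0, 100   # total diffusion time and number of sampler steps
h = T / n_steps

def var_t(t):
    # variance of x_t under the forward (noising) process
    return 1.0 + (sigma0**2 - 1.0) * np.exp(-2.0 * t)

def score(x, t):
    # exact score of N(0, var_t(t)); a learned score model would go here
    return -x / var_t(t)

rng = np.random.default_rng(0)
x_sde = rng.standard_normal(10_000)  # start both samplers from the prior N(0, 1)
x_ode = x_sde.copy()

t = T
for _ in range(n_steps):
    # SDE (reverse diffusion) step: drift plus freshly injected Gaussian noise
    z = rng.standard_normal(x_sde.shape)
    x_sde = x_sde + h * (x_sde + 2.0 * score(x_sde, t)) + np.sqrt(2.0 * h) * z
    # ODE (probability flow) step: same intermediate distributions, but deterministic
    x_ode = x_ode + h * (x_ode + score(x_ode, t))
    t -= h

print("target std    :", sigma0)
print("SDE sample std:", x_sde.std())
print("ODE sample std:", x_ode.std())
```

Up to discretization error, both samplers should end up near the target spread; the question the paper addresses is how many such steps each approach needs, and how that count scales with the data.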
The researchers tackled this challenge by analyzing the number of steps the computer program needs to execute its task. They focused on how that step count depends on two key properties of the data: dimensionality (how many components the data has) and smoothness (how regularly, or predictably, the underlying distribution behaves).
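Put in symbols, the "sublinear number of steps" from Lee's earlier quote means that the required number of iterations $N$ grows more slowly than the data dimension $d$ itself, roughly

$$ N = \tilde{O}\!\left(d^{\,\alpha}\right) \quad \text{for some exponent } \alpha < 1, $$

with the achievable exponent and the hidden constants depending on how smooth the data distribution is.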
The team used a technique called a “coupling argument” to compare their ODE-based computer simulation with an ideal process free of numerical or statistical error. Unlike in the SDE case, however, the error of the ODE simulation was harder to control. To correct the discrepancies that arose during the simulation, they introduced a “corrector” based on a Markov chain Monte Carlo algorithm that keeps the simulation close to the right track.
“Incorporating the corrector step is crucial for stability. For a high-quality generation, it’s advisable to utilize the corrector alongside the ODE,” said Lee.
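The sketch below, which reuses the same toy Gaussian setup (again an assumption made purely for illustration), shows one way such a predictor-corrector loop can be organized. Here the corrector is taken to be a few steps of Langevin Monte Carlo, one common Markov chain Monte Carlo choice, targeting the intermediate distribution at the current noise level.

```python
import numpy as np

# Same toy Gaussian setup as before (an illustrative assumption): the score
# of the noised distribution is known exactly, standing in for a trained model.
sigma0 = 0.5                     # std of the toy data distribution
T, n_steps, n_corr = 3.0, 50, 5  # diffusion time, predictor steps, corrector steps per level
h = T / n_steps

def var_t(t):
    # variance of the noised distribution at time t under the forward process
    return 1.0 + (sigma0**2 - 1.0) * np.exp(-2.0 * t)

def score(x, t):
    # exact score of N(0, var_t(t))
    return -x / var_t(t)

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)  # start from the prior N(0, 1)

t = T
for _ in range(n_steps):
    # predictor: one deterministic probability-flow ODE step backward in time
    x = x + h * (x + score(x, t))
    t -= h
    # corrector: a few Langevin Monte Carlo steps targeting the distribution at
    # the new time t, nudging samples back toward the correct intermediate law
    tau = 0.1 * var_t(t)         # corrector step size scaled to the local noise level (a heuristic)
    for _ in range(n_corr):
        z = rng.standard_normal(x.shape)
        x = x + tau * score(x, t) + np.sqrt(2.0 * tau) * z

print("target std :", sigma0)
print("sampled std:", x.std())
```

In this division of labor, the deterministic predictor does the bulk of the transport while the stochastic corrector keeps the samples from drifting away from the intended intermediate distributions, which is the stability role Lee describes above.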
They found that this method allowed the program to do its job much more efficiently: the ODE-based algorithms sped up the process significantly, requiring fewer steps than the SDE-based approaches. The acceleration came at a cost, however: the procedure became more sensitive to unpredictable, “non-smooth” features in the data, which can increase the number of steps needed and make errors harder to control.
“As generative AI becomes increasingly ubiquitous, optimizing computational resources becomes imperative. By providing theoretical guarantees and practical guidance, our study lays a foundation for more efficient diffusion model implementations,” Lee said.