Bryan Catanzaro’s full keynote at Nebius Academy AI DNA, Amsterdam
“We have to make AI more energy-efficient. We’re going to invent every technology we can to do that.”
NVIDIA’s Vice President Bryan Catanzaro on open models and research priorities.

Bryan Catanzaro’s enormous influence at NVIDIA predates his current title. Long before he became Vice President, his work convinced NVIDIA’s leadership that deep learning was not a curiosity but the company’s entire future.
A 2013 paper he co-authored with Stanford researchers demonstrated that three GPU servers could replace thousands of CPU cores for training DL models. The neural network library he had been developing around that time evolved into cuDNN, a foundational deep learning toolkit that proved critical to the modern AI stack. These and many other contributions cemented his role as one of the architects of the AI era.
Now, as VP of Applied Deep Learning Research, Bryan is the driving force behind Nemotron, NVIDIA’s family of open models. They are, Catanzaro says, “the most important way for our research to impact the ecosystem,” while also directly shaping the hardware that runs them.
That co-evolution of models and infrastructure is now central to NVIDIA’s AI strategy and to Catanzaro’s own work, he said in an interview at the Nebius Academy event in Amsterdam. He shared his views on topics ranging from low-precision arithmetic and energy efficiency to what it means for AI to cross into general intelligence.
You described open models as the potential energy for AI in future applications, the same way that an open internet unlocked a great many innovations. Where in that story are we right now? Still in the dial-up era?
I think there’s no consensus in the AI field about how AI is going to be developed and deployed. There is a vision of AI as infrastructure, like the internet, that needs to be integrated into every company’s business in a different way, in the same way that healthcare uses the internet in a different way than retail. And there are differing opinions about what that would mean.
I think it’s really essential to have an open ecosystem that allows for different enterprises with different goals to innovate in ways that they need for their own personal understanding of their customers and the problems they’re trying to solve. That’s why I think AI is like infrastructure. Now, I think that is perhaps not an agreed-upon view in the community, which is why we see a lot of really excellent AI shops trying to build AI services rather than AI infrastructure.
I think AI services are great, too. There are lots of reasons why companies would want to use a service. It’s much quicker to get started, and the services can be quite excellent, and there’s nothing wrong with that. I just believe that, in the same way that the internet needed to be open in order to unlock the trillions of dollars of value that it did, AI also needs to be accessible in an open way so that companies aren’t held back by some other service, trying to figure out how to build something that solves their needs.
So this is a perspective, a philosophy that I have, and it’s one of the reasons I work at NVIDIA. But I wouldn’t say that’s something everybody in the field agrees with.
You said that Nemotron’s first job is to ensure NVIDIA continues to exist by informing future hardware. Can you give examples of hardware decisions that were influenced by the Nemotron program?
Two things have been really important recently. One is the way that mixture-of-experts model architectures have impacted our networking capabilities. With the Blackwell generation, we really leaned into NVLink switches at a much larger domain. We have 72 GPUs on a single domain that can all read and write each other’s memory with random access at very high speed.
We knew that capability was going to be important for mixture-of-experts models because of the work that we had been doing on Nemotron. A lot of the great results that people are having with these systems can be traced back to our bet that models in the future are going to need much better networking.
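To illustrate why mixture-of-experts models stress the network, here is a minimal sketch of top-k routing. It is not NVIDIA's implementation: the expert count, the one-expert-per-GPU layout, and the random scores standing in for a learned router are all assumptions made for illustration.

```python
import random

NUM_EXPERTS = 8  # hypothetical: one expert hosted on each of 8 GPUs
TOP_K = 2        # assumed: each token goes to its two highest-scoring experts


def route(scores, k=TOP_K):
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def count_remote_sends(token_home_gpu, assignments):
    """Count token copies that must travel to a GPU other than their own."""
    return sum(
        1
        for home, experts in zip(token_home_gpu, assignments)
        for expert in experts
        if expert != home
    )


rng = random.Random(0)
num_tokens = 1000
# Tokens start out spread evenly across the GPUs.
token_home_gpu = [t % NUM_EXPERTS for t in range(num_tokens)]
# Random scores stand in for the output of a learned router.
assignments = [
    route([rng.random() for _ in range(NUM_EXPERTS)]) for _ in range(num_tokens)
]
remote = count_remote_sends(token_home_gpu, assignments)
print(f"{remote} of {num_tokens * TOP_K} token copies cross GPUs")
```

Because a token's two experts are distinct, at least one of its two copies always leaves its home GPU in this layout, which is why a fast all-to-all interconnect matters for this architecture.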
Another thing has to do with reduced precision arithmetic. We’ve been continuously adding support for smaller and smaller numbers in order to save energy, power, and memory, and with Blackwell, we’ve really invested a lot in 4-bit representations. We have this NVFP4 representation, and figuring out how to use that both for deployment as well as for training has been one of our core projects at Nemotron. For example, our Nemotron 3 Super and Ultra models were pre-trained using 4-bit arithmetic, and it’s kind of crazy that it is even possible.
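As a rough illustration of what a 4-bit floating-point representation involves, the sketch below quantises values block by block onto the eight magnitudes that a 4-bit E2M1 float can express. It is a simplification and not NVIDIA's implementation: the block size of 16 is an assumption, and the per-block scale is kept as an ordinary Python float rather than in the compact scale encoding a real format would use.

```python
import math

# Magnitudes representable with 1 sign, 2 exponent, and 1 mantissa bit (E2M1).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # assumed block size: one shared scale per 16 values


def quantize_block(block):
    """Return (scale, codes), where codes are signed values from FP4_GRID."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    scale = amax / FP4_GRID[-1]  # map the largest magnitude onto 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(FP4_GRID, key=lambda g: abs(g - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def quantize(values):
    """Split values into blocks and quantise each block independently."""
    return [quantize_block(values[i:i + BLOCK]) for i in range(0, len(values), BLOCK)]


def dequantize(blocks):
    """Rebuild approximate values from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


# Hypothetical "weights", only for demonstration.
weights = [math.sin(0.37 * i) * (1 + i % 5) for i in range(64)]
restored = dequantize(quantize(weights))
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"largest round-trip error: {max_err:.4f}")
```

Sharing one scale across a small block keeps a single outlier from wasting the range of the whole tensor, which is one reason block-scaled formats can hold accuracy at such low bit widths.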
Which workloads are benefiting most from low-precision formats today, and what have been the biggest surprises or lessons learned so far?
The biggest success for NVFP4 has been in deployment during inference. I think there have been a lot of organisations, including our Nemotron teams, that have found you can actually deploy using NVFP4 at really similar accuracies to a BFLOAT16 checkpoint of the same model. And of course, the speedups can be significant. So that’s been the biggest application so far.
I think the Nemotron project is more committed to using FP4 formats for pre-training than other projects, and one of the reasons for that is that we believe there’s an opportunity here and we know where GPU technology is headed. We know how important it is to use more energy-efficient computation in AI, and so we’ve found a lot of success using NVFP4 for pre-training.
I don’t think a lot of other organisations have picked that up yet, but hopefully over time they will. One of the surprises is that we’ve had to introduce a lot of new techniques to make that happen. We use quantisation-aware distillation to take our higher-precision checkpoints and convert them into FP4 checkpoints that can be deployed with very high accuracy.
So there have been new algorithms that we’ve had to invent. For pre-training, the numerics have gotten rather sophisticated. We’re using things like randomised Hadamard transforms to remove spikes in the activations of the neural network that would ordinarily cause numerical instabilities in model training. Inventing that technology hasn’t been straightforward.
It’s pretty clear that one of the most important levers that we have is making AI more energy-efficient. Of course, we’re going to invent every technology we can to do that.
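The effect of a randomised Hadamard transform on activation spikes can be sketched in a few lines. This is a generic textbook construction, not the Nemotron training recipe: the vector length, the spike value, and the helper names are assumptions chosen for illustration.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]


def random_signs(n, seed=0):
    """Random +1/-1 diagonal that makes the transform 'randomised'."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(vec, signs):
    """Flip signs at random, then mix all entries with the Hadamard matrix."""
    return fwht([s * x for s, x in zip(signs, vec)])


def inverse_rht(vec, signs):
    """The orthonormal Hadamard matrix is its own inverse; then undo the signs."""
    return [s * x for s, x in zip(signs, fwht(vec))]


# Hypothetical activations: mostly small values plus one large spike.
activations = [0.1] * 64
activations[7] = 50.0
signs = random_signs(64)
rotated = rht(activations, signs)
print("max before:", max(abs(x) for x in activations))
print("max after: ", max(abs(x) for x in rotated))
```

The transform spreads the spike's energy across all 64 entries without changing the vector's overall norm, so the rotated values fit far more comfortably into a narrow numeric range, and the original vector can be recovered with `inverse_rht`.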
Sparse GPU acceleration has been in NVIDIA’s hardware since Ampere. You’ve said it’s not being used nearly enough. What kind of breakthrough would sparsity need for the industry to embrace it more broadly?
Sparsity has been a hard one for us to crack. We did include hardware acceleration for sparse neural networks in Ampere, as you said, and it’s been out there for a while, and it hasn’t been adopted as widely as I would like. Why did we do it? Well, you know, we have this fundamental belief that a lot of the computation in neural networks should be sparse.
You can find all sorts of analogies in neuroscience about the way the brain works that kind of indicate a high degree of sparsity. It also makes sense from a computer science angle. We know a lot of important computations can be better expressed through sparse computation.
It’s been difficult to show that sparse networks provide a meaningful speedup at acceptable accuracy. What I’m trying to do with our research is flip that around and show how you can achieve superior accuracy using sparse neural networks, because those efficiency gains should actually translate into intelligence gains.
We haven’t done that yet, and so that’s an assignment for the Nemotron project going forward. It’s one of the dimensions of research that we think is important to continue working on.
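For readers unfamiliar with the hardware feature, Ampere's sparse acceleration targets a 2:4 pattern, in which at most two of every four consecutive weights are nonzero. The sketch below produces that pattern with simple magnitude pruning. Magnitude pruning is one common baseline and is used here as an assumption; it is not presented as the approach the Nemotron team is pursuing.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]
print(prune_2_of_4(row))
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Half of the weights are dropped regardless of how much they matter, which is the accuracy cost that makes the speedup hard to justify without further training or better algorithms.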
How far can synthetic data take us before it starts recycling the same reasoning patterns? Is there a ceiling?
You know, we have gone into synthetic data generation pretty hard for Nemotron, and we spend an enormous amount of our compute building it. We’re looking for interesting data, finding the needle in the haystack. That could be the seed for a new synthetic dataset that could really expand the intelligence and generalisation capabilities of our model.
I don’t think we’ve reached the ceiling here. We started with things like math (Nemotron Mind) because they’re much easier to prove. It’s much easier to verify that your synthetic data is actually intelligent.
What would make you say that this is different now, that the field has crossed from very capable systems into something closer to general intelligence?
You know, Jensen has actually said that he thinks AGI is already here, and he’s been speaking about that publicly. And I think the reason he’s saying this is that we’re seeing people use AI in new ways and find dramatic productivity improvements they didn’t have even three months ago.
Like in Nemotron development, for example, our researchers and software engineers are using AI very deeply every day, and we’re finding pretty extraordinary benefits in everything from kernel generation to dataset generation.
I really like that definition of AGI, where AI has become general enough to be useful for basically everybody trying to build something. I like that definition because I think it’s practical. It’s human-centered in a way that doesn’t oversimplify intelligence into some unidimensional metric, like IQ or SAT scores. One of the problems I’ve always had with AGI as a concept is that intelligence is so multidimensional.
For example, if you were founding a company and wanted to pick a co-founder based only on SAT scores or whether someone was an International Mathematical Olympiad gold medallist, that obviously wouldn’t lead to great results, right? Everybody knows intelligence isn’t that simple.
And yet sometimes I feel the discussion around AGI becomes very oversimplified. So I think it’s important to keep the human part centered when we talk about AGI. If AI has become a tool that almost every human can use to be better at their job, then AGI is here. And I think that’s what we’re starting to see.

You built some of the original deep learning libraries, watched AlexNet change the world, and now you’re designing open models. What research result in the last six months genuinely surprised you? Something that you didn’t see coming.
Probably the thing I’ve been most excited about over the past few months has been the speed at which AI is beginning to solve increasingly difficult problems. For example, we study the probability of task completion as a function of the amount of human effort the task would typically require. And we are seeing that AI models from so many different organisations are really pushing that curve forward. That has happened faster than I expected.
And it’s so interesting because there has been a constant narrative for the past 15 years that AI is at a dead end, that it’s plateaued, that these systems are just stochastic parrots repeating things from their training data. And I’ve never believed that was true. As somebody working in the field, I’ve always been surprised at how well these models are generalising and actually learning how to solve problems. And I feel like the progress has been accelerating faster than I thought it would.



