Nvidia addresses AI peak power demand, spikes in new rack-scale systems

- Nvidia recently announced some of its rack-scale systems will now include a new power supply unit with energy storage and other features the company claims can smooth power spikes and reduce peak grid demand by up to 30%.
- The artificial intelligence powerhouse said in a blog post Monday that the PSU with energy storage, plus the hardware and software needed to use it, will be available with its new GB300 NVL72 platform, as well as GB200 NVL72 systems. A spokesperson for Nvidia declined to say when the systems would be available or to provide additional details.
- Santiago Grijalva, a professor of electrical and computer engineering at Georgia Tech, called the new technology “a moderate big deal,” given Nvidia’s role as a dominant player in the space. “But this solution is limited to NVIDIA’s high-end systems,” he wrote in an email. “The solution competes with Tesla’s cooling and Meta’s hardware optimizations, offering a significant but not transformative refinement of existing power management techniques.”
Unlike traditional data centers, those running AI workloads exhibit rapid power ramps that the director of reliability services for the Texas Reliability Entity recently compared to the load pattern of a steel mill.
This creates problems for grid operators, utilities and the hyperscalers that need the power.
Nvidia said its new system can help address this challenge with a smoothing solution that consists of several mechanisms across three main operational phases: ramp up, steady state, and ramp down. A power cap feature limits power draw at the start of a workload, with the cap gradually increasing. Once steady-state operation is achieved, storage is deployed to control short-term power fluctuations. For ramping down, the GPU can operate in a special power burner mode to ensure a smooth transition rather than a sharp drop, the company said.
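The three-phase behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not Nvidia's implementation: the function name, phase labels, and thresholds are all hypothetical, and the storage bookkeeping assumes one-second timesteps so watts and joules trade one-for-one.

```python
# Illustrative sketch of three-phase power smoothing (ramp up, steady state,
# ramp down). All names and numbers are hypothetical, not Nvidia's API.

def smooth_power(phase: str, gpu_demand_w: float, cap_w: float, storage: dict) -> float:
    """Return the power draw (watts) the grid would see for a raw GPU demand."""
    if phase == "ramp_up":
        # A power cap limits draw at the start of a workload; the caller
        # gradually raises cap_w toward full demand.
        return min(gpu_demand_w, cap_w)
    if phase == "steady_state":
        # Energy storage absorbs spikes above the target average and
        # recharges during dips, so the grid sees a flat profile.
        avg = storage["target_avg_w"]
        delta_j = gpu_demand_w - avg  # 1-second step: watts ~ joules
        storage["charge_j"] -= delta_j  # discharge on spikes, charge on dips
        return avg
    if phase == "ramp_down":
        # A "power burner" mode dissipates energy on the GPU so the draw
        # tapers toward a floor instead of dropping sharply.
        return max(gpu_demand_w, storage["floor_w"])
    return gpu_demand_w
```

Note that, consistent with the article, the storage in this sketch only reshapes the load profile; nothing flows back to the grid.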
The energy storage mechanism is only used to optimize the load profile seen by the grid and does not provide energy back to the utility, it noted.
“Facilities previously needed to be provisioned for the maximum instantaneous power consumption” they might need, the company said. “Now, with effective energy storage, provisioning can be closer to the target average consumption, enabling more racks within the same power budget or allowing for reduced total power allocation.”
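The arithmetic behind that claim is simple to illustrate. The numbers below are invented for the example, not drawn from Nvidia's announcement: with smoothing, a facility can provision racks against their target average draw rather than their instantaneous peak.

```python
# Hypothetical numbers showing why average-based provisioning fits more
# racks into a fixed facility power budget than peak-based provisioning.
budget_kw = 10_000
peak_per_rack_kw = 140   # assumed worst-case instantaneous draw per rack
avg_per_rack_kw = 100    # assumed target average draw with smoothing

racks_if_peak_provisioned = budget_kw // peak_per_rack_kw
racks_if_avg_provisioned = budget_kw // avg_per_rack_kw
```

Under these assumed figures, peak-based provisioning fits 71 racks while average-based provisioning fits 100, matching the company's point that smoothing enables either more racks per budget or a smaller total allocation.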
Grijalva agreed that AI power consumption is a critical issue and said that energy storage with advanced management helps.
But he said Nvidia’s system does nothing to address the problem of transmission and how to bring massive amounts of power to data centers.
“In addition, the learning models are evolving, which will result in evolving power patterns [and] ... data centers for AI inference will also have different regimes,” he said. “Thus, the solution must be very flexible and future proof.”
Source: Utility Dive