Over the past month, Softbank released test results from their trial demonstrating high-performance integration of GPUs and RAN. I appreciate their transparency in making the data public. Here are my initial observations:
- The performance of their cluster is impressive. They packed 20 4T4R radios into a test area of roughly 100 m x 400 m, and reached 1.48 Gbps peak total throughput in 100 MHz of spectrum. That’s almost 15 bps/Hz.
- Softbank reports that during the test, the centralized server (2 Grace Hopper GH200 ‘superchips’) consumed 500 W of DC power on average. That’s remarkable and surprisingly low. Good job by NVIDIA, shutting down cores and other resources not used for this RAN workload.
- The ability to handle peak throughput was good. The test involved streaming HD video to 100 smartphones simultaneously. When the streaming sessions started, all smartphones were buffering at the same time, resulting in high peak throughput demand. Afterward, the buffers were full, and the throughput settled down to a much lower level. Softbank didn’t report on the DC power consumption during the peak.
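As a quick sanity check on the headline number, here is a minimal sketch of the arithmetic using only the figures quoted in the list above. The variable and function names are my own, and the radio density figure is simply derived from the stated test area.

```python
# Figures quoted in the observations above; names are illustrative only.
PEAK_THROUGHPUT_BPS = 1.48e9   # 1.48 Gbps peak total throughput
BANDWIDTH_HZ = 100e6           # 100 MHz of spectrum
NUM_RADIOS = 20                # 4T4R radios in the cluster
AREA_M2 = 100 * 400            # roughly 100 m x 400 m test area


def spectral_efficiency(throughput_bps: float, bandwidth_hz: float) -> float:
    """Return spectral efficiency in bps/Hz."""
    return throughput_bps / bandwidth_hz


se_bps_per_hz = spectral_efficiency(PEAK_THROUGHPUT_BPS, BANDWIDTH_HZ)
radios_per_km2 = NUM_RADIOS / (AREA_M2 / 1e6)

print(f"Spectral efficiency: {se_bps_per_hz:.1f} bps/Hz")
print(f"Radio density: {radios_per_km2:.0f} radios per square km")
```

The result is 14.8 bps/Hz, which is where the "almost 15" comes from. The density works out to roughly 500 radios per square kilometer, which gives a sense of how tightly packed this cluster is.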
Okay, so the test was a great way to show high performance in a RAN cluster. The next step is to determine “what does it mean?”. I have a few thoughts here:
First of all, Softbank set this test up as a centralized RAN configuration with one central DU server and 20 remote RUs. This allowed Softbank to take advantage of Distributed MIMO processing, which has been known to increase spectral efficiency by 2-3x in previous products and trials. The D-MIMO aspect of performance is the most important reason for the high spectral efficiency, not the use of GPUs. I’ve seen other tests with spectral efficiency in the range of 12-20 bps/Hz using ASICs instead of GPUs.
If we remove the impact of D-MIMO from the spectral efficiency equation, the performance here is similar to 5G massive MIMO networks, with peak throughput around 5-8 bps/Hz. So my first conclusion is that GPUs can successfully perform at the same level of capacity as ASIC solutions or CPUs.
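To make the "remove D-MIMO" step concrete, here is a rough sketch that divides the measured spectral efficiency by the 2-3x gain range mentioned above. Treating the D-MIMO gain as a simple multiplicative factor is my simplifying assumption for illustration, and the names are hypothetical.

```python
# Back out the D-MIMO gain from the measured spectral efficiency.
# Assumption: the gain acts as a simple multiplier on spectral efficiency.
MEASURED_SE_BPS_PER_HZ = 1.48e9 / 100e6   # 14.8 bps/Hz from the test
DMIMO_GAIN_RANGE = (2.0, 3.0)             # 2-3x, per earlier products and trials


def remove_dmimo_gain(se_bps_per_hz: float, gain: float) -> float:
    """Estimate spectral efficiency without the D-MIMO gain."""
    return se_bps_per_hz / gain


low_gain, high_gain = DMIMO_GAIN_RANGE
se_low = remove_dmimo_gain(MEASURED_SE_BPS_PER_HZ, high_gain)   # larger gain removed
se_high = remove_dmimo_gain(MEASURED_SE_BPS_PER_HZ, low_gain)   # smaller gain removed

print(f"Estimated SE without D-MIMO: {se_low:.1f} to {se_high:.1f} bps/Hz")
```

That lands at roughly 4.9 to 7.4 bps/Hz, in line with the 5-8 bps/Hz range quoted for 5G massive MIMO networks.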
Second, I would point out that the power consumption reported by Softbank is the average power consumption, not the peak power required during a period of peak throughput. I suspect that there’s some sleight-of-hand here, as the two GH200 ‘superchips’ can consume up to 2 kW of power. That’s still not too bad for a dense cluster of 20 radios, at only 100 W per RU for the DU processing.
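The per-RU numbers are easy to check. This sketch uses the 500 W average reported by Softbank and the 2 kW figure above, which is my estimate of the upper bound for two GH200s and not a measured peak from the test. Names are illustrative.

```python
# DU processing power per RU, average vs. estimated peak.
AVG_POWER_W = 500.0     # average DC power reported by Softbank
PEAK_POWER_W = 2000.0   # assumed upper bound for two GH200s (not measured in the test)
NUM_RUS = 20


def power_per_ru(total_power_w: float, num_rus: int) -> float:
    """Return DU processing power attributed to each RU, in watts."""
    return total_power_w / num_rus


avg_per_ru_w = power_per_ru(AVG_POWER_W, NUM_RUS)
peak_per_ru_w = power_per_ru(PEAK_POWER_W, NUM_RUS)
peak_to_avg = PEAK_POWER_W / AVG_POWER_W

print(f"Average: {avg_per_ru_w:.0f} W per RU")
print(f"Peak (assumed): {peak_per_ru_w:.0f} W per RU")
print(f"Peak-to-average ratio: {peak_to_avg:.0f}x")
```

On these assumptions the average is 25 W per RU and the peak is 100 W per RU, a 4x gap that the power and cooling infrastructure would have to be sized for.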
Supporting higher peak DC power draw means that operators would need bigger AC/DC converters, and possibly large air conditioners or even liquid cooling.
Third, Softbank presents the low average DC power as evidence that this cluster is “cost effective”. That may be true on the OPEX side, with the surprisingly high energy efficiency at low throughput. But I have serious doubts on the CAPEX side. If I compare the cost of two GH200s to twenty ASIC-based DUs, the GPUs still look expensive at more than triple the cost.
At the end of the day, I conclude that this test is a great way to demonstrate high performance for a high-density cluster. For a soccer stadium or airport, the centralized GH200 and D-MIMO could be a good choice. In that kind of crowded environment, running fiber to a large number of radios would be possible and cost-effective, and the ultra-high density of D-MIMO drives the centralization anyway.
However, I don’t think that the GH200 approach passes the ‘low cost’ test for a widespread mobile network, especially in places where fiber is harder and more expensive to deploy. There are also business-model and operational challenges with telcos offering AI services. I’ve published some detailed thoughts on this topic here.
For the broad market, I believe that we’ll see integrations of GPU cores with ASICs. No operator wants to pay for two GH200s (144 GPU cores and all the memory and other support that comes along with them). Licensing the GPU technology and dropping one or two cores into an ASIC would be far cheaper. We can get many of the benefits of AI inference, efficient RAN processing, and high capacity with a more surgical application of GPUs where they’re needed.
