Energy-Efficient NPU Technology Cuts AI Power Use by 44%

July 13, 2025


Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed energy-efficient NPU technology that showed substantial performance improvements in laboratory testing.

Their specialised AI chip ran AI models 60% faster while using 44% less electricity than the graphics cards currently powering most AI systems, according to results from controlled experiments.

The research, led by Professor Jongse Park of KAIST’s School of Computing in collaboration with HyperAccel Inc., addresses one of the most pressing challenges in modern AI infrastructure: the enormous energy and hardware requirements of large-scale generative AI models.

Current systems such as OpenAI’s ChatGPT-4 and Google’s Gemini 2.5 demand not only high memory bandwidth but also substantial memory capacity, driving companies like Microsoft and Google to buy hundreds of thousands of NVIDIA GPUs.

The memory bottleneck challenge

The core innovation lies in the team’s approach to the memory bottleneck that plagues existing AI infrastructure. Their energy-efficient NPU technology focuses on making the inference process “lightweight” while minimising accuracy loss, a critical balance that has proven difficult for previous solutions.
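Why both bandwidth and capacity matter can be seen from a back-of-the-envelope bound. In the usual autoregressive setup, each decoding step has to read the model weights and the entire KV cache from memory, so memory traffic rather than arithmetic often sets the pace. The sketch below computes that lower bound; the function name `min_decode_step_seconds` and every figure in it are hypothetical, chosen only to show the arithmetic, and compute time is ignored.

```python
def min_decode_step_seconds(weight_bytes: float, kv_cache_bytes: float,
                            bandwidth_bytes_per_s: float) -> float:
    """Lower bound on one decoding step when every model weight and every
    cached key/value must be streamed from memory once (compute ignored)."""
    return (weight_bytes + kv_cache_bytes) / bandwidth_bytes_per_s


# Hypothetical figures: 14 GB of weights, 16 GB of KV cache, 2 TB/s of bandwidth.
full = min_decode_step_seconds(14e9, 16e9, 2e12)
# Same setup, but with the KV cache compressed to a quarter of its size.
quarter_kv = min_decode_step_seconds(14e9, 4e9, 2e12)
print(f"{full * 1e3:.1f} ms vs {quarter_kv * 1e3:.1f} ms")  # 15.0 ms vs 9.0 ms
```

In this made-up example, shrinking the cache cuts the per-step bound from 15 ms to 9 ms, and it also frees memory capacity, which is why compressing the KV cache targets both constraints at once.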

PhD student Minsu Kim and Dr Seongmin Hong of HyperAccel Inc., serving as co-first authors, presented their findings at the 2025 International Symposium on Computer Architecture (ISCA 2025) in Tokyo. The research paper, titled “Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization,” details their approach to the problem.

The technology centres on KV cache quantisation. The KV cache holds the attention keys and values already computed for every previous token, so it grows with both sequence length and the number of requests served at once; the researchers identify it as accounting for most of the memory usage in generative AI systems. By compressing this component, the team delivers the same level of AI infrastructure performance with fewer devices than traditional GPU-based systems require.
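To make the scale concrete, the sketch below estimates KV cache size from a model’s shape. The function name `kv_cache_bytes`, the model dimensions, and the 4-bit width are hypothetical illustration values, not figures from the Oaken paper, and the estimate ignores the scale factors and outlier storage that a real quantiser adds.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bits_per_value: int) -> int:
    """Estimate KV cache size in bytes.

    Every layer stores one key vector and one value vector (the factor of 2)
    per KV head, per token, per sequence in the batch.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bits_per_value // 8


# Hypothetical model shape and workload, chosen only for illustration.
config = dict(num_layers=32, num_kv_heads=32, head_dim=128,
              seq_len=4096, batch_size=16)

fp16 = kv_cache_bytes(**config, bits_per_value=16)
int4 = kv_cache_bytes(**config, bits_per_value=4)
print(f"FP16 KV cache: {fp16 / 2**30:.1f} GiB")   # FP16 KV cache: 32.0 GiB
print(f"4-bit KV cache: {int4 / 2**30:.1f} GiB")  # 4-bit KV cache: 8.0 GiB
```

Under these assumed dimensions, storing the cache in 4 bits instead of 16 shrinks it by a factor of four, which is the kind of saving that lets fewer devices hold the same workload.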

Technical innovation and architecture

The KAIST team’s energy-efficient NPU technology employs a three-part quantisation algorithm: threshold-based online-offline hybrid quantisation, group-shift quantisation, and fused dense-and-sparse encoding. This approach lets the system integrate with existing memory interfaces without requiring changes to the operational logic of current NPU architectures.
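The article names the three techniques without describing them. As a rough, simplified illustration of the general idea only (not the paper’s algorithm), the sketch below profiles inlier thresholds offline from calibration data; online, it shifts each inlier by the lower threshold and quantises it to 4-bit codes, while values outside the thresholds are kept at full precision in a sparse side table. The 1st/99th-percentile thresholds, the 4-bit width, and all names (`profile_thresholds`, `quantize`, `dequantize`, `QuantizedVector`) are assumptions; Oaken’s actual grouping, shift rules, and fused hardware encoding are not reproduced.

```python
import statistics
from dataclasses import dataclass


@dataclass
class QuantizedVector:
    codes: list[int]            # dense low-bit codes; outlier slots hold 0
    outliers: dict[int, float]  # sparse part: position -> full-precision value
    lo: float                   # shift subtracted before quantising
    scale: float                # step size of the uniform grid


def profile_thresholds(calibration: list[float]) -> tuple[float, float]:
    """Offline step: derive inlier bounds from calibration data.

    Using the 1st and 99th percentiles is an assumption for illustration.
    """
    cuts = statistics.quantiles(calibration, n=100)
    return cuts[0], cuts[-1]


def quantize(x: list[float], lo: float, hi: float, bits: int = 4) -> QuantizedVector:
    """Online step: dense codes for inliers, sparse storage for outliers."""
    if hi <= lo:
        raise ValueError("upper threshold must exceed lower threshold")
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    codes: list[int] = []
    outliers: dict[int, float] = {}
    for i, v in enumerate(x):
        if lo <= v <= hi:
            # Shift by the lower threshold so inliers land on 0..levels.
            codes.append(round((v - lo) / scale))
        else:
            codes.append(0)
            outliers[i] = v
    return QuantizedVector(codes, outliers, lo, scale)


def dequantize(q: QuantizedVector) -> list[float]:
    """Rebuild the vector: undo the shift for inliers, restore outliers exactly."""
    out = [q.lo + c * q.scale for c in q.codes]
    for i, v in q.outliers.items():
        out[i] = v
    return out


# Hypothetical calibration data spread evenly over [-3, 3].
calibration = [i / 100 for i in range(-300, 301)]
lo, hi = profile_thresholds(calibration)

x = [0.1, -0.5, 1.2, 8.0, -6.5, 0.0]
q = quantize(x, lo, hi)
restored = dequantize(q)
print(q.outliers)  # {3: 8.0, 4: -6.5}
```

Keeping the rare outliers in a separate sparse table lets the dense grid cover a narrow range, so each inlier is reconstructed to within half a grid step while the outliers are not clipped.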

The hardware architecture incorporates page-level memory management techniques to make efficient use of limited memory bandwidth and capacity. The team also introduced new encoding methods specifically optimised for the quantised KV cache, addressing the particular requirements of their approach.
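The article does not describe how the page-level memory management works. One common form of the idea, similar in spirit to the paged KV caches used in LLM serving software such as vLLM’s PagedAttention, is sketched below: memory is divided into fixed-size pages, and each sequence keeps a page table mapping its token positions onto physical pages. The class `PagedKVCache` and its methods are hypothetical and only track page bookkeeping, not the stored keys and values.

```python
class PagedKVCache:
    """Toy page-level allocator for a KV cache (bookkeeping only).

    Pages are handed out only as tokens arrive and are returned to the
    free pool when a sequence finishes, so memory is not reserved up front
    for the longest possible sequence.
    """

    def __init__(self, num_pages: int, tokens_per_page: int) -> None:
        self.tokens_per_page = tokens_per_page
        self.free_pages = list(range(num_pages))
        self.page_tables: dict[int, list[int]] = {}  # seq_id -> physical pages
        self.lengths: dict[int, int] = {}            # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical page, offset)."""
        table = self.page_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.tokens_per_page == 0:  # no page yet, or last page full
            if not self.free_pages:
                raise MemoryError("KV cache out of pages")
            table.append(self.free_pages.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.tokens_per_page

    def release(self, seq_id: int) -> None:
        """Return every page of a finished sequence to the free pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_pages=8, tokens_per_page=4)
slots = [cache.append_token(seq_id=0) for _ in range(5)]
print(cache.page_tables[0])   # [7, 6]
print(slots[-1])              # (6, 0)
cache.release(0)
print(len(cache.free_pages))  # 8
```

The fifth token spills onto a second page, and releasing the sequence returns both pages, so a finished request never leaves stranded memory behind.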

“This research, through joint work with HyperAccel Inc., found a solution in lightweighting algorithms for generative AI inference and succeeded in developing a core NPU technology that can solve the memory problem,” Professor Park explained.

“Through this technology, we implemented an NPU with over 60% better performance than the latest GPUs by combining quantisation techniques that reduce memory requirements while maintaining inference accuracy.”

Sustainability implications

The environmental impact of AI infrastructure has become a growing concern as generative AI adoption accelerates. The energy-efficient NPU technology developed by KAIST offers a potential path towards more sustainable AI operations.

With 44% lower power consumption than current GPU solutions, widespread adoption could significantly reduce the carbon footprint of AI cloud services. However, the technology’s real-world impact will depend on several factors, including manufacturing scalability, cost-effectiveness, and industry adoption rates.

The researchers acknowledge that while their solution represents a significant step forward, widespread deployment will require continued development and industry collaboration.

Industry context and future outlook

This breakthrough in energy-efficient NPU technology is particularly timely as AI companies face increasing pressure to balance performance with sustainability. The current GPU-dominated market has created supply chain constraints and driven up costs, making alternative solutions increasingly attractive.

Professor Park noted that the technology “has demonstrated the potential for implementing high-performance, low-power infrastructure specialised for generative AI, and is expected to play a key role not only in AI cloud data centres but also in the AI transformation (AX) environment represented by dynamic, executable AI such as agentic AI.”

The research is a significant step towards more sustainable AI infrastructure, but its ultimate impact will depend on how effectively it can be scaled and deployed in commercial environments. As the AI industry continues to grapple with energy consumption concerns, innovations like KAIST’s energy-efficient NPU technology offer hope for a more sustainable future for AI computing.

(Photo by Korea Advanced Institute of Science and Technology)
