AI search provider Perplexity's research wing has developed a new set of software optimizations that allows trillion-parameter and larger models to run efficiently across older, cheaper hardware using a variety of existing network technologies, including Amazon's proprietary Elastic Fabric Adapter.
These innovations, detailed in a paper published this week and released on GitHub for further scrutiny, present a novel approach to one of the biggest challenges in serving large-scale mixture-of-experts (MoE) models: memory and network latency.
Mo parameters, mo problems
MoE models, like DeepSeek V3 and R1 or Moonshot AI's Kimi K2, are big, ranging from 671 billion to 1 trillion parameters. This means they're too large to run on eight-GPU systems built around Nvidia's older H100 or H20 accelerators.
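The arithmetic behind that claim is straightforward. Below is a rough back-of-envelope sketch in Python; the 80 GB-per-GPU capacity, FP8 weight precision, and 20 percent headroom figure are illustrative assumptions, not numbers taken from Perplexity's paper.

```python
# Back-of-envelope memory check: do a MoE model's FP8 weights fit on a
# single eight-GPU node? All capacity and overhead figures below are
# illustrative assumptions, not numbers from Perplexity's paper.

def weight_footprint_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight size in GB (default: FP8, one byte per parameter)."""
    return params_billions * bytes_per_param

def fits_on_node(params_billions: float,
                 gpus: int = 8,
                 hbm_per_gpu_gb: float = 80.0,   # e.g. an 80 GB H100
                 headroom_frac: float = 0.2) -> bool:
    """True if the weights fit after reserving headroom for KV cache,
    activations, and runtime buffers (the headroom fraction is an assumption)."""
    usable_gb = gpus * hbm_per_gpu_gb * (1.0 - headroom_frac)
    return weight_footprint_gb(params_billions) <= usable_gb

if __name__ == "__main__":
    for name, size_b in [("DeepSeek V3/R1", 671), ("Kimi K2", 1000)]:
        gb = weight_footprint_gb(size_b)
        print(f"{name}: ~{gb:,.0f} GB of FP8 weights; "
              f"fits on an 8x80GB node: {fits_on_node(size_b)}")
```

Even before accounting for KV cache and activations, 671 GB to 1 TB of weights exceeds the roughly 640 GB of combined memory on an eight-way node of 80 GB GPUs, which is why these models have to be spread across multiple machines.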
