AI search provider Perplexity's research wing has developed a new set of software optimizations that allows trillion-parameter and larger models to run efficiently across older, cheaper hardware using a variety of existing network technologies, including Amazon's proprietary Elastic Fabric Adapter.
These innovations, detailed in a paper published this week and released on GitHub for further scrutiny, present a novel approach to one of the biggest challenges in serving large-scale mixture-of-experts (MoE) models: memory and network latency.
Mo parameters, mo problems
MoE models, like DeepSeek V3 and R1 or Moonshot AI's Kimi K2, are big, ranging from 671 billion to 1 trillion parameters. This means they're too large to run on eight-GPU systems built around Nvidia's older H100 or H20 accelerators.
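The arithmetic behind that claim is straightforward. Below is a rough back-of-envelope sketch in Python; the 80 GB-per-GPU capacity, FP8 weight precision, and 20 percent headroom figure are illustrative assumptions, not numbers taken from Perplexity's paper.

```python
# Back-of-envelope memory check: do a MoE model's FP8 weights fit on a
# single eight-GPU node? All capacity and overhead figures below are
# illustrative assumptions, not numbers from Perplexity's paper.

def weight_footprint_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight size in GB (default: FP8, one byte per parameter)."""
    return params_billions * bytes_per_param

def fits_on_node(params_billions: float,
                 gpus: int = 8,
                 hbm_per_gpu_gb: float = 80.0,   # e.g. an 80 GB H100
                 headroom_frac: float = 0.2) -> bool:
    """True if the weights fit after reserving headroom for KV cache,
    activations, and runtime buffers (the headroom fraction is an assumption)."""
    usable_gb = gpus * hbm_per_gpu_gb * (1.0 - headroom_frac)
    return weight_footprint_gb(params_billions) <= usable_gb

if __name__ == "__main__":
    for name, size_b in [("DeepSeek V3/R1", 671), ("Kimi K2", 1000)]:
        gb = weight_footprint_gb(size_b)
        print(f"{name}: ~{gb:,.0f} GB of FP8 weights; "
              f"fits on an 8x80GB node: {fits_on_node(size_b)}")
```

Even before accounting for KV cache and activations, 671 GB to 1 TB of weights exceeds the roughly 640 GB of combined memory on an eight-way node of 80 GB GPUs, which is why these models have to be spread across multiple machines.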
