China’s tech industry is making impressive strides in artificial intelligence, with a focus on squeezing more out of the hardware it already has. DeepSeek is spearheading this push by developing software that lifts the performance of NVIDIA’s cut-down Hopper H800 GPUs, and its latest release could reshape expectations for what restricted AI accelerators can do.
DeepSeek’s newest project, FlashMLA, aims to unlock more of these GPUs’ potential by optimizing how they handle memory and allocate resources. If the numbers hold up, it marks a real advance for China’s AI sector, extracting far more performance from limited hardware than the silicon alone would suggest.
On the first day of its Open Source Week, DeepSeek unveiled FlashMLA, a multi-head latent attention (MLA) decoding kernel built specifically for NVIDIA’s Hopper GPUs. The release made an immediate splash, and the numbers the company is reporting are striking.
According to DeepSeek, FlashMLA reaches 580 TFLOPS for BF16 matrix multiplication on the Hopper H800 in compute-bound workloads, a figure reported to be roughly eight times the usual industry standard. It also makes aggressive use of memory, hitting up to 3,000 GB/s of bandwidth in memory-bound workloads, close to the H800’s theoretical peak. What makes these results particularly noteworthy is that they come from clever kernel engineering rather than any physical hardware upgrade.
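For context, headline numbers like these come from kernel micro-benchmarks: achieved TFLOPS is derived from the floating-point operation count of a timed matrix multiplication, and achieved GB/s from the bytes moved by a timed memory operation. The sketch below shows how such figures are typically measured in PyTorch; it is a generic illustration, not DeepSeek’s benchmark harness, and the matrix sizes, copy sizes, and iteration counts are arbitrary assumptions.

```python
# Generic micro-benchmark sketch for achieved BF16 TFLOPS and memory bandwidth.
# Sizes and iteration counts are illustrative only.
import torch

def measured_bf16_tflops(n: int = 8192, iters: int = 50) -> float:
    a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b                      # BF16 matrix multiplication on tensor cores
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0
    flops = 2 * n ** 3 * iters     # 2*N^3 floating-point ops per N x N matmul
    return flops / seconds / 1e12  # achieved TFLOPS

def measured_bandwidth_gbs(num_elems: int = 1 << 28, iters: int = 50) -> float:
    src = torch.randn(num_elems, device="cuda", dtype=torch.bfloat16)
    dst = torch.empty_like(src)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        dst.copy_(src)             # device-to-device copy: read + write traffic
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0
    bytes_moved = 2 * src.numel() * src.element_size() * iters
    return bytes_moved / seconds / 1e9  # achieved GB/s
```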
DeepSeek’s FlashMLA relies on a method known as “low-rank key-value compression.” In simpler terms, it squeezes the key-value cache into a much smaller latent representation, which DeepSeek says cuts memory usage by 40% to 60%. It also adopts a block-based paging system that allocates memory dynamically based on task demands rather than in fixed chunks, letting models handle sequences of varying lengths efficiently and lifting overall performance.
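To make those two ideas concrete, here is a minimal, hypothetical PyTorch sketch: a pair of projections that caches a compressed latent per token instead of full per-head keys and values, and a tiny paged cache that hands out fixed-size blocks as a sequence grows. The dimensions, block size, and class and function names are illustrative assumptions, not taken from FlashMLA’s actual implementation.

```python
# Conceptual sketch of (1) low-rank KV compression and (2) block-based paging.
# All sizes and names below are illustrative, not FlashMLA's real values.
import torch

D_MODEL, D_LATENT, N_HEADS, D_HEAD = 4096, 512, 32, 128
BLOCK_SIZE = 64  # tokens per cache block

down_proj = torch.nn.Linear(D_MODEL, D_LATENT, bias=False)               # compress
up_proj_kv = torch.nn.Linear(D_LATENT, 2 * N_HEADS * D_HEAD, bias=False) # expand

def compress(hidden: torch.Tensor) -> torch.Tensor:
    """Store only a D_LATENT-wide latent per token (the 'low-rank' cache)."""
    return down_proj(hidden)

def expand(latent: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Recover per-head K and V from the cached latent when attention runs."""
    kv = up_proj_kv(latent).view(*latent.shape[:-1], 2, N_HEADS, D_HEAD)
    return kv[..., 0, :, :], kv[..., 1, :, :]

class PagedLatentCache:
    """Grow each sequence's cache one fixed-size block at a time."""
    def __init__(self, num_blocks: int):
        self.pool = torch.zeros(num_blocks, BLOCK_SIZE, D_LATENT)
        self.free = list(range(num_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq id -> physical blocks
        self.lengths: dict[int, int] = {}

    def append(self, seq_id: int, latent: torch.Tensor) -> None:
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:                  # current block is full
            self.block_tables.setdefault(seq_id, []).append(self.free.pop())
        block = self.block_tables[seq_id][length // BLOCK_SIZE]
        self.pool[block, length % BLOCK_SIZE] = latent
        self.lengths[seq_id] = length + 1
```

In a real kernel these steps are fused and run entirely on the GPU; the sketch only captures the bookkeeping idea, which is to cache the small latent, allocate blocks on demand, and expand to full keys and values at attention time.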
What DeepSeek has demonstrated with FlashMLA is that AI computing isn’t just a hardware game: clever software can deliver enormous gains on its own. For now, FlashMLA is tuned for Hopper parts like the cut-down H800, and there is plenty of anticipation about what the same techniques could deliver on the more capable H100. Developments in this space will be closely watched by anyone following cutting-edge AI.