NVIDIA has raised the bar with its latest GeForce RTX 5090, which outpaces AMD’s RX 7900 XTX in DeepSeek R1 inference performance. That lead is largely thanks to its cutting-edge fifth-generation Tensor Cores.
Let’s dive into how accessible NVIDIA has made DeepSeek’s reasoning models with its new RTX GPUs. Both NVIDIA and AMD have been hard at work creating optimal environments for running high-end large language models (LLMs) on local machines. Recently, AMD put the spotlight on its RDNA 3 flagship GPU’s capabilities with the DeepSeek R1 model. However, NVIDIA, often referred to as Team Green, quickly followed up by revealing inference benchmarks for its new RTX Blackwell GPUs. The results clearly highlight the GeForce RTX 5090’s superior performance.
A detailed performance chart shows the GeForce RTX 5090 consistently outperforming the Radeon RX 7900 XTX, as well as older-generation models, across various DeepSeek R1 variants. When running the Distill Qwen 7B and Distill Llama 8B models, the RTX 5090 processes up to 200 tokens per second, effectively doubling the throughput of AMD’s RX 7900 XTX. This demonstrates NVIDIA GPUs’ prowess in AI applications and indicates we’ll be seeing more cutting-edge AI solutions on consumer PCs, especially with enhanced “RTX on AI” support.
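Throughput figures like these are simply generated tokens divided by wall-clock time. A minimal sketch of how such a measurement could be taken (the `generate` callable here is a stand-in stub, not a real inference backend):

```python
import time

def tokens_per_second(generate, prompt):
    """Measure decode throughput of a text-generation callable.

    `generate` is any callable that takes a prompt and returns a list of
    tokens; it stands in for a real local inference backend (hypothetical).
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stub generator used only to make the sketch runnable.
def fake_generate(prompt):
    return prompt.split() * 50  # pretend each word yields 50 tokens

if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, "hello world")
    print(f"{rate:.0f} tokens/s")
```

In a real benchmark, `generate` would wrap the model’s decode loop, and the prompt-processing (prefill) phase is usually timed separately from token generation.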
For those keen to explore DeepSeek R1 with NVIDIA’s RTX GPUs, the company offers a step-by-step guide in a recently published blog. The process is surprisingly straightforward, akin to using any web-based chatbot. Here’s the simple rundown:
NVIDIA has released the DeepSeek-R1 model as an NVIDIA NIM microservice preview on build.nvidia.com, designed to help developers securely experiment and create specialized agents. Astonishingly, it can process up to 3,872 tokens per second on a single NVIDIA HGX H200 system. This microservice simplifies implementations by supporting industry-standard APIs, allowing enterprises to ensure top-notch security and data privacy by running it on their preferred, accelerated computing infrastructure.
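Because the microservice exposes industry-standard (OpenAI-style) APIs, calling it looks much like calling any hosted chat model. A hedged sketch follows; the endpoint URL, model identifier, and environment-variable name are assumptions based on NVIDIA’s OpenAI-compatible preview API, so check build.nvidia.com for the current values:

```python
import json
import os
import urllib.request

# Assumed endpoint for NVIDIA's hosted NIM preview; verify on build.nvidia.com.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"

def build_request(prompt, model="deepseek-ai/deepseek-r1"):
    """Construct an OpenAI-style chat-completions request for the NIM preview."""
    payload = {
        "model": model,  # model ID is an assumption; confirm before use
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.6,
    }
    return urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # API key comes from signing up for the preview on build.nvidia.com.
            "Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY', '')}",
        },
    )

if __name__ == "__main__":
    req = build_request("Why is the sky blue?")
    # Sending the request needs network access plus a valid API key:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
    print(req.full_url)
```

The same OpenAI-compatible request shape is what lets enterprises later point their existing tooling at a self-hosted NIM deployment instead of the hosted preview.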
Thanks to NVIDIA’s NIM, developers and enthusiasts can experiment with the AI model directly on their local setups. Not only does this enhance data security, but it also leads to improved performance, provided the hardware is up to the task.
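Running the model locally means prompts never leave the machine and no API key is needed. The sketch below assumes a local OpenAI-compatible server (for example, a local NIM container or a llama.cpp server hosting a DeepSeek-R1 distill); the URL and model name are placeholders, not values from the article:

```python
import json
import urllib.request

# Placeholder address for a locally hosted OpenAI-compatible endpoint.
LOCAL_URL = "http://localhost:8000/v1/chat/completions"

def local_request(prompt, model="deepseek-r1-distill-llama-8b"):
    """Build a chat request to a local server: no API key, data stays on-box."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        LOCAL_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Requires the local inference server to be running first.
    with urllib.request.urlopen(local_request("Summarize chain-of-thought.")) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Whether this outperforms a hosted endpoint depends entirely on the local GPU, which is exactly where the RTX 5090’s Tensor Core throughput comes into play.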