During Nvidia's GTC presentation, CEO Jensen Huang revealed the first GPUs of the Ada Lovelace generation (RTX 40).
Leaving the specific GPUs aside, Ada Lovelace fits on average 70 percent more CUDA cores into the same die area as Ampere, with a more efficient architecture for the compute clusters. As previously hinted, Nvidia is building the new GPUs on TSMC’s 4N process (a 5-nanometer-class node).
Because the new streaming multiprocessors can now dynamically reorder their workloads (‘Shader Execution Reordering’, or SER), Nvidia claims to achieve up to twice the power efficiency. GPUs benefit from the streamlined throughput of complex calculations especially when handling ray tracing: SER automatically optimizes how that work is delivered to the graphics processor.
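The idea behind reordering can be illustrated with a toy model. The sketch below is not Nvidia's implementation; it only assumes that threads are executed in groups of 32 (a warp) and that a warp needing several different shaders has to run them one after another. The workload and the function name `count_passes` are hypothetical.

```python
# Toy model of shader divergence: a warp that needs N different shaders
# is assumed to need N serialized passes. This is an illustration only,
# not how SER is implemented in hardware.
WARP_SIZE = 32  # threads per warp on Nvidia GPUs


def count_passes(shader_ids, warp_size=WARP_SIZE):
    """Sum, over all warps, the number of distinct shaders each warp runs."""
    passes = 0
    for start in range(0, len(shader_ids), warp_size):
        warp = shader_ids[start:start + warp_size]
        passes += len(set(warp))
    return passes


# Hypothetical workload: 128 ray hits cycling through 4 different shaders.
hits = [i % 4 for i in range(128)]

unordered = count_passes(hits)          # every warp sees all 4 shaders
reordered = count_passes(sorted(hits))  # each warp sees a single shader

print(unordered, reordered)
```

In this made-up workload, grouping the hits by shader before execution cuts the number of passes from 16 to 4. Real gains depend on the scene and on the cost of the reordering itself.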
The new ray tracing clusters feature Nvidia’s third generation of RT cores, with twice the performance in select tasks involving light reflection. The matrix-based Tensor cores are also upgraded to their fourth generation; the format has been carried over from Nvidia’s Hopper processor. The heaviest Ada Lovelace chips can potentially deliver up to 1,300 teraflops in Tensor operations.
Roughly speaking, this means that Ada Lovelace should be twice as fast as Ampere in rasterized applications and four times as fast in ray tracing. That performance was measured at roughly the same wattages, which is why Huang describes the new generation as “incredibly high-efficiency”. However, nothing was said about the maximum power draw (TDP or TGP) of the first Ada Lovelace cards.
As expected, the successor to Ampere once again kicks off with the high-end segment of video cards. In the previous generation that launch also included a GeForce RTX 3070; this time, the RTX xx70 model seems to be taking longer. Several rumors about questionable RTX 4070 specs had already predicted something similar.
The provisional flagship of the Ada Lovelace generation is again an RTX xx90 model, this time the RTX 4090. The GPU should be three to four times as powerful as the previously released RTX 3090 Ti, while both cards have the same 24 GB of GDDR6X memory (21 Gbps, 384-bit).
The RTX 4090 runs on Nvidia’s largest GPU, the AD102, with 16,384 CUDA cores at 2,520 MHz. Huang indicated that the GPU is easy to overclock to above 3.0 GHz. He said nothing about power consumption himself, but previous leaks claim a TGP of at least 450 watts, with peaks of up to approximately 660 watts.
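The core count and clock speed translate into a theoretical peak compute figure. The sketch below assumes the common rule of thumb that each CUDA core can complete one fused multiply-add (counted as 2 floating-point operations) per clock; that factor and the function name `peak_fp32_tflops` are assumptions, not figures from the presentation.

```python
def peak_fp32_tflops(cuda_cores, clock_ghz, flops_per_core_per_clock=2):
    """Theoretical peak FP32 throughput in teraflops.

    Assumes one fused multiply-add (2 FLOPs) per core per clock,
    which is a rule of thumb and not a measured value.
    """
    gflops = cuda_cores * flops_per_core_per_clock * clock_ghz
    return gflops / 1000.0


# Figures quoted in the article for the RTX 4090.
stock = peak_fp32_tflops(16_384, 2.520)
# Hypothetical overclock to the 3.0 GHz mentioned by Huang.
overclocked = peak_fp32_tflops(16_384, 3.0)

print(f"{stock:.1f} TFLOPS at 2.52 GHz, {overclocked:.1f} TFLOPS at 3.0 GHz")
```

Under that assumption the card lands at roughly 82.6 teraflops at the quoted clock and about 98.3 teraflops at 3.0 GHz. These are upper bounds on paper, not a prediction of in-game performance.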
The RTX 4090 will be officially released on October 12, with a suggested retail price of 1,949 euros.
This generation, the GeForce RTX 4080 is (once again) split into two different models. The standard model has 12 GB of GDDR6X (21 Gbps, 192-bit), while a more luxurious edition comes with 16 GB of GDDR6X (22.5 Gbps, 256-bit). Both cards should be roughly two to four times as powerful as the RTX 3080 Ti from last year.
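The memory figures in parentheses can be turned into a peak bandwidth with a simple calculation: data rate per pin times bus width, divided by 8 to go from bits to bytes. The sketch below applies this to the three cards using only the numbers quoted in the article; the function name `bandwidth_gb_per_s` is made up for illustration.

```python
def bandwidth_gb_per_s(data_rate_gbps, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s.

    data_rate_gbps: per-pin data rate in gigabits per second
    bus_width_bits: width of the memory bus in bits
    """
    return data_rate_gbps * bus_width_bits / 8


# (data rate in Gbps, bus width in bits) as quoted in the article.
cards = {
    "RTX 4090": (21.0, 384),
    "RTX 4080 16 GB": (22.5, 256),
    "RTX 4080 12 GB": (21.0, 192),
}

for name, (rate, width) in cards.items():
    print(f"{name}: {bandwidth_gb_per_s(rate, width):.0f} GB/s")
```

By this calculation the 12 GB model ends up with half the theoretical bandwidth of the RTX 4090 (504 versus 1,008 GB/s), with the 16 GB model in between at 720 GB/s.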
The two variants differ more than previously thought. The heavier model runs on the AD103-300 GPU; the lighter one makes do with the AD104-400. They also differ in the number of streaming multiprocessors (76 versus 60) and CUDA cores (9,728 versus 7,680).
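The streaming processor and CUDA core counts can be cross-checked against each other. The sketch below divides one by the other for both RTX 4080 models; applying the resulting ratio to the RTX 4090 is an extrapolation, since the article does not state that card's streaming processor count. The function name `cores_per_sm` is hypothetical.

```python
def cores_per_sm(cuda_cores, sm_count):
    """Return CUDA cores per streaming processor, requiring an even split."""
    per_sm, remainder = divmod(cuda_cores, sm_count)
    if remainder:
        raise ValueError("core count is not a multiple of the SM count")
    return per_sm


# Figures quoted in the article for the two RTX 4080 models.
heavy = cores_per_sm(9_728, 76)
light = cores_per_sm(7_680, 60)
print(heavy, light)

# Extrapolation (not stated in the article): if the RTX 4090 uses the same
# ratio, its 16,384 CUDA cores would correspond to this many units.
implied_4090_sms = 16_384 // heavy
print(implied_4090_sms)
```

Both models work out to 128 CUDA cores per streaming processor, so the quoted figures are at least internally consistent.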
Here too, Huang said nothing about the power consumption of the cards, although recent leaks point to standard TGPs of 320 watts and 285 watts, considerably less than some earlier rumors suggested.
The RTX 4080 (16 GB) has a suggested retail price of 1,469 euros; the price of the lighter 12 GB model starts from 1,099 euros. Both models should be available by mid-November 2022.
As always, Nvidia also unveiled new technologies to showcase the new architecture. The new optical flow accelerators in the Tensor cores help make smart upscaling that little bit smoother, pushing Nvidia’s Deep Learning Super Sampling (DLSS) into a third generation.
DLSS 3.0 promises to generate up to three times as many frames at upscaled 4K resolution, compared to native 4K, in select games. That also applies in combination with Nvidia’s own ray tracing. For example, Microsoft Flight Simulator reaches frame rates of over 110 fps with ‘RTX On’, compared to 54 fps in native 4K without ray tracing.
In addition to a new DLSS, Ada Lovelace also introduces hardware encoding for the new AV1 standard. Using the new Nvidia GPUs, video files and streams can be encoded in AV1, with higher image quality at smaller file sizes (than, for example, H.265). Nvidia is thus following Intel, which already included AV1 encoding in its first Intel Arc GPUs.
Another new technology is RTX Remix, which allows mod makers to enrich old games (using USD captures) relatively easily with ray tracing, AI-driven upscalers for textures and other new effects. The tool will be offered in Nvidia’s Omniverse (famous for the Ampere reveal) shortly after Ada Lovelace launches.
As an example of the new Omniverse capabilities, Nvidia showed Portal RTX, a mod for the original PC version of Portal that bakes ray tracing right into the classic game. Portal RTX will be released in November as free downloadable content for gamers who already own the game. Ada Lovelace cards are probably not required to run the mod.
Apart from the game-focused announcements, Nvidia also seems to be focusing this generation on AI, content creation, the metaverse and robotics. The lion’s share of the GTC presentation took a closer look at how Nvidia’s most powerful GPUs accelerate everything from augmented reality for surgeons to self-driving cars on Nvidia’s new Thor processor, which combines Nvidia’s proprietary Grace, Hopper and Ada architectures.