Nvidia RTX 3070, 3080 & 3090 Preview – Everything about the new generation Nvidia GeForce

At six o’clock tonight, Nvidia aired a pre-recorded event in which CEO Jensen Huang introduced the new Ampere video cards for gamers. A lot had already leaked about the GeForce RTX 3070, RTX 3080 and RTX 3090, but Nvidia nevertheless managed to surprise on several points: the new GPUs appear to have twice as many CUDA cores as previously suggested. In this preview you can read everything we know so far about Nvidia’s brand-new gaming cards.

Nvidia GeForce RTX 3000: Ampere + 8nm

The main innovations in the GeForce RTX 3000 series are the Ampere architecture, which succeeds both the Volta architecture for servers and the Turing architecture for gaming, and Samsung’s 8nm process ‘co-developed with Nvidia’. This process is a further development of Samsung’s 10nm process, which means that no EUV is used in production yet.

The line-up: RTX 3070, RTX 3080 & RTX 3090

The line-up currently consists of three models, with the flagship GeForce RTX 3090 (nicknamed BFGPU; feel free to work out for yourself what that stands for) based on the GA102 GPU. In effect, this is the successor to the RTX 2080 Ti (or even the Titan RTX, as Nvidia itself says), which contained the TU102 chip, but to simplify the naming Nvidia simply chose a higher model number this time. The RTX 3080 is based on the same chip, but with 17% fewer computing units enabled. Only the RTX 3070 uses the GA104 chip; for comparison, the TU104 was used for the RTX 2080 in the previous series.
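The 17% figure can be checked against the CUDA core counts Nvidia quotes for the two GA102-based cards. The snippet below is a minimal sketch; the function name is ours, and it assumes the quoted core counts scale directly with the number of enabled computing units.

```python
# CUDA core counts as quoted by Nvidia for the two GA102-based cards.
RTX_3090_CORES = 10496
RTX_3080_CORES = 8704


def fraction_disabled(full: int, cut_down: int) -> float:
    """Share of units missing relative to the fuller configuration."""
    return 1 - cut_down / full


share = fraction_disabled(RTX_3090_CORES, RTX_3080_CORES)
print(f"RTX 3080 has {share:.1%} fewer units than the RTX 3090")
```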

Model           RTX 3090      RTX 2080 Ti   RTX 3080      RTX 2080      RTX 3070      RTX 2070
Process, GPU    8nm, GA102    12nm, TU102   8nm, GA102    12nm, TU104   8nm, GA104    12nm, TU106
CUDA cores      ‘10496’       4352          ‘8704’        2944          ‘5888’        2304
Boost clock     1700 MHz      1635 MHz      1710 MHz      1800 MHz      1730 MHz      1710 MHz
VRAM            24 GB GDDR6X  11 GB GDDR6   10 GB GDDR6X  8 GB GDDR6    8 GB GDDR6    8 GB GDDR6
Memory speed    19.5 Gbit/s   14 Gbit/s     19 Gbit/s     14 Gbit/s     16 Gbit/s     14 Gbit/s
Memory bus      384 bit       352 bit       320 bit       256 bit       256 bit       256 bit
Bandwidth       936 GB/s      616 GB/s      760 GB/s      448 GB/s      512 GB/s      448 GB/s
TGP             350 W         260 W         320 W         225 W         220 W         175 W
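The bandwidth row follows from the two rows above it: peak bandwidth is the per-pin memory speed multiplied by the bus width, divided by eight to go from bits to bytes. The sketch below recomputes the row from the table values; the function and dictionary names are our own.

```python
def bandwidth_gb_per_s(speed_gbit_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth: per-pin data rate times bus width, bits to bytes."""
    return speed_gbit_per_pin * bus_width_bits / 8


# (memory speed in Gbit/s, bus width in bits, bandwidth in GB/s as listed)
cards = {
    "RTX 3090": (19.5, 384, 936),
    "RTX 2080 Ti": (14, 352, 616),
    "RTX 3080": (19, 320, 760),
    "RTX 2080": (14, 256, 448),
    "RTX 3070": (16, 256, 512),
    "RTX 2070": (14, 256, 448),
}

for name, (speed, bus, listed) in cards.items():
    computed = bandwidth_gb_per_s(speed, bus)
    print(f"{name}: {computed:.0f} GB/s (table: {listed} GB/s)")
```

All six computed values match the listed bandwidth, so the three memory rows are at least internally consistent.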

Later in this article we will discuss the most important new features of the Ampere architecture, the GA102 and GA104 GPUs already mentioned and the design of Nvidia’s own Founders Edition. We must rely entirely on what Nvidia announced today; no samples have arrived in our lab yet and the extensive press briefings, which usually go deeper into the architecture, have not yet taken place. Fortunately, the first GeForce RTX 3000 video card will be in the shops on September 17, so the wait for all the details and, of course, extensive independent benchmarks will not be long. That does not stop us from sharing our preliminary analysis now.

Ampere for gamers: 10,000+ cores?

The beating heart of the RTX 3000 series graphics cards is of course the Ampere architecture. It is not entirely new to the GPU market, because we saw Ampere earlier in Tesla cards for servers, but it is now coming to GeForce products for the first time.

Improved SMs, RT cores and Tensor cores

Nvidia’s promises for the RTX 3090, 3080 and 3070 are big: up to twice the performance and 1.9 times the efficiency of Turing GPUs. The second generation of RTX should achieve this through enhancements to a new generation of SMs, RT cores and Tensor cores. Nvidia was still sparing with technical details on Tuesday.

Two new GPUs: GA104 and GA102

We already mentioned that the three video cards that Nvidia announced today are based on two GPUs: GA104 and GA102, where the G and A respectively stand for GeForce and Ampere. There is an even bigger chip, the GA100, which Nvidia does not yet use in consumer products, but which already appeared in the Tesla A100 earlier this year.

If we compare the specifications, we first notice that the GA102 GPU is a lot smaller than GA100: it contains almost half the number of transistors. This is partly because it contains fewer streaming multiprocessors (SMs), namely 82 instead of 108, but also because Nvidia has replaced the HBM2e memory controller with one that works with cheaper GDDR6X memory, about which more later.

GPU                    GA100          GA102               GA104          TU102
Used in                Tesla A100     RTX 3080, RTX 3090  RTX 3070       RTX 2080 Ti
Manufacturing process  7nm TSMC       8nm Samsung         8nm Samsung    12nm TSMC
Transistors            54 billion     28 billion          not yet known  18.6 billion
Die size               826 mm²        627 mm²             not yet known  754 mm²
CUDA cores             6912           5248*               2944*          4352
TGP                    400 W          350 W               220 W          260 W
Memory                 HBM2e          GDDR6X              GDDR6          GDDR6

Against all rumors and even the specifications published by video card manufacturers, Nvidia claims that the GA102 and GA104 GPUs contain twice as many CUDA cores, namely 10496 in GA102 and 5888 in GA104. That is impossible in the conventional way: in terms of TDP, die size and transistor count, that many CUDA cores do not fit in the GPUs mentioned.
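A crude way to illustrate the transistor argument is to divide the transistor count by the number of CUDA cores. This is only a rough sketch: it assumes the figures from the GPU table above and ignores that transistors also go to caches, RT cores, Tensor cores and memory controllers, so the per-core numbers are indicative at best.

```python
# (transistor count, CUDA cores); figures as listed in the GPU table.
gpus = {
    "TU102, 4352 cores": (18.6e9, 4352),
    "GA102, 5248 cores (as rumored)": (28e9, 5248),
    "GA102, 10496 cores (as claimed)": (28e9, 10496),
}


def transistors_per_core(transistors: float, cores: int) -> float:
    """Average transistor budget per CUDA core, counting the whole chip."""
    return transistors / cores


for label, (transistors, cores) in gpus.items():
    millions = transistors_per_core(transistors, cores) / 1e6
    print(f"{label}: {millions:.2f} million transistors per core")
```

With the claimed count, GA102 would have to make do with far fewer transistors per core than TU102, while the rumored count lands in a more familiar range.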

The most important clue to what’s going on can be found on a specifications page that Nvidia put online after the stream. It states that the RTX 3000 series SMs each contain two FP32 compute units, compared to one in all previous generations.

In the documentation of the A100 GPU we find a block diagram showing the internal structure of an Ampere SM. An SM holds four clusters of sixteen FP32 units each, for a total of 64 FP32 units, often simply referred to as shader units. However, each cluster also contains eight FP64 units. Most likely, Nvidia has given those FP64 units the option of also serving as a double FP32 unit, which effectively means that there are not 16 but 32 FP32 units per cluster, or 128 per SM.
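The arithmetic behind this hypothesis can be written out as follows. This is a sketch of our own reasoning, not a confirmed description of the hardware. The SM count of 82 for GA102 is the one quoted earlier; the count of 46 for GA104 is not stated by Nvidia here and is inferred by dividing the claimed 5888 cores by 128.

```python
CLUSTERS_PER_SM = 4
FP32_PER_CLUSTER = 16
FP64_PER_CLUSTER = 8

GA102_SMS = 82  # quoted earlier in this article
GA104_SMS = 46  # assumption: inferred from 5888 / 128, not an Nvidia figure


def fp32_units_per_sm(fp64_as_double_fp32: bool) -> int:
    """FP32 units per SM, optionally counting each FP64 unit as two FP32 units."""
    per_cluster = FP32_PER_CLUSTER
    if fp64_as_double_fp32:
        per_cluster += 2 * FP64_PER_CLUSTER
    return CLUSTERS_PER_SM * per_cluster


for name, sms in (("GA102", GA102_SMS), ("GA104", GA104_SMS)):
    conventional = sms * fp32_units_per_sm(False)
    hypothesis = sms * fp32_units_per_sm(True)
    print(f"{name}: {conventional} conventional, {hypothesis} under the hypothesis")
```

For GA102 this gives 5248 and 10496, and for GA104 2944 and 5888, which are exactly the two sets of core counts in circulation.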

If this hypothesis is correct, the Ampere GPUs for gamers have an enormous amount of FP32 computing power. A doubling of the performance seems too optimistic, as other parts of the chip (dispatcher, scheduler, caches, memory bandwidth) will likely create a significant bottleneck, but we could potentially see some very impressive gaming performance.
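To put a number on that computing power, the sketch below estimates peak FP32 throughput from the core counts and boost clocks in the line-up table. It assumes every CUDA core completes one fused multiply-add per clock, counted as two floating-point operations. This is a theoretical ceiling and says nothing about the bottlenecks mentioned above.

```python
def peak_fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Theoretical peak: cores x 2 FLOPs (one FMA) per clock x clock speed."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


# (CUDA cores, boost clock in MHz) as listed in the line-up table.
cards = {
    "RTX 3090": (10496, 1700),
    "RTX 2080 Ti": (4352, 1635),
    "RTX 3080": (8704, 1710),
    "RTX 2080": (2944, 1800),
    "RTX 3070": (5888, 1730),
    "RTX 2070": (2304, 1710),
}

for name, (cores, clock) in cards.items():
    print(f"{name}: {peak_fp32_tflops(cores, clock):.1f} TFLOPS")
```

On paper the RTX 3090 comes out at roughly 2.5 times the RTX 2080 Ti, which is exactly why we expect the real-world gain to be limited by the rest of the chip rather than by raw FP32 throughput.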