Nvidia RTX 3070, 3080 & 3090 Preview – Everything about the new Nvidia GeForce generation

At six o’clock tonight, Nvidia aired a pre-recorded event in which CEO Jensen Huang introduced the new Ampere video cards for gamers. A lot had already leaked about the GeForce RTX 3070, RTX 3080 and RTX 3090, but Nvidia nevertheless managed to surprise on a few points: the new GPUs appear to have twice as many CUDA cores as previously suggested. In this preview you can read everything we know so far about Nvidia’s brand-new gaming cards.

Nvidia GeForce RTX 3000: Ampere + 8nm

The main innovations in the GeForce RTX 3000 series are the Ampere architecture, which succeeds both the Volta architecture for servers and the Turing architecture for gaming, and Samsung’s 8nm process ‘co-developed with Nvidia’. This process is a further development of Samsung’s 10nm process, which means that no EUV is used in production yet.

The line-up: RTX 3070, RTX 3080 & RTX 3090

The line-up currently consists of three models, with the flagship GeForce RTX 3090 (nicknamed BFGPU – feel free to work out for yourself what that stands for) based on the GA102 GPU. It is in effect the successor to the RTX 2080 Ti (or even the Titan RTX, as Nvidia itself says), which contained the TU102 chip, but to simplify the naming Nvidia simply chose a higher model number this time. The RTX 3080 is based on the same chip, but with 17% fewer compute units enabled. Only the RTX 3070 uses the GA104 chip; for comparison, the TU104 was used for the RTX 2080 in the previous series.

| | RTX 3090 | RTX 2080 Ti | RTX 3080 | RTX 2080 | RTX 3070 | RTX 2070 |
|---|---|---|---|---|---|---|
| Process, GPU | 8nm, GA102 | 12nm, TU102 | 8nm, GA102 | 12nm, TU104 | 8nm, GA104 | 12nm, TU106 |
| CUDA cores | ‘10496’ | 4352 | ‘8704’ | 2944 | ‘5888’ | 2304 |
| Boost speed | 1700 MHz | 1635 MHz | 1710 MHz | 1800 MHz | 1730 MHz | 1710 MHz |
| VRAM | 24 GB GDDR6X | 11 GB GDDR6 | 10 GB GDDR6X | 8 GB GDDR6 | 8 GB GDDR6 | 8 GB GDDR6 |
| Memory speed | 19.5 Gbit/s | 14 Gbit/s | 19 Gbit/s | 14 Gbit/s | 16 Gbit/s | 14 Gbit/s |
| Memory bus | 384 bit | 352 bit | 320 bit | 256 bit | 256 bit | 256 bit |
| Bandwidth | 936 GB/s | 616 GB/s | 760 GB/s | 448 GB/s | 512 GB/s | 448 GB/s |
| TGP | 350 W | 260 W | 320 W | 225 W | 220 W | 175 W |
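
The bandwidth row follows directly from the two rows above it: the per-pin memory speed multiplied by the bus width, divided by eight to convert bits to bytes. A minimal Python sketch of that arithmetic, using the figures from the table (the function name `memory_bandwidth_gbs` and the `CARDS` dictionary are our own, purely for illustration):

```python
def memory_bandwidth_gbs(data_rate_gbit_s: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate times bus width, in bytes."""
    return data_rate_gbit_s * bus_width_bits / 8


# (memory speed in Gbit/s, bus width in bits), copied from the table above
CARDS = {
    "RTX 3090": (19.5, 384),
    "RTX 2080 Ti": (14, 352),
    "RTX 3080": (19, 320),
    "RTX 2080": (14, 256),
    "RTX 3070": (16, 256),
    "RTX 2070": (14, 256),
}

for name, (rate, bus) in CARDS.items():
    print(f"{name}: {memory_bandwidth_gbs(rate, bus):.0f} GB/s")
```

The RTX 3090's lead over the RTX 2080 Ti therefore comes mostly from the faster memory (19.5 versus 14 Gbit/s) and only to a small extent from the wider bus (384 versus 352 bit).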

Later in this article we discuss the most important new features of the Ampere architecture, the GA102 and GA104 GPUs already mentioned, and the design of Nvidia’s own Founders Edition. We have to rely entirely on what Nvidia announced today: no samples have arrived in our lab yet, and the extensive press briefings, which usually go deeper into the architecture, have not yet taken place. Fortunately, the first GeForce RTX 3000 video card will be in shops on September 17, so the wait for all the details and, of course, extensive independent benchmarks will not be long. That does not stop us from sharing our preliminary analysis in the meantime.

Ampere for gamers: 10,000+ cores?

The beating heart of the RTX 3000 series graphics cards is, of course, the Ampere architecture. It is not entirely new to the GPU market, because we saw Ampere earlier in Tesla cards for servers, but this is the first time it appears in GeForce products.

Improved SMs, RT cores and Tensor cores

Nvidia promises a lot for the RTX 3090, 3080 and 3070: up to twice the performance and 1.9 times the efficiency of Turing GPUs. This second generation of RTX should achieve that through enhancements to a new generation of SMs, RT cores and Tensor cores. Nvidia was still sparing with technical details on Tuesday.

Two new GPUs: GA104 and GA102

We already mentioned that the three video cards Nvidia announced today are based on two GPUs, GA104 and GA102, where the G and A stand for GeForce and Ampere respectively. There is an even bigger chip, the GA100, which Nvidia does not yet use in consumer products but which already appeared in the Tesla A100 earlier this year.

If we compare the specifications, we first notice that the GA102 GPU is a lot smaller than the GA100: it contains almost half the number of transistors. This is partly because it contains fewer streaming multiprocessors (SMs), namely 82 instead of 108, but also because Nvidia has replaced the HBM2e memory controller with one that works with cheaper GDDR6X memory, about which more later.

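| | GA100 | GA102 | GA104 | TU102 |
|---|---|---|---|---|
| Applied in | Tesla A100 | RTX 3080, RTX 3090 | RTX 3070 | RTX 2080 Ti |
| Manufacturing process | 7nm TSMC | 8nm Samsung | 8nm Samsung | 12nm TSMC |
| Transistors | 54 billion | 28 billion | not yet known | 18.6 billion |
| Die size | 826 mm² | 627 mm² | not yet known | 754 mm² |
| CUDA cores | 6912 | 5248 * | 2944 * | 4352 |
| TGP | 400 W | 350 W | 220 W | 260 W |
| Memory | HBM2e | GDDR6X | GDDR6 | GDDR6 |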
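
To make the size comparison concrete, the transistor counts and die sizes in the table can be turned into an average transistor density per chip. The sketch below is our own arithmetic on the table's figures, not a number Nvidia has published, and it says nothing about how the area is divided between cores, caches and I/O; the names `CHIPS` and `density_mtr_per_mm2` are illustrative.

```python
# (transistors in billions, die size in mm²), copied from the table above.
# GA104 is left out because neither figure was known at the time of writing.
CHIPS = {
    "GA100": (54.0, 826),
    "GA102": (28.0, 627),
    "TU102": (18.6, 754),
}


def density_mtr_per_mm2(transistors_billion: float, die_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm²."""
    return transistors_billion * 1000 / die_mm2


for name, (transistors, area) in CHIPS.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f} MTr/mm²")

# GA102 relative to GA100: "almost half" the number of transistors
ratio = CHIPS["GA102"][0] / CHIPS["GA100"][0]
print(f"GA102 / GA100 transistor ratio: {ratio:.2f}")
```

On these figures, GA102 packs noticeably more transistors into each square millimetre than TU102 despite being the smaller die, while GA100 on TSMC's 7nm process is denser still.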

Contrary to all the rumors, and even to the specifications published by video card manufacturers, Nvidia claims that the GA102 and GA104 GPUs contain twice as many CUDA cores: 10,496 in GA102 and 5,888 in GA104. That is impossible in the conventional way: in terms of TDP, die size and transistor count, that many CUDA cores do not fit in these GPUs.

The most important clue to what is going on can be found on a specifications page that Nvidia put online after the stream. It states that each SM in the RTX 3000 series contains two FP32 compute units, compared to one in all previous generations.

In the documentation for the A100 GPU we find a block diagram showing the internal structure of an Ampere SM. One SM holds four clusters of sixteen FP32 units each, for a total of 64 FP32 units, often simply referred to as shader units. However, eight FP64 units have also been placed in each cluster. Most likely, Nvidia has given those FP64 units the option of also serving as two FP32 units each, which effectively means that there are not 16 but 32 FP32 units per cluster, or 128 per SM.
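
The counting behind this hypothesis can be written out as a short Python sketch. The per-cluster figures come from the A100 block diagram described above and the SM count of 82 from the GA102 comparison earlier; the assumption that each FP64 unit doubles as two FP32 units is our hypothesis, not something Nvidia has confirmed, and all names are illustrative.

```python
CLUSTERS_PER_SM = 4
FP32_PER_CLUSTER = 16  # dedicated FP32 units per cluster (A100 block diagram)
FP64_PER_CLUSTER = 8   # FP64 units per cluster (A100 block diagram)
FP32_PER_FP64 = 2      # HYPOTHESIS: one FP64 unit can act as two FP32 units


def fp32_units_per_sm(count_fp64_as_fp32: bool) -> int:
    """FP32 units in one SM, optionally counting FP64 units as double FP32."""
    per_cluster = FP32_PER_CLUSTER
    if count_fp64_as_fp32:
        per_cluster += FP64_PER_CLUSTER * FP32_PER_FP64
    return CLUSTERS_PER_SM * per_cluster


GA102_SMS = 82  # SM count for GA102 as given in this article

conventional = GA102_SMS * fp32_units_per_sm(False)
doubled = GA102_SMS * fp32_units_per_sm(True)
print(f"GA102, conventional count: {conventional}")
print(f"GA102, FP64 counted as 2x FP32: {doubled}")
```

With 82 SMs, the conventional count gives the 5248 cores that were rumored for GA102, and the doubled count gives exactly the 10496 that Nvidia now claims.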

If this hypothesis is correct, the Ampere GPUs for gamers have an enormous amount of FP32 computing power. A doubling of performance seems too optimistic, as other parts of the chip (dispatcher, scheduler, caches, memory bandwidth) will likely create a significant bottleneck, but we could potentially see some very impressive gaming performance.
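
To put a rough number on that, peak FP32 throughput is conventionally quoted as CUDA cores times boost clock times two, because a fused multiply-add counts as two floating-point operations per clock. The sketch below applies that formula to the core counts and boost speeds from the line-up table, taking Nvidia's claimed core counts at face value. It gives a theoretical ceiling only, not a prediction of game performance, and the names are our own.

```python
def peak_fp32_tflops(cuda_cores: int, boost_mhz: int) -> float:
    """Theoretical peak FP32 throughput in TFLOPS (2 FLOPs per core per clock)."""
    return cuda_cores * boost_mhz * 2 / 1e6


# (CUDA cores as claimed by Nvidia, boost speed in MHz) from the line-up table
CARDS = {
    "RTX 3090": (10496, 1700),
    "RTX 2080 Ti": (4352, 1635),
    "RTX 3080": (8704, 1710),
    "RTX 2080": (2944, 1800),
    "RTX 3070": (5888, 1730),
    "RTX 2070": (2304, 1710),
}

for name, (cores, clock) in CARDS.items():
    print(f"{name}: {peak_fp32_tflops(cores, clock):.2f} TFLOPS")
```

On paper the RTX 3090 ends up at roughly two and a half times the RTX 2080 Ti, which is exactly why the caveat about bottlenecks elsewhere in the chip matters.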

 
