Nvidia Announces Ampere Architecture for GeForce and Tesla

Nvidia has announced the A100, the first GPU the company produces on its new Ampere architecture. The chip initially ships in a DGX system with eight A100 GPUs, and GeForce cards will also get a GPU based on Ampere.

Ampere should eventually replace not only Volta but also Turing, serving as a single platform for both enterprise and consumer cards, Nvidia CEO Jensen Huang said ahead of the announcement, according to MarketWatch. Volta is the architecture of the Tesla V100 accelerator GPU; GeForce cards based on Volta never appeared, and the GeForce 20 series is based on the Turing architecture. About Ampere for GeForce, Huang said little more, only that there will be a lot of overlap with Ampere for Tesla, but in different configurations.

The first GPU based on Ampere is the Tesla A100, intended for high-performance computing, artificial intelligence, and other data center applications. Nvidia produces the chip on a 7nm process, and it contains 54 billion transistors on a die of 826mm². That is a considerable increase over the GV100 GPU of the Tesla V100, which has 21.1 billion transistors, while the die is not much larger: the GV100 measures 815mm².
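The transistor counts and die sizes above imply a large jump in density from the move to 7nm. A quick back-of-the-envelope check (the density figures are derived here, not quoted by Nvidia):

```python
# Transistor counts and die sizes are from the article;
# the per-mm² densities and the ratio are derived from them.
ga100_density = 54e9 / 826    # A100 (GA100, 7nm), transistors per mm²
gv100_density = 21.1e9 / 815  # V100 (GV100, 12nm), transistors per mm²

print(f"GA100: {ga100_density / 1e6:.1f}M transistors/mm²")
print(f"GV100: {gv100_density / 1e6:.1f}M transistors/mm²")
print(f"Density ratio: {ga100_density / gv100_density:.2f}x")
```

So on a die barely 1% larger, the GA100 packs roughly two and a half times as many transistors per square millimeter.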

The number of CUDA cores has increased from 5120 on the V100 to 6912 on the A100. The number of tensor cores has decreased from 640 to 432, but these are third-generation tensor cores that, according to Nvidia, improve on the previous generation. In fp64 calculations they offer more than twice the performance. For fp32 calculations that would even be a tenfold increase, but there Nvidia compares calculations in its own TensorFloat-32 format against regular floating point 32 calculations. According to Nvidia, "tf32 works just like fp32 without having to change any code."
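The reason TF32 "works just like fp32" is that it keeps fp32's 8-bit exponent (so the numeric range is unchanged) but only 10 mantissa bits, like fp16. A minimal sketch of that reduced precision, assuming simple truncation of the low mantissa bits (real hardware rounds, so the last bit can differ):

```python
import numpy as np

def to_tf32(x):
    """Illustrative only: reduce fp32 values to TF32 mantissa precision.

    TF32 keeps fp32's 8-bit exponent but only 10 of its 23 mantissa
    bits; here we truncate the low 13 mantissa bits of each value.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

x = np.float32(1.0 + 2**-11)  # needs 11 mantissa bits in fp32
print(to_tf32(x))             # collapses to 1.0 at TF32 precision
```

Because the exponent field is untouched, existing fp32 code keeps its dynamic range; only the fine-grained precision of each value is reduced.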

The memory bus of the A100 is 5120 bits wide and the maximum memory bandwidth is 1555GB/s. The accelerator has 40MB of on-chip level 2 cache, seven times more than the previous generation, and can have 40GB of VRAM spread over six HBM2e stacks.
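The quoted bandwidth follows from the bus width: bandwidth equals the bus width in bytes times the effective data rate. The ~2430MT/s HBM2e rate below is derived from the two quoted figures, not stated by Nvidia:

```python
# Bus width and bandwidth are from the article; the effective
# data rate (mega-transfers per second) is derived from them.
bus_bits = 5120
effective_mts = 2430  # assumed effective HBM2e data rate

bandwidth_gbs = bus_bits / 8 * effective_mts / 1000
print(f"{bandwidth_gbs:.0f} GB/s")  # matches the quoted 1555GB/s
```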

Also new is the presence of Multi-Instance GPU for virtualization. Each A100 can be divided into up to seven instances, each of which works in isolation with its own memory, for different users. In addition, there is support for a new NVLink interconnect to connect GPUs in a server; it offers a GPU-to-GPU bandwidth of 600GB/s.

Nvidia immediately announced a first system with the A100: the DGX A100. It contains eight A100 accelerators with a total of 320GB of memory and is equipped with 200Gbit/s interconnects from Mellanox, which Nvidia acquired. Strikingly, Nvidia has switched from Intel to AMD processors: the DGX A100 uses two AMD Epyc CPUs, whereas the previous DGX-2 had two Intel Xeon Platinum 8168 processors. The manufacturer also plans to offer the DGX A100 bundled in a cluster of 140 systems, in the form of the so-called DGX SuperPOD.

Nvidia Tesla Series

|                        | Tesla A100      | Tesla V100s | Tesla V100       | Tesla P100  |
|------------------------|-----------------|-------------|------------------|-------------|
| GPU                    | 7nm GA100       | 12nm GV100  | 12nm GV100       | 16nm GP100  |
| Die size               | 826mm²          | 815mm²      | 815mm²           | 610mm²      |
| Transistors            | 54 billion      | 21.1 billion | 21.1 billion    | 15.3 billion |
| SMs                    | 108             | 80          | 80               | 56          |
| CUDA cores             | 6912            | 5120        | 5120             | 3840        |
| Tensor cores           | 432             | 640         | 640              | n/a         |
| FP16 compute           | 78 tflops       | 32.8 tflops | 31.4 tflops      | 21.2 tflops |
| FP32 compute           | 19.5 tflops     | 16.4 tflops | 15.7 tflops      | 10.6 tflops |
| FP64 compute           | 9.7 tflops      | 8.2 tflops  | 7.8 tflops       | 5.3 tflops  |
| Boost clock speed      | ~1410MHz        | ~1601MHz    | ~1533MHz         | ~1480MHz    |
| Max. memory bandwidth  | 1555GB/s        | 1134GB/s    | 900GB/s          | 721GB/s     |
| Eff. memory clock      | 2430MHz         | 2214MHz     | 1760MHz          | 1408MHz     |
| Memory                 | 40GB HBM2e      | 32GB HBM2   | 16GB / 32GB HBM2 | 16GB HBM2   |
| Memory interface       | 5120-bit        | 4096-bit    | 4096-bit         | 4096-bit    |
| TDP                    | 400W            | 250W        | 300W             | 300W        |
| Form factor            | SXM4/PCIe 4.0   | PCIe 3.0    | SXM2/PCIe 3.0    | SXM         |