Nvidia announces Tesla P100 accelerator with Pascal GPU and 16GB HBM2 memory


Nvidia has announced a new accelerator card for high-performance computing. The Tesla P100 features a GP100 GPU based on the Pascal architecture and is intended for GPU-accelerated workloads such as deep learning and artificial-intelligence development.

Nvidia CEO Jen-Hsun Huang announced the Tesla P100 at the GPU Technology Conference. The accelerator features the new GP100 GPU, which consists of 15.3 billion transistors. That is almost double that of the GM200 GPU of the Maxwell generation, which has up to 8 billion transistors. According to Huang, the chip took three years to develop, with research and development costing two to three billion dollars.

Nvidia did not reveal anything about new consumer video cards during the keynote. Notably, the Tesla P100 uses 16GB of HBM2 memory, probably from Samsung, while recent rumors had suggested that the new GeForce cards would get GDDR5X memory. What is certain is that future GeForce video cards will use the same Pascal architecture as the Tesla P100.

The Tesla P100 also has 4MB of L2 cache and 14MB of SM register files, which can communicate with the chip at a speed of 80TB/s. The GP100 GPU is manufactured on a 16nm FinFET process and delivers 5.3 teraflops of FP64 double-precision compute. At FP32 single precision that rises to 10.6 teraflops, and at FP16 half precision to 21.2 teraflops.
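Those throughput figures line up with the standard peak-FLOPS formula. A back-of-the-envelope check in Python, assuming two FLOPs per core per cycle (one fused multiply-add) and the core counts Nvidia quotes; the helper name is ours, not Nvidia's:

```python
# Theoretical peak throughput of the Tesla P100.
# Assumption: 2 FLOPs per core per cycle (one fused multiply-add, FMA).

def peak_tflops(cores: int, boost_clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in teraflops."""
    return cores * boost_clock_mhz * 1e6 * flops_per_cycle / 1e12

fp32 = peak_tflops(3584, 1480)   # 3584 FP32 cores at the 1480MHz boost clock
fp64 = peak_tflops(1792, 1480)   # half as many FP64 cores, so half the rate
fp16 = 2 * fp32                  # FP16 runs at twice the FP32 rate on GP100

print(round(fp32, 1), round(fp64, 1), round(fp16, 1))  # 10.6 5.3 21.2
```

The 1:2:4 ratio between FP64, FP32 and FP16 falls straight out of the hardware: GP100 has one FP64 core per two FP32 cores, and each FP32 core can process two packed FP16 values per cycle.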

The GP100 is built up from graphics processing clusters (GPCs), streaming multiprocessors (SMs) and memory controllers. The chip has six GPCs, up to 60 SMs and eight 512-bit memory controllers, which equates to a memory bus 4096 bits wide. Each SM on the GPU has 64 CUDA cores and 4 texture units, for a total of 3840 CUDA cores and 240 texture units. On the Tesla P100, 3584 of those cores are enabled. The GPU has a base clock of 1328MHz and a boost clock of 1480MHz.
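The totals follow directly from that layout. A small arithmetic sketch reproducing them, assuming ten SMs per GPC (implied by six GPCs and 60 SMs, not stated explicitly in the article):

```python
# GP100 topology arithmetic, derived from the figures in the article.
GPCS = 6
SMS_PER_GPC = 10             # assumption: 60 SMs spread over 6 GPCs
CORES_PER_SM = 64            # FP32 CUDA cores per SM on GP100
TEX_UNITS_PER_SM = 4

full_sms = GPCS * SMS_PER_GPC                # 60 SMs on the full chip
full_cores = full_sms * CORES_PER_SM         # 3840 CUDA cores
full_tex_units = full_sms * TEX_UNITS_PER_SM # 240 texture units

enabled_sms = 56                             # Tesla P100 ships with 56 of 60 SMs active
enabled_cores = enabled_sms * CORES_PER_SM   # 3584 enabled cores

bus_width = 8 * 512                          # eight 512-bit controllers -> 4096-bit bus

print(full_cores, full_tex_units, enabled_cores, bus_width)  # 3840 240 3584 4096
```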

The Tesla accelerators are intended for business applications, and the P100 sits at the top of that segment. The chip is currently in mass production, and Nvidia says it will ship the first units to large companies as soon as possible for use in their hyperscale data centers. Later, OEMs such as Dell, HP and IBM will gain access to the accelerators so that they can build them into servers. Those servers with the Tesla P100 will hit the market in the first quarter of 2017, according to Nvidia.

Nvidia itself is releasing the DGX-1, in its own words a 'supercomputer', which is equipped with eight Tesla P100 accelerators. The cards communicate with each other over Nvidia's NVLink interface, which offers five times the speed of PCIe 3.0. A single node provides 170 teraflops of computing power at FP16 half precision; with a rack full of these servers, 2 petaflops is possible. The Nvidia DGX-1, which costs $129,000, contains two Intel Xeon E5-2698 v3 processors, 512GB of DDR4 RAM and four 1.92TB SSDs in a RAID 0 setup. The first units will be delivered to research departments of universities.
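The DGX-1 figures are a straightforward aggregation of the per-card numbers. A rough check; the nodes-per-rack count is our inference from the 2-petaflop claim, not an Nvidia spec:

```python
# Aggregate FP16 throughput of a DGX-1 node, and nodes needed for 2 petaflops.
import math

P100_FP16_TFLOPS = 21.2   # per-card FP16 peak from the article
CARDS_PER_NODE = 8        # Tesla P100 accelerators in one DGX-1

node_tflops = CARDS_PER_NODE * P100_FP16_TFLOPS  # 169.6, marketed as 170 teraflops
print(round(node_tflops))  # 170

# How many nodes would a rack need to reach the quoted 2 petaflops?
nodes_for_2pf = math.ceil(2000 / node_tflops)
print(nodes_for_2pf)  # 12
```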

| Tesla accelerators | Tesla K40 | Tesla M40 | Tesla P100 |
| --- | --- | --- | --- |
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 |
| TPCs | 15 | 24 | 28 |
| FP32 CUDA cores / SM | 192 | 128 | 64 |
| FP32 CUDA cores / GPU | 2880 | 3072 | 3584 |
| FP64 CUDA cores / SM | 64 | 4 | 32 |
| FP64 CUDA cores / GPU | 960 | 96 | 1792 |
| Base clock | 745MHz | 948MHz | 1328MHz |
| GPU boost clock | 810/875MHz | 1114MHz | 1480MHz |
| Single precision | 4.3 teraflops | 7 teraflops | 10.6 teraflops |
| Double precision | 1.43 teraflops | 0.2 teraflops | 5.3 teraflops |
| Texture units | 240 | 192 | 224 |
| Memory interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 |
| Memory size | Up to 12GB | Up to 24GB | 16GB |
| L2 cache | 1536KB | 3072KB | 4096KB |
| Register file size / SM | 256KB | 256KB | 256KB |
| Register file size / GPU | 3840KB | 6144KB | 14336KB |
| TDP | 235 watts | 250 watts | 300 watts |
| Transistors | 7.1 billion | 8 billion | 15.3 billion |
| GPU die size | 551mm² | 601mm² | 610mm² |
| Manufacturing process | 28nm | 28nm | 16nm |

Comparison of the new GP100 GPU with the GM200 and GK110 of the previous generations.
