NVIDIA Unveils DGX-2H Server with 450W Tesla V100 GPUs
by Anton Shilov on November 20, 2018 4:30 PM EST
NVIDIA has introduced a new version of its DGX-2 server that is outfitted with higher-performing CPUs and GPUs. The DGX-2H server is powered by 16 Tesla V100 GPUs that run at higher clocks and feature a 450 W TDP each. The whole system consumes up to 12 kW of power and delivers 2.1 PetaFLOPS of compute horsepower.
NVIDIA’s DGX-2H is an updated version of the DGX-2 machine the company introduced earlier this year. The new system is based on two 24-core Intel Xeon Platinum 8174 processors accompanied by 1.5 TB of DDR4 memory, as well as 30 TB of NVMe storage. The key improvement of the new server over the previous one is its faster NVIDIA Tesla V100 GPUs, which feature 512 GB of HBM2 memory in total. Meanwhile, the new DGX-2H has similar networking capabilities: 10/25/40/50/100 GbE.
UPDATE 11/29: NVIDIA has reached out to clarify a number of data points regarding the DGX servers, so the story has been updated.
NVIDIA DGX Series (with Volta)

| | DGX-2H | DGX-2 | DGX-1 |
|---|---|---|---|
| CPUs | 2 x Intel Xeon Platinum 8174 | 2 x Intel Xeon Platinum 8168 | 2 x Intel Xeon E5-2600 v4 |
| GPUs | 16 x NVIDIA Tesla V100 32 GB HBM2 (450 W) | 16 x NVIDIA Tesla V100 32 GB HBM2 (350 W) | 8 x NVIDIA Tesla V100 32 GB HBM2 |
| System Memory | Up to 1.5 TB DDR4 | Up to 1.5 TB DDR4 | Up to 0.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) | 512 GB HBM2 (16 x 32 GB) | 256 GB HBM2 (8 x 32 GB) |
| Storage | 30 TB NVMe (up to 60 TB) | 30 TB NVMe (up to 60 TB) | 4 x 1.92 TB NVMe |
| Networking | 8 x InfiniBand or Dual 100 GbE | 8 x InfiniBand or Dual 100 GbE | 4 x IB + 2 x 10 GbE |
| Power | 12 kW | 10 kW | 3.5 kW |
| Weight | 360 lbs | 360 lbs | 134 lbs |
| GPU Throughput | Tensor: 2100 TFLOPs; FP16: ? TFLOPs; FP32: ? TFLOPs; FP64: ? TFLOPs | Tensor: 1920 TFLOPs; FP16: 480 TFLOPs; FP32: 240 TFLOPs; FP64: 120 TFLOPs | Tensor: 960 TFLOPs; FP16: 240 TFLOPs; FP32: 120 TFLOPs; FP64: 60 TFLOPs |
| Cost | ? | $399,000 | $149,000 |
Thanks to faster graphics processors with a 450 W TDP each, the system can now deliver 2.1 PFLOPS of compute performance, up from 2 PFLOPS before. To handle the higher power draw, it looks like NVIDIA had to switch to a new cooling method: ServeTheHome believes that the DGX-2H uses a new cooling subsystem because it weighs 20 pounds more than its predecessor (360 pounds vs. 340 pounds), though the company has not confirmed this. Along with the performance improvements, NVIDIA had to lower the maximum operating temperature of the DGX-2H from 35°C to 25°C.
NVIDIA has not disclosed pricing of the DGX-2H, though it is likely that it will cost more than $399,000, the price of the DGX-2. What remains to be seen is whether NVIDIA customers find the DGX-2H's performance good enough to justify an extra 2 kW of power consumption.
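To put that trade-off in numbers, here is a rough back-of-the-envelope sketch in Python that divides the peak tensor throughput from the spec table by each system's maximum power. It assumes the nameplate figures (peak TFLOPS and maximum system power) are directly comparable, which real workloads may not bear out; the dictionary layout and the `tflops_per_kw` helper are purely illustrative.

```python
# Peak tensor throughput (TFLOPS) and maximum system power (kW),
# copied from the spec table. These are nameplate values, not measurements.
systems = {
    "DGX-2H": {"tensor_tflops": 2100, "power_kw": 12.0},
    "DGX-2": {"tensor_tflops": 1920, "power_kw": 10.0},
    "DGX-1": {"tensor_tflops": 960, "power_kw": 3.5},
}


def tflops_per_kw(spec):
    """Peak tensor TFLOPS delivered per kW of maximum system power."""
    return spec["tensor_tflops"] / spec["power_kw"]


efficiency = {name: tflops_per_kw(spec) for name, spec in systems.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.0f} tensor TFLOPS per kW")

# Relative change going from the DGX-2 to the DGX-2H.
perf_gain = systems["DGX-2H"]["tensor_tflops"] / systems["DGX-2"]["tensor_tflops"] - 1
power_gain = systems["DGX-2H"]["power_kw"] / systems["DGX-2"]["power_kw"] - 1
print(f"DGX-2H vs DGX-2: {perf_gain:+.1%} throughput for {power_gain:+.1%} power")
```

On these peak figures, the DGX-2H comes out at roughly 175 tensor TFLOPS per kW against about 192 for the DGX-2: around 9% more throughput for 20% more power.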
Related Reading:
- NVIDIA’s DGX-2: Sixteen Tesla V100s, 30 TB of NVMe, only $400K
- NVIDIA Unveils & Gives Away New Limited Edition 32GB Titan V "CEO Edition"
- GIGABYTE Launches Two 4U NVIDIA Tesla GPU Servers: High Density for Deep Learning
Sources: NVIDIA, ServeTheHome
14 Comments
Kevin G - Tuesday, November 20, 2018
The change in weight and cooling spec makes me wonder if they included a liquid cooling system internally.
DanNeely - Wednesday, November 21, 2018
I doubt it. The weight increase would only allow ~1kg for each CPU/GPU's share of waterblock, radiator, and coolant. It's a server setup, so the air-cooled version would be relatively small heatsinks with massive, wear-hearing-protection levels of case-level airflow; so dropping the air cooling heatsinks doesn't free up much additional weight.
MrSpadge - Tuesday, November 20, 2018
The maximum GPU memory of the DGX-1 should be 8 x 16 GB = 128 GB, shouldn't it?
plopke - Tuesday, November 20, 2018
The data sheet says "GPU Memory 256 GB total system", but when I open the white paper of the DGX-1 it says "The eight Tesla V100 GPUs have a total of 128 GB HBM2 memory".
Maybe part of system memory is reserved for the GPU?
Eric Klien - Wednesday, November 21, 2018
The original DGX-1 had 128 GB while the latest DGX-1 has 256 GB as the memory per GPU has doubled. So this chart should be fixed showing that each GPU has 32 GB in all 3 systems. I believe you can still buy the original DGX-1 for a mere $129,000.
Charlie22911 - Tuesday, November 20, 2018
Maximum operating temperature of 25C?! Is that normal for systems like this? Why so low?
jimjamjamie - Tuesday, November 20, 2018
There's 16x 450W GPUs in that box. If you're going to spend half a million bucks on something like this, you should probably get some nice AC to stop it from going nuclear when you try and run Minecraft.
Death666Angel - Wednesday, November 21, 2018
For AC controlled server rooms, that seems quite high, at least compared to the ones I know. You don't want to bake your millions of dollars worth of computer equipment anyway.
mode_13h - Tuesday, November 20, 2018
I'm actually more impressed they doubled the tensor throughput simply by going to 350 W. The extra bump from going to 450 W isn't worth it, IMO.
Santoval - Tuesday, November 20, 2018
You misread the specs. They did not double the tensor (along with the FP16/32/64) performance of the DGX-1 by raising the TDP of the DGX-2 graphics cards but by doubling their number. Since the numbers exactly doubled we can safely assume that the TDP of the DGX-1 and DGX-2 Tesla V100s is exactly the same (350W).