Assessing Cavium's ThunderX2: The Arm Server Dream Realized At Last
by Johan De Gelas on May 23, 2018 9:00 AM EST
Sizing Things Up: Specifications Compared
Thirty-two high-IPC cores in one package sounds promising. But how does the best ThunderX2 compare to what AMD, Qualcomm and Intel have to offer? In the table below we compare the high level specifications of several top server SKUs.
Comparison of Major Server SKUs
Specification | Cavium ThunderX2 9980-2200 | Qualcomm Centriq 2460 | Intel Xeon 8176 | Intel Xeon 6148 | AMD EPYC 7601 |
Process Technology | TSMC 16 nm | Samsung 10 nm | Intel 14 nm | Intel 14 nm | GlobalFoundries 14 nm |
Cores | 32 Ring bus | 48 Ring bus | 28 Mesh | 20 Mesh | 4 dies x 8 cores MCM |
Threads | 128 | 48 | 56 | 40 | 64 |
Max. number of sockets | 2 | 1 | 8 | 4 | 2 |
Base Frequency | 2.2 GHz | 2.2 GHz | 2.2 GHz | 2.4 GHz | 2.2 GHz |
Turbo Frequency | 2.5 GHz | 2.6 GHz | 3.8 GHz | 3.7 GHz | 3.2 GHz |
L3 Cache | 32 MB | 60 MB | 38.5 MB | 27.5 MB | 8x8 MB |
DRAM | 8-Channel DDR4-2667 | 6-Channel DDR4-2667 | 6-Channel DDR4-2667 | 6-Channel DDR4-2667 | 8-Channel DDR4-2667 |
PCIe 3.0 lanes | 56 | 32 | 48 | 48 | 128 |
TDP | 180 W | 120 W | 165 W | 150 W | 180 W |
Price | $1795 | $1995 | $8719 | $3072 | $4200 |
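To put the list prices in the table on a per-core and per-thread footing, a small Python sketch follows. The core counts, thread counts, and prices are copied from the table; the `SKUS` dictionary and the `price_per` helper are illustrative names chosen here, not anything published by the vendors. List prices rarely match what large customers actually pay, so the output is only a rough guide.

```python
# Core counts, thread counts, and list prices (USD) copied from the table above.
SKUS = {
    "Cavium ThunderX2 9980-2200": {"cores": 32, "threads": 128, "price": 1795},
    "Qualcomm Centriq 2460": {"cores": 48, "threads": 48, "price": 1995},
    "Intel Xeon 8176": {"cores": 28, "threads": 56, "price": 8719},
    "Intel Xeon 6148": {"cores": 20, "threads": 40, "price": 3072},
    "AMD EPYC 7601": {"cores": 32, "threads": 64, "price": 4200},
}


def price_per(sku, unit):
    """Return the list price divided by the number of cores or threads."""
    return sku["price"] / sku[unit]


if __name__ == "__main__":
    for name, sku in SKUS.items():
        per_core = price_per(sku, "cores")
        per_thread = price_per(sku, "threads")
        print(f"{name:28s} ${per_core:7.2f}/core  ${per_thread:6.2f}/thread")
```

Dollars per core says nothing about how fast each core is, so this is a way to read the table, not a performance comparison.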
Astute readers will quickly remark that Intel's top of the line CPU is the Xeon Platinum 8180. However, that SKU, with its 205W TDP and $10k+ price tag, is not comparable at all to any CPU in the list. We are already going out on a limb by including the 8176, which we feel belongs in this list of maximum core/thread count SKUs. In fact, as we will see further, Cavium positions the ThunderX2 9980 as "comparable" to the Xeon Platinum 8164, which is essentially the same part as the 8176 but with slightly lower clockspeeds.
However, in terms of performance per dollar, Cavium typically compares its flagship 9980 to the Intel Xeon Gold 6148, against which the pricing of Cavium's CPU is very aggressive. Many of Cavium's benchmarks claim that the fastest ThunderX2 is 30% to 40% ahead of the Xeon 6148, all the while Cavium's offering comes in at roughly $1300 less. That aggressive pricing might explain the increasingly persistent rumors that Qualcomm is not going to enter the server market after all.
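To make the performance-per-dollar argument concrete, the sketch below combines the two list prices from the table with the 30% to 40% lead that Cavium claims in its own benchmarks. The function and constant names are illustrative, and the performance ratios are Cavium's claims, not results measured in this article.

```python
# List prices (USD) from the comparison table.
THUNDERX2_9980_PRICE = 1795
XEON_6148_PRICE = 3072


def relative_perf_per_dollar(perf_ratio, price, baseline_price):
    """Performance per dollar relative to the baseline part.

    perf_ratio is the claimed performance relative to the baseline,
    so 1.30 means "30% ahead".
    """
    return perf_ratio * baseline_price / price


# Difference in list price: 1277, i.e. roughly $1300.
price_gap = XEON_6148_PRICE - THUNDERX2_9980_PRICE

if __name__ == "__main__":
    print(f"Price gap: ${price_gap}")
    for claimed in (1.30, 1.40):  # Cavium's claimed range, an input assumption
        ratio = relative_perf_per_dollar(
            claimed, THUNDERX2_9980_PRICE, XEON_6148_PRICE
        )
        print(f"{claimed:.2f}x performance -> {ratio:.2f}x performance per dollar")
```

If the claims hold, the 9980 would deliver a bit more than twice the performance per list-price dollar of the 6148; the figure scales directly with whatever performance ratio an independent benchmark actually shows.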
When looking at the table above, you can already see some important differences between the contenders. Intel seems to have the most advanced core topology and the highest turbo clockspeed. Meanwhile Qualcomm has the best chances when it comes to performance per watt, and has already published some benchmarking data that underlines this advantage.
Similar to AMD's EPYC, Cavium's ThunderX2 is likely to shine in the "sparse matrix" HPC market. This is thanks to its eight memory channels, which give it 33% greater theoretical memory bandwidth than the six-channel designs in the table, and a high core/thread count. However, as we've seen in the case of AMD's design, EPYC's L3-cache is slow once you need data that is not in the local 8 MB cache slice. The ThunderX2, by comparison, is a lot more sophisticated with a dual ring architecture, which seems to be similar to the ring architecture of the Xeon v4 (Broadwell-EP). According to Cavium, this ring structure is able to offer up to 6 TB/s of bandwidth and is non-blocking.
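The 33% figure follows directly from the channel counts in the table. As a minimal sketch, theoretical peak bandwidth per socket is channels times transfer rate times bytes per transfer; the 8 bytes per transfer reflects the standard 64-bit DDR4 data bus, and the function name is illustrative.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel has a 64-bit data bus


def peak_bandwidth_gbs(channels, mega_transfers_per_sec):
    """Theoretical peak DRAM bandwidth per socket in GB/s (decimal units)."""
    return channels * mega_transfers_per_sec * BYTES_PER_TRANSFER / 1000


eight_channel = peak_bandwidth_gbs(8, 2667)  # ThunderX2 and EPYC
six_channel = peak_bandwidth_gbs(6, 2667)    # Xeon 8176/6148 and Centriq

if __name__ == "__main__":
    print(f"8-channel DDR4-2667: {eight_channel:.1f} GB/s")
    print(f"6-channel DDR4-2667: {six_channel:.1f} GB/s")
    print(f"Advantage: {(eight_channel / six_channel - 1) * 100:.0f}%")
```

These are peak numbers on paper; sustained bandwidth in real workloads will be lower on every platform.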
This ring architecture is connected to Cavium's Coherent Processor Interconnect (CCPI2 - at the top of the picture), which runs at 600 Gb/sec. This interconnect links the two sockets/NUMA nodes. Also connected to the ring are the SoC's 56 PCIe 3.0 lanes, which Cavium allocates among 14 PCIe "controllers." These 14 controllers can, in turn, be bifurcated down to x4 or x1 as you can see below.
SR-IOV, which is important for I/O virtualization (Xen and KVM), is also supported.
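For a sense of scale on the PCIe figures quoted above, the following sketch works out the lane arithmetic. It assumes the 56 lanes are split evenly across the 14 controllers, which is an assumption made here for illustration; the 8 GT/s rate and 128b/130b encoding are standard PCIe 3.0 parameters, and the names are illustrative.

```python
LANES = 56
CONTROLLERS = 14
GT_PER_SEC = 8.0       # PCIe 3.0 signalling rate per lane
ENCODING = 128 / 130   # 128b/130b line encoding overhead


def lane_bandwidth_gbs():
    """Usable bandwidth of one PCIe 3.0 lane, per direction, in GB/s."""
    return GT_PER_SEC * ENCODING / 8  # divide by 8 to convert bits to bytes


# Assumes an even split of lanes across controllers (not confirmed by the source).
lanes_per_controller = LANES // CONTROLLERS

if __name__ == "__main__":
    per_lane = lane_bandwidth_gbs()
    print(f"Lanes per controller: x{lanes_per_controller}")
    print(f"Per lane:  {per_lane:.3f} GB/s per direction")
    print(f"All lanes: {per_lane * LANES:.1f} GB/s per direction")
```

Protocol overhead above the line encoding (packet headers, flow control) is ignored, so achievable throughput is somewhat lower than these figures.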
97 Comments
imaheadcase - Sunday, May 27, 2018 - link
Yah i tried that for a bit, it worked ok. But was not foolproof, it missed some stuff.
repoman27 - Wednesday, May 23, 2018 - link
Just to provide a counter point, this article made my day. And that’s coming entirely from intellectual curiosity; I don’t plan on deploying any servers with these chips in the near future. I always enjoy Johan’s writing, and was really looking forward to seeing how ThunderX2 would stack up. Many people are convinced that ARM is really only suitable in low power / mobile scenarios, but this is the chip that may finally prove otherwise. That has significant ramifications for the entire industry (including the consumer space), especially when you consider that Cavium could put out a TSMC 10nm or even 7nm shrink of ThunderX2 before Intel can get off of 14nm.
HStewart - Wednesday, May 23, 2018 - link
This does not proved that ARM is suitable in higher end space - look at the core specific speed - it extremely low compare to Intel and AMD server chips. Keep in mind it takes 128 total cores - running at 4SMT system. And what about other operations - what about Virtual Machine situation - where you have many virtual x86 machines on VMWare server,
How about high end mathematical and vector logic?
It does seem like ARM can run more threads - but maybe Intel or AMD has never had the need to
I think this latest Core battle is silly - I think it really not the number of cores you have but combination of type and speed of cores along with number of cores.
Wilco1 - Wednesday, May 23, 2018 - link
It certainly does prove that Arm can do high end servers - the results clearly show IPC/GHz is very close on SPECINT. Base clock speeds are the same as the Intel cores, and that's the speed the server runs at when not idle. But there are more cores as you say, so who will win is obvious.
Now imagine a next-gen 7nm version before Intel manages 10nm. Not a pretty picture, right?
HStewart - Wednesday, May 23, 2018 - link
Ok I have learn to agree to disagree with some people
Can this server run the VMWare server
https://kb.vmware.com/s/article/1003882
The answer is no - just one example - many more,
On 10nm - it not number that matters - it technology behind it - Intel supposely has a i3 and Y based for CannonLake coming this year - probably more.
Wilco1 - Wednesday, May 23, 2018 - link
There are plenty of VMs for Arm, so virtualization is not an issue.
10nm will be behind 7nm even if it ends up as originally promised and not using relaxed rules to become viable for volume production.
ZolaIII - Thursday, May 24, 2018 - link
When optimized for SIMD NEON extension things changed dramatically. All tho NEON isn't exactly the best SIMD never the less number's speak for them self.
https://blog.cloudflare.com/neon-is-the-new-black/
Tho Centriq is a bit pricier, bit overly slower than this but main point is it whose built on comparable lithography to current Intel's 14nm. So you get cheaper hardware, which can be packaged tighter & will consume much less power while being compatible regarding the performance. Triple win situation (initial cost, cost of ownership and scaling) but it still isn't turn key one whit isn't crucial for big vendor server farms anyway.
name99 - Thursday, May 24, 2018 - link
ARM (and this particular chip) aren't trying to solve every problem in the world. They're trying to offer a better (cheaper) solution for a PARTICULAR subset of customers.
If you think such customers don't exist, then why do you think Intel has such a wide range of Xeons, including eg all those Xeon Silvers that only turbo up to 3GHz? Or Xeon Gold's that max out at 2.8GHz?
lmcd - Thursday, May 24, 2018 - link
Second page: supports SR-IOV, which is important for KVM and Xen. If you're not aware, Xen and KVM are powerful virtualization solutions that cover the feature set of VMWare quite nicely.
HStewart - Wednesday, May 23, 2018 - link
"I really think Anandtech needs to branch into different websites. Its very strange and unappealing to certain users to have business/consumer/random reviews/phone info all bunched together."
I different in this - I don't think AnandTech should concentrate on just gaming in focus - this is rather old school - I am not sure about mobile phones in the mess of all this
But comparing ARM cpu's to Intel/AMD is interesting subject. It basically RISC vs CISC discussion - yes RISC can do operations quicker in some cases - but by definition of the architecture they are Reduce in what they do. Fox example it would take RISC a ton of instructions to executed a single AVX style operation.
This article is closest I have seen in comparing ARM vs x86 base machines - but even though I see some holes - it comes close - but having just be Linux based leaves out why people purchase such machine - I think Virtual Machine server is huge - but like everything else on the internet that is just an opinion