Hot Chips 31 Live Blogs: IBM's Next Generation POWER
by Dr. Ian Cutress on August 19, 2019 12:00 PM EST
12:34PM EDT - We're here at Hot Chips 31 (2019), and the first talk to be live blogged is IBM's newest variant of its POWER CPUs.
12:37PM EDT - Quite possibly the biggest Hot Chips crowd I can remember.
12:45PM EDT - The Arm talk is set to finish here in a bit, then IBM will start
12:45PM EDT - We already covered Arm's Neoverse N1 strategy earlier in the year: https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform
12:55PM EDT - Just finishing up the previous talk
12:57PM EDT - Hopefully this is about POWER10 :)
12:57PM EDT - It could be the Power9 IO chip
12:58PM EDT - The 2018 talk was about the Power9 SU core
12:58PM EDT - IBM now has a family of processors: start with one up front, then work on the rest of the family
12:58PM EDT - Scale out first, then scale up
12:58PM EDT - One optimized for dual socket, one optimized for 16 sockets
12:59PM EDT - Power9 AIO delivers things IBM wanted to do before Power10
12:59PM EDT - New accelerator technology is deployed on Power9
12:59PM EDT - Today in Power9
12:59PM EDT - Power10 for 2021
12:59PM EDT - New core on Power10 and new transistor technology in 2021
01:00PM EDT - Accessing heterogeneous systems
01:00PM EDT - Need to focus on diverse acceleration devices and diverse memory devices beyond CPUs
01:01PM EDT - Need to focus on heterogeneous systems, not just GHz
01:01PM EDT - Need to deploy different types of heterogeneous systems
01:01PM EDT - Trying to remove the different types of SerDes on a chip. Want to consolidate these down to a single design
01:02PM EDT - On Power9, there are now only two types of SerDes: PCIe, and 25G SerDes for everything else
01:02PM EDT - Fixing the SerDes at 25G makes it area- and power-efficient; then just scale the number of links
01:02PM EDT - Take all the 25G signals from the chip and deploy composable systems across different accelerator technologies
01:03PM EDT - NVLink, OpenCAPI, and OMI
01:03PM EDT - OMI (Open Memory Interface) is the memory interface that connects memory over the SerDes
01:04PM EDT - On-chip Gzip accelerator
01:04PM EDT - IBM has delivered the #1 and #2 supercomputers on the Top500 list
01:04PM EDT - Built for the AI era
01:05PM EDT - Now OpenCAPI: IBM sees it as very important in future accelerator systems
01:05PM EDT - It minimizes the overhead and latency that PCIe has
01:05PM EDT - Accelerators are not only GPUs, but also SmartNICs, networking, FPGAs, and AI accelerators
01:06PM EDT - Want software to take data from anywhere in the system on any device
01:06PM EDT - Power9 has direct attached memory
01:07PM EDT - Some of the former secret sauce technologies are in the new open memory standard
01:07PM EDT - Can deal with asymmetry
01:08PM EDT - Having this connectivity allows for independent development of accelerators rather than focusing on the CPU
01:09PM EDT - Don't want programmers to worry about host-to-device connectivity
01:09PM EDT - Also OpenCAPI helps with security
01:09PM EDT - Prevents an accelerator crashing a whole system
01:10PM EDT - Need to make sure accelerators can't introduce potential cache coherency bugs
01:11PM EDT - Aligned all packets with the deserialized interface
01:11PM EDT - Accelerators always see aligned data to help make assumptions for performance
01:11PM EDT - Can start processing the command before checking the CRC
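To illustrate what starting on a command before the CRC verdict arrives means, here is a toy Python model. The frame layout (a one-byte opcode followed by an 8-byte little-endian address), the use of CRC-32 from `zlib`, and the names `Frame`, `decode_command`, and `receive` are all hypothetical; OpenCAPI's real transaction and data link formats differ. The only point is that the speculative work is thrown away if the CRC turns out to be bad.

```python
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Frame:
    payload: bytes  # hypothetical layout: 1-byte opcode + 8-byte little-endian address
    crc: int        # CRC-32 of the payload (stand-in for the link's real CRC)


def decode_command(payload: bytes) -> dict:
    """Decode the hypothetical command layout into opcode and address."""
    return {"opcode": payload[0], "addr": int.from_bytes(payload[1:9], "little")}


def receive(frame: Frame, commit: Callable[[dict], None]) -> bool:
    """Decode speculatively, then commit only if the CRC matches.

    In hardware the decode and the CRC check would run in parallel pipelines;
    here the decode simply happens first and its result is held back until
    the CRC verdict is known.
    """
    cmd = decode_command(frame.payload)          # speculative work
    if zlib.crc32(frame.payload) != frame.crc:   # CRC mismatch
        return False                             # discard the decoded command
    commit(cmd)                                  # CRC passed: make the command visible
    return True
```

A frame whose stored CRC matches its payload gets committed; flipping a single bit of the CRC makes `receive` drop the already-decoded command.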
01:12PM EDT - Separately pipelined control/tag vs data
01:13PM EDT - (coherence over switching is not supported in OpenCAPI due to complexity)
01:14PM EDT - OMI costs 1/6th the die area of a DDR interface
01:14PM EDT - So memory is easier to support
01:14PM EDT - Can enable more bandwidth in smaller ASICs with OMI
01:15PM EDT - Differential buffer attach is now media-agnostic - the buffer is on the memory module
01:15PM EDT - Can attach buffered DDR or GDDR, rather than being tied to one or the other
01:16PM EDT - OMI is lighter weight and open to enable more ecosystem support
01:17PM EDT - With OMI memory, based on the OpenCAPI SerDes, a system can use DDR4 and DDR5 with the same connector
01:18PM EDT - e.g. if OMI were enabled on AMD's server IO die (sIOD), it would decouple memory technology from host silicon development
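A minimal sketch of what putting the buffer on the memory means for the host, under a deliberately simplified model: the host-side controller issues the same generic read/write requests on every channel, and a hypothetical DIMM-side buffer object hides whether DDR4, DDR5, or GDDR sits behind it. The class names are illustrative only and are not an OMI API; real OMI traffic is framed commands over the SerDes, not method calls.

```python
from abc import ABC, abstractmethod


class MemoryBuffer(ABC):
    """Hypothetical DIMM-side buffer: hides the media type from the host."""

    @abstractmethod
    def read(self, addr: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, addr: int, data: bytes) -> None: ...


class ToyDramBuffer(MemoryBuffer):
    """Stands in for a buffer in front of DDR4, DDR5 or GDDR; the media is only a label here."""

    def __init__(self, media: str):
        self.media = media
        self._cells: dict[int, int] = {}  # sparse byte store; unwritten bytes read as 0

    def read(self, addr: int, size: int) -> bytes:
        return bytes(self._cells.get(addr + i, 0) for i in range(size))

    def write(self, addr: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self._cells[addr + i] = b


class HostMemoryController:
    """Host side: one generic request path per channel, whatever media is attached."""

    def __init__(self, channels: list[MemoryBuffer]):
        self.channels = channels

    def write(self, channel: int, addr: int, data: bytes) -> None:
        self.channels[channel].write(addr, data)

    def read(self, channel: int, addr: int, size: int) -> bytes:
        return self.channels[channel].read(addr, size)
```

A controller built with one DDR4-labelled and one DDR5-labelled buffer reads and writes both through identical calls; only the buffer objects differ, which is the decoupling described above.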
01:19PM EDT - Power9 Advanced IO chip = P9 AIO
01:19PM EDT - 728 mm², 8B transistors
01:19PM EDT - 24 SMT4 cores, 120 MB eDRAM L3
01:19PM EDT - Built on 14FF (GF?)
01:19PM EDT - 17-layer metal stack
01:19PM EDT - 16 channels of x8 OMI, 650 GB/s peak r/w bandwidth
01:20PM EDT - 48 lanes of PCIe 4.0
01:20PM EDT - Up to x16 CAPI 2.0
01:20PM EDT - Up to x48 NVLink attach
01:20PM EDT - Shows a 2S replacement, but can scale to 16 sockets
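As a rough sanity check on the "16 channels of x8 OMI, 650 GB/s" figure, here is some back-of-envelope arithmetic. It assumes 25 Gb/s per lane per direction (the 25G SerDes mentioned earlier), that "x8" means eight lanes in each direction, and no encoding or protocol overhead; the resulting ratio is our own estimate, not an IBM number.

```python
# Back-of-envelope OMI bandwidth for the P9 AIO.
# Assumptions (not IBM figures): 25 Gb/s per lane per direction,
# "x8" = eight lanes in each direction, no encoding or protocol overhead.
LANE_GBPS = 25             # assumed per-lane signalling rate, Gb/s
CHANNELS = 16              # "16 channels of x8 OMI" (from the talk)
LANES_PER_CHANNEL = 8
QUOTED_PEAK_GBYTES = 650   # "650 GB/s peak r/w bandwidth" (from the talk)


def raw_omi_bandwidth(channels: int, lanes: int, lane_gbps: float) -> tuple[float, float]:
    """Return (GB/s per direction, GB/s both directions) of raw link capacity."""
    per_direction = channels * lanes * lane_gbps / 8  # bits -> bytes
    return per_direction, 2 * per_direction


per_dir, combined = raw_omi_bandwidth(CHANNELS, LANES_PER_CHANNEL, LANE_GBPS)
efficiency = QUOTED_PEAK_GBYTES / combined
print(f"{per_dir:.0f} GB/s per direction, {combined:.0f} GB/s combined, "
      f"quoted peak = {efficiency:.0%} of raw")
```

Under these assumptions the raw link capacity is 400 GB/s in each direction (800 GB/s combined), so the quoted 650 GB/s peak would be roughly 81% of the raw signalling rate.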
01:21PM EDT - OpenCAPI 4.0
01:21PM EDT - support for 64/128/256B cache lines
01:21PM EDT - supports 128B messages for low latency
01:22PM EDT - Supports virtual address cache for system memory
01:22PM EDT - Host manages the higher level cache coherency
01:23PM EDT - P9 SO supports 4x DDR4, P9 SU supports 4x Centaur, P9 AIO supports 8x OMI
01:23PM EDT - On each side
01:24PM EDT - OMI DDIMM looks very different
01:24PM EDT - Will see if I can get a better photo
01:25PM EDT - Microchip SMC1000 chip used on the OMI DDIMM
01:25PM EDT - effective bandwidth and latency equivalent to LRDIMM
01:26PM EDT - Q: energy per bit on memory vs DDR?
01:27PM EDT - A: Don't have numbers here. We shifted power from the DDR PHY onto the memory DIMM, which helps with cooling conditions. The 8-lane memory device can drop to 2 lanes or 4 lanes depending on use; it dynamically shifts based on utilization. Better than DDR anyway
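IBM didn't give thresholds for this dynamic lane-width switching, so the following is only a sketch of how such a policy might work: pick the narrowest of the x2/x4/x8 widths whose capacity, with some headroom, covers the current demand. The function name and the `headroom` value of 0.8 are assumptions for illustration, and a real controller would likely also add hysteresis to avoid flapping between widths.

```python
def choose_lane_width(utilization: float, widths: tuple[int, ...] = (2, 4, 8),
                      headroom: float = 0.8) -> int:
    """Pick the narrowest link width that can carry the current demand.

    utilization: demand as a fraction (0..1) of the full-width link's bandwidth.
    headroom: only use a narrower width if demand fits within this fraction of
    its capacity (hypothetical value; IBM gave no thresholds).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    full = max(widths)
    for width in sorted(widths):
        capacity = width / full  # fraction of full-width bandwidth this width provides
        if utilization <= headroom * capacity:
            return width
    return full  # demand too high for any narrower width: run all lanes
```

With these assumed values, demand under 20% of the full link would run on two lanes, 30% would drop to four lanes, and anything above 40% keeps all eight lanes active.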
01:28PM EDT - Q: Does the OMI DDIMM have a cache? A: No, it's a slimmer device with write buffering but no caching
01:29PM EDT - Q: Is OMI like CXL? A: We view CXL as focused more on accelerators. OMI is available today, is ahead of the competition, and has been in development a long time. I'd be surprised if other buffered memory solutions get as low latency as us. I'd be surprised if CXL has such a low latency to memory
01:30PM EDT - That's it for this talk. Small break now; the next talk for live blogging is MLPerf
17 Comments
peevee - Monday, August 19, 2019
And people care less and less. Price/perf of POWER + AIX compared to EPYC and Linux is simply nowhere close. Only corruption (kickbacks) keeps it alive.
extide - Monday, August 19, 2019
At least they are still innovating. With basically no competition in that space besides their own older products -- they still come out with new stuff that's fun to read about -- unlike Intel....
HStewart - Monday, August 19, 2019
I disagree with this statement. In some ways IBM is like Intel: Intel is stuck with old compatibility, which makes it struggle to advance the technology, but Intel does what it can by mobilizing the CPU and introducing new technology like AVX and AVX-512. But going to multiple cores is just a band-aid in my opinion; what needs to happen is to advance the core from within. But likely, similar to IBM, doing so introduces software compatibility issues, and it is hard to move forward with tons of existing software.
If you remember when 64-bit came out, Intel wanted people to move to IA-64, even though extending x86 was a much simpler technique and could have been done. Intel wanted to break software compatibility with x86, but people did not buy into it.
But IBM has been known to mess up the industry. I had a personal professional experience with this about 27 years ago: I worked as lead developer on PC-MOS/386, I was working on Windows/386 compatibility, and I wanted to work on DPMI support, but Windows would not allow it. I posted a message on Microsoft's CompuServe forum and got the following response: "Not because of your company but another company, only people that can respond to your question is Bill Gates or Steve Ballmer". This, of course, was IBM with OS/2 - Windows did not allow itself to be a DPMI client.
This is not the first time IBM has messed up. They required second-source CPU compatibility for Microsoft in the original IBM PC, which started, for good or bad, the clone PCs. IBM eventually tried to change that with the PS/2 line, but in the end the only thing that came out of it was 3.5-inch disks, which are no longer needed.
levizx - Monday, August 19, 2019
Or maybe you can offer free software updates to lure them to x86?
zdz - Monday, August 19, 2019
AIX? Meh. Think POWER + Linux.
Ninhalem - Monday, August 19, 2019
You're forgetting about the supercomputing world, where IBM is still very much alive and kicking butt. Summit at the Oak Ridge National Laboratory is the fastest supercomputer in the world and uses POWER9 CPUs coupled with Nvidia Tesla GPUs.
HStewart - Monday, August 19, 2019
It's kind of funny how all the companies are saying they have the world's fastest supercomputer: Intel and AMD and now IBM.
But things have changed; remember when x86 was just for home and IBM was mainframe?
mode_13h - Monday, August 19, 2019
They need a lot more volume than that to stay viable.
Threska - Monday, August 19, 2019
I wonder if these non-caring "people" are already vested in x86?
Phynaz - Monday, August 19, 2019
These aren’t PCs. Lots of people care, including supercomputing centers.