12:34PM EDT - We're here at Hot Chips 31 / 2019, and the first talk to be live blogged covers IBM's newest variant of its POWER CPUs.

12:37PM EDT - Quite possibly the biggest Hot Chips crowd I can remember.

12:45PM EDT - The Arm talk is set to finish here in a bit, then IBM will start.

12:45PM EDT - We already covered Arm's Neoverse N1 strategy earlier in the year: https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform

12:55PM EDT - Just finishing up the previous talk

12:57PM EDT - Hopefully this is about POWER10 :)

12:57PM EDT - It could be the Power9 IO chip

12:58PM EDT - 2018 talk was about Power9 SU core

12:58PM EDT - IBM now has a family of processors. Start with one up front, then work on the rest of the family

12:58PM EDT - Scale out first, then scale up

12:58PM EDT - One optimized for dual socket, one optimized for 16 sockets

12:59PM EDT - Power9 AIO does things they wanted to do before Power10

12:59PM EDT - new accelerator technology deployed on Power9

12:59PM EDT - Today in Power9

12:59PM EDT - Power10 for 2021

12:59PM EDT - New core on Power10 and new transistor technology in 2021

01:00PM EDT - Accessing heterogeneous systems

01:00PM EDT - Need to focus on diverse acceleration devices and diverse memory devices beyond CPUs

01:01PM EDT - Need to focus on heterogeneous systems, not just GHz

01:01PM EDT - Need to deploy different types of heterogeneous systems

01:01PM EDT - Trying to remove the different types of SerDes on a chip. Want to consolidate these down to a single design

01:02PM EDT - On Power9, there are now only two types of SerDes: PCIe uses one, and everything else is built on 25G SerDes

01:02PM EDT - SerDes can make something area and power efficient when fixed to 25G, then just scale the number of links

01:02PM EDT - Take all the 25G signals from the chip and deploy composable systems across different accelerator technologies

01:03PM EDT - NVLINK and OpenCAPI and OMI

01:03PM EDT - OMI is the memory interface to connect memory across SerDes

01:04PM EDT - On-chip Gzip accelerator

01:04PM EDT - IBM has delivered #1 and #2 supercomputers on the list

01:04PM EDT - Built for the AI era

01:05PM EDT - Now OpenCAPI, IBM sees it as being very important in future accelerator systems

01:05PM EDT - Minimizing overhead and latency that PCIe has

01:05PM EDT - Accelerators not only GPU, but SmartNICs, networking, FPGAs, AI accel

01:06PM EDT - Want software to take data from anywhere in the system on any device

01:06PM EDT - Power9 has direct attached memory

01:07PM EDT - Some of the former secret sauce technologies are in the new open memory standard

01:07PM EDT - Can deal with asymmetry

01:08PM EDT - Having this connectivity allows for independent development of accelerators rather than focusing on the CPU

01:09PM EDT - Don't want programmers to worry about host-to-device connectivity

01:09PM EDT - Also OpenCAPI helps with security

01:09PM EDT - Prevents an accelerator crashing a whole system

01:10PM EDT - Need to make sure accelerators can't add in potential cache coherent bugs

01:11PM EDT - Aligned all packets with the deserialised interface

01:11PM EDT - Accelerators always see aligned data to help make assumptions for performance

01:11PM EDT - Can start processing the command before checking the CRC

01:12PM EDT - Separately pipelined control/tag vs data
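
A rough sketch of what starting work on a command before its CRC has been checked could look like in software terms. Everything here is an assumption for illustration: the `Frame` layout, the `decode` step, and the use of CRC-32 from Python's `zlib` are stand-ins, not the OpenCAPI frame format or its actual CRC. In hardware the decode and the CRC check would run in parallel; here they simply run one after the other, and the result is only committed once the CRC passes.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Frame:
    payload: bytes  # hypothetical command bytes
    crc: int        # CRC-32 of the payload as computed by the sender


def make_frame(payload: bytes) -> Frame:
    """Build a frame the way a (hypothetical) transmitter would."""
    return Frame(payload, zlib.crc32(payload))


def decode(payload: bytes) -> str:
    """Stand-in for the real command decode work."""
    return payload.decode("ascii", errors="replace").upper()


def receive(frame: Frame, committed: list) -> bool:
    """Decode speculatively, then commit only if the CRC checks out."""
    # Step 1: start the work before the CRC result is known.
    result = decode(frame.payload)
    # Step 2: verify the CRC; only now may the result become visible.
    if zlib.crc32(frame.payload) != frame.crc:
        return False  # drop the speculative result
    committed.append(result)
    return True
```

If the CRC fails, the speculative result is dropped and nothing becomes visible to the rest of the system; a real link layer would then presumably replay the frame.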

01:13PM EDT - (coherence over switching is not supported in OpenCAPI due to complexity)

01:14PM EDT - 1/6th the cost in die area to put OMI instead of DDR

01:14PM EDT - So memory is easier to support

01:14PM EDT - Can enable more bandwidth in smaller ASICs with OMI

01:15PM EDT - Differential buffer attach is now agnostic - the buffer is on the memory

01:15PM EDT - Can put buffered DDR or GDDR, rather than one or the other

01:16PM EDT - OMI is lighter weight and open to enable more ecosystem support

01:17PM EDT - With OMI memory, based on OpenCAPI SerDes, can use multiple DDR4 and DDR5 on the same system with the same connector

01:18PM EDT - e.g. if enabled on AMD sIOD, would decouple memory technology from host silicon development

01:19PM EDT - Power9 Advanced IO chip = P9 AIO

01:19PM EDT - 728mm2, 8B transistors

01:19PM EDT - 24 SMT4 cores, 120 MB eDRAM L3

01:19PM EDT - Built on 14FF (GF?)

01:19PM EDT - 17 layer metal stack

01:19PM EDT - 16 channels of x8 OMI, 650 GB/s peak r/w bandwidth

01:20PM EDT - 48 lanes of PCIe 4.0

01:20PM EDT - Up to x16 CAPI 2.0

01:20PM EDT - Up to x48 NVLINK attach

01:20PM EDT - Shows 2S replacement, but can scale to 16 socket
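
As a back-of-the-envelope check on the link counts above, raw line rate scales linearly with lane count when every lane runs at a fixed speed. The sketch below assumes exactly 25 Gb/s per lane per direction (the 25G SerDes figure from earlier in the talk) and ignores encoding and protocol overhead, so it gives an upper bound on raw signalling rate and does not try to reproduce IBM's quoted 650 GB/s peak read/write number. The function name is hypothetical.

```python
def raw_bandwidth_gbytes(channels: int, lanes_per_channel: int,
                         gbit_per_lane: float) -> float:
    """Raw one-direction line rate in GB/s, with no overhead modelled."""
    total_gbit = channels * lanes_per_channel * gbit_per_lane
    return total_gbit / 8  # 8 bits per byte


# 16 channels of x8 OMI, assuming 25 Gb/s per lane
omi_raw = raw_bandwidth_gbytes(16, 8, 25)

# Up to x48 NVLINK attach, treated as one 48-lane group at the same rate
nvlink_raw = raw_bandwidth_gbytes(1, 48, 25)

print(f"OMI raw, one direction:    {omi_raw:.0f} GB/s")
print(f"NVLINK raw, one direction: {nvlink_raw:.0f} GB/s")
```

The point is the design choice from the talk: with one fixed-rate SerDes, the only knob is how many lanes are assigned to each interface.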

01:21PM EDT - OpenCAPI 4.0

01:21PM EDT - support for 64/128/256B cache lines

01:21PM EDT - supports 128B messages for low latency

01:22PM EDT - Supports virtual address cache for system memory

01:22PM EDT - Host manages the higher level cache coherency

01:23PM EDT - P9 SO supports 4x DDR4, P9 SU supports 4x Centaur, P9 AIO supports 8x OMI

01:23PM EDT - On each side

01:24PM EDT - OMI DDIMM looks very different

01:24PM EDT - Will see if I can get a better photo

01:25PM EDT - Microchip SMC1000 chip used on the OMI DDIMM

01:25PM EDT - effective bandwidth and latency equivalent to LRDIMM

01:26PM EDT - Q: energy per bit on memory vs DDR?

01:27PM EDT - A: Don't have numbers here. We shifted power from the DDR PHY onto the memory DIMM, which helps with cooling conditions. The 8 lane memory device can move to 2 lane or 4 lane depending on use. It does dynamically shift based on utilization. Better than DDR anyway.
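
The answer above says the x8 link can drop to 2 or 4 lanes based on utilization, without giving the policy. A minimal sketch of one way such a choice could be made follows. The thresholds and the function name are purely hypothetical and were not given in the talk; they just leave some headroom below the 25% and 50% of full bandwidth that x2 and x4 would provide.

```python
def pick_lane_width(utilization: float) -> int:
    """Choose 2, 4 or 8 active lanes from link utilization.

    `utilization` is the fraction of full x8 bandwidth in use (0.0 to 1.0).
    The thresholds are illustrative assumptions, not IBM's values.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if utilization <= 0.20:   # x2 offers 25% of full bandwidth
        return 2
    if utilization <= 0.45:   # x4 offers 50% of full bandwidth
        return 4
    return 8


for u in (0.10, 0.30, 0.90):
    print(f"utilization {u:.2f} -> x{pick_lane_width(u)}")
```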

01:28PM EDT - Q: Does the OMI DDIMM have a cache? A: No, it's a slimmer device with write buffering, no caching

01:29PM EDT - Q: Is OMI like CXL? A: We view CXL as focused more on accelerators. OMI is available today, is ahead of the competition, and has been in development a long time. I'd be surprised if other buffered memory solutions get as low latency as us. I'd be surprised if CXL has such a low latency to memory

01:30PM EDT - That's it for this talk. Small break now, next talk for live blogging is MLPerf

17 Comments

  • peevee - Monday, August 19, 2019

    And people care less and less. Price/perf of POWER + AIX compared to EPYC and Linux is simply nowhere close. Only corruption (kickbacks) keeps it alive.
  • extide - Monday, August 19, 2019

    At least they are still innovating. With basically no competition in that space besides their own older products -- they still come out with new stuff that's fun to read about -- unlike Intel....
  • HStewart - Monday, August 19, 2019

    I disagree with this statement. In some ways IBM is like Intel: both are stuck with old compatibility, which makes it a struggle to advance the technology. But Intel does what it can by mobilizing the CPU and introducing new technology like AVX and AVX-512.

    But going to multiple cores is just a band-aid in my opinion. What needs to happen is to advance the core from within. But, likely similar to IBM, doing so introduces software compatibility issues, and it is hard to advance with tons of existing software.

    If you remember when 64-bit came out, Intel wanted people to move to IA-64 even though extending x86 was a much simpler technique and could have been done. Intel wanted to break software compatibility with x86, but people did not buy into it.

    But IBM has been known to mess up the industry. I had a personal professional experience: about 27 years ago I worked as lead developer on PC-MOS/386. I was working on Windows/386 compatibility and wanted to work on DPMI support, but Windows would not allow it. I posted a message on the Microsoft CompuServe forum and got the following response: "Not because of your company but another company, only people that can respond to your question is Bill Gates or Steve Ballmer". This of course is IBM with OS/2. Windows did not allow itself to be a DPMI client.

    This is not the first time IBM has messed up. They required second-source CPU compatibility for Microsoft in the original IBM PC, which started, for good or bad, clone PCs. Eventually IBM tried to change that with the PS/2 line, but the only thing that came out of it is 3 1/2 inch discs, which are no longer needed.
  • levizx - Monday, August 19, 2019

    Or maybe you can offer free software update to lure them to x86?
  • zdz - Monday, August 19, 2019

    AIX? Meh. Think POWER + Linux.
  • Ninhalem - Monday, August 19, 2019

    You're forgetting about the supercomputing world where IBM is still very much alive and kicking butt. Summit at the Oak Ridge National Laboratory is the fastest supercomputer in the world and uses POWER9 CPUs coupled with Nvidia Tesla GPUs.
  • HStewart - Monday, August 19, 2019

    It's kind of funny how all the companies are saying they have the world's fastest supercomputer:

    Intel and AMD and now IBM.

    But things have changed, because I remember when x86 was just for home and IBM was mainframe.
  • mode_13h - Monday, August 19, 2019

    They need a lot more volume than that, to stay viable.
  • Threska - Monday, August 19, 2019

    Wonder if these non-caring "people" are already vested in x86?
  • Phynaz - Monday, August 19, 2019

    These aren’t PCs. Lots of people care, including super computing centers.
