Hot Chips 2018: AMD APU Optimization Live Blog (Noon PT, 7pm UTC)

by Ian Cutress on August 20, 2018 2:55 PM EST

9 Comments | Add A Comment

9 Comments

03:01PM EDT - AMD is also at Hot Chips, speaking about Raven Ridge and its APUs. The key elements to this talk will be the optimizations made for Raven Ridge, specifically around power and data management. We'll be live blogging the talk for everyone to follow.

03:02PM EDT - Working on APUs for several years

03:02PM EDT - Raven Ridge allowed the users to experience an uplift in performance

03:03PM EDT - Focusing on power and power efficiency

03:03PM EDT - GPUs have an appetite for memory - getting efficiency is important, as is power

03:03PM EDT - Battery life is a key consideration too

03:03PM EDT - 4C/8T, 11 Vega CUs

03:04PM EDT - ALmost everything is new - new Upgraded display engine, new audio coprocessor, new IO subsystem, USB 3.1 and USB-C native

03:05PM EDT - Went big on the GPU

03:05PM EDT - Up 60% transistors from Bristol

03:05PM EDT - But 16% smaller based on process note

03:05PM EDT - BGA package

03:06PM EDT - Also in desktop

03:06PM EDT - Dedicated L2 cache in the GPU

03:06PM EDT - Flexible geometry engine

03:06PM EDT - 16 pixel units

03:06PM EDT - two render back-ends

03:06PM EDT - 1200 Tri/sec at 1200 MHz

03:08PM EDT - Smaller L3 cache than desktop

03:08PM EDT - Precision Boost 2 for dynamic core frequency

03:09PM EDT - GPU workloads tends to go through phases of render and physics phases

03:09PM EDT - Can adjust power based on where it is needed

03:09PM EDT - Fine grained p-states

03:09PM EDT - On die power regulation

03:09PM EDT - Can exploit all the power

03:10PM EDT - Heart of the chip is the fabric

03:10PM EDT - oen coherent protocol - Infinity Fabric

03:10PM EDT - Manages the full SoC

03:10PM EDT - Overall power budget and management

03:10PM EDT - Enhanced flow for powering on and off components

03:11PM EDT - Knew in 2013/2014 AMD was going to major reset the CPU portfolio

03:11PM EDT - IF was designed to scale from Server to high-end graphics into smaller mobile SoCs and desktop

03:11PM EDT - IF is a Transport Layer

03:12PM EDT - Scalable Data Ports and SDP interface modules to the transport layer

03:12PM EDT - SDP hides complexities of coherence protocol from connected IP

03:12PM EDT - CPU, GPU, Memory, IO, all use an SDPIM into the transport layer

03:13PM EDT - Transport Layer Switches are crossbars

03:13PM EDT - Coherent ports in region A/B

03:13PM EDT - Structured for multi-region power gating

03:13PM EDT - Up to 5 transfers per clock per switch

03:14PM EDT - Turn off bits of the fabric that are not needed to save power

03:14PM EDT - Transport request queues at each entry point into the fabric

03:14PM EDT - Hard Real Time, Soft Real Time, and Non-Real Time

03:15PM EDT - Within this, multiple virtual queues and channels

03:15PM EDT - Also priority classes end-to-end

03:15PM EDT - Can escalate through the entire fabric

03:15PM EDT - Need to improve mem bandwidth for GPU

03:15PM EDT - Bigger CPU and GPU caches helped a lot

03:16PM EDT - Caching algorithms and lossless compression (DCC) helps

03:16PM EDT - Direct reads of compressed memory

03:16PM EDT - Shadow of Mordor seems more memory active

03:17PM EDT - Direct compare Bristol Ridge and Ravin Ridge

03:17PM EDT - Draw Stream Binning Radterizer improves bandwidth

03:18PM EDT - Increase display engines - reset the engine and set a target to do a 4K display at Vmin (lowest voltage of process)

03:18PM EDT - Four pipes and flexibility to combine pipes or act independent

03:18PM EDT - 4x throuput over Bristol

03:18PM EDT - Keeping up with codec improvements

03:19PM EDT - Rather than 3 rails (CPU, GPU, IO), now have one

03:19PM EDT - 3 rail methods meant overprovisioning required - very inefficient. Now can be very efficient

03:19PM EDT - In-chip, rather than do FIVR, do individual LDO in each CPU core and graphics

03:19PM EDT - Allows more fine grained power delivery

03:20PM EDT - Allows for better control

03:20PM EDT - From Bristol Ridge, higher boost current but smaller regulator, overall lower ICCMax for CPU+GPU

03:21PM EDT - Each core has a regulator, so can control power states and power down each core when idle

03:21PM EDT - Fast graphics off and power down when needed

03:21PM EDT - New power modes

03:22PM EDT - managing a lot of skin temps through STAPM

03:22PM EDT - Skin Temperature Aware Power management

03:22PM EDT - Previous gen didn't exploit thermal headroom

03:22PM EDT - Now can go into thermal budgets and maximise performance

03:23PM EDT - Performance goes up

03:23PM EDT - Now 25x20 goal

03:24PM EDT - Above progress goal

03:24PM EDT - New systems will come out in the future to meet the target

03:25PM EDT - Q&A Time

03:27PM EDT - Q: Talk coherency and master/slave. A: The masters are participating in the protocol, slaves are the memory controllers. Manage all coherent / non coherent traffic

03:28PM EDT - Q: Are the display controllers coherent? A: The frame buffer is considered non-coherent memory

03:29PM EDT - Q: Moore's Law is painful but you can stay on course for 25x20 ? A: We are seeing improvements, looking forward to 7nm. Looking beyond that. Challenges today are more like IO scaling. If we start looking at IO bottlenecks, such as mem and mem throughput, that's our bigger challenges. Also, thermal management, thermal density, trying to tune every last bit in the device.

03:29PM EDT - That's a wrap. Next Live Blog at 4pm PT.

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

9 Comments

View All Comments

jjj - Monday, August 20, 2018 - link
Covering Xilinx, Mythic and ARM tomorrow?
Ian Cutress - Monday, August 20, 2018 - link
https://twitter.com/IanCutress/status/103171584120...

Mythic isn't on my list.
jjj - Tuesday, August 21, 2018 - link
Maybe take a closer look at what Mythic is trying to do and reconsider?
SquarePeg - Monday, August 20, 2018 - link
I wish that AMD would give their APU's a quad channel memory controller for better bandwidth. A quad core APU with 24 CU's and 16gb (4x4gb) of ram @ 3466mhz would make for a solid little machine.
PeachNCream - Tuesday, August 21, 2018 - link
A small amount of optional dedicated GDDR5 like was done with TurboCache and Sideport a number of years about might also be a cost-effective option. iGPUs more memory bandwidth and lower latency than system memory can provide alone and it might not be acceptable to add in two more memory channels.
Targon - Tuesday, August 21, 2018 - link
Due to the AM4 socket on the desktop, AMD can't add more memory channels without changing the socket. If AMD adopts Gen-Z in the next few years, that has the potential to change things a lot in terms of how memory is addressed, because it can do away with the need for more memory channels to get higher bandwidth access to memory. Memory would be on the Gen-Z bus, and the CPU would just talk to the bus. This might require the new socket, or Ryzen fourth generation might be able to support it on AM4(if the motherboard is Gen-Z, then all pins for PCI Express slots and RAM could be used for the Gen-Z connection, if the motherboard is "legacy", then the CPU would use legacy mode and be limited to the 2 channels. How long it will take for Gen-Z to show up is the question.
Blitzed Penguin - Tuesday, August 21, 2018 - link
Did AMD say if they are going to release optimized drivers for thier mobile Raven Ridge APU's? Current drivers for Ryzen 5 2500u are still stuck on 17.7 and are released from OEMs not AMD. Power management between the CPU and GPU is all wrong.
denywinarto - Wednesday, August 22, 2018 - link
Is this desktop or laptop apu? Kinda to soon for an upgrade isnt it? 2019 would make more sense..
abufrejoval - Friday, August 24, 2018 - link
As good as this is, what I really expected AMD to produce is what they gave away to Intel or did with the Chinese: Produce an APU which doesn't compromise on the graphics, even if that precludes the 15Watt mobile space (BTW: Is there any initiative to produce mobile GDDRx or HBM?).

So put some HBMx on the die carrier and then use DDRx only for the CPU (would even work with an AM4 socket, right?) or go all GDDR5 like they did with the console and perhaps allow to combine several of these consoles into something beefier (2x, 4x) connected via IF.

Hot Chips 2018: AMD APU Optimization Live Blog (Noon PT, 7pm UTC)

Post Your Comment

9 Comments

View All Comments

jjj - Monday, August 20, 2018 - link

Ian Cutress - Monday, August 20, 2018 - link

jjj - Tuesday, August 21, 2018 - link

SquarePeg - Monday, August 20, 2018 - link

PeachNCream - Tuesday, August 21, 2018 - link

Targon - Tuesday, August 21, 2018 - link

Blitzed Penguin - Tuesday, August 21, 2018 - link

denywinarto - Wednesday, August 22, 2018 - link

abufrejoval - Friday, August 24, 2018 - link

Log in

Don't have an account? Sign up now