Performance - An Update

The Chipworks PS4 teardown last week told us a lot about how the Xbox One and PlayStation 4 compare in terms of hardware. It turns out that Microsoft’s silicon budget was actually a little larger than Sony’s, at least for the main APU. The Xbox One APU is a 363mm^2 die, compared to 348mm^2 for the PS4’s APU. Both use a similar 8-core Jaguar CPU (2 x quad-core islands), but they feature different implementations of AMD’s Graphics Core Next GPUs. Microsoft elected to implement 12 compute units, two geometry engines and 16 ROPs, while Sony went for 18 CUs, two geometry engines and 32 ROPs. How did Sony manage to fit more compute and ROP hardware into a smaller die area? By not including any eSRAM on-die.

While both APUs implement a 256-bit wide memory interface, Sony chose to use GDDR5 memory running at a 5.5GHz data rate, while Microsoft stuck with more conventionally available DDR3 memory running at less than half that speed (2133MHz data rate). To make up for the bandwidth deficit, Microsoft included 32MB of eSRAM on its APU to alleviate some of the GPU’s bandwidth needs. The eSRAM is accessible in 8MB chunks, with a total of 204GB/s of bandwidth on offer (102GB/s in each direction). The eSRAM is designed for GPU access only; CPU access requires a copy to main memory.
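
As a quick sanity check on those figures, here’s the peak bandwidth arithmetic behind the numbers quoted above and in the spec table below (theoretical peaks only; real-world efficiency is lower):

```python
# Back-of-the-envelope peak bandwidth math for both consoles.
# Peak bandwidth (GB/s) = bus width in bytes * data rate in GT/s

def peak_bandwidth_gbs(bus_width_bits, data_rate_mts):
    """Theoretical peak bandwidth in GB/s for a DRAM interface."""
    return (bus_width_bits / 8) * (data_rate_mts / 1000)

xbox_one_ddr3 = peak_bandwidth_gbs(256, 2133)   # ~68.3 GB/s
ps4_gddr5     = peak_bandwidth_gbs(256, 5500)   # ~176.0 GB/s

# The Xbox One's 32MB of eSRAM adds 102GB/s in each direction on top of DDR3.
xbox_one_esram_total = 102 * 2                  # 204 GB/s aggregate

print(f"Xbox One DDR3:  {xbox_one_ddr3:.1f} GB/s")
print(f"PS4 GDDR5:      {ps4_gddr5:.1f} GB/s")
print(f"Xbox One eSRAM: {xbox_one_esram_total} GB/s aggregate")
```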

Unlike Intel’s Crystalwell, the eSRAM isn’t a cache - instead it’s mapped to a specific address range in memory. And unlike the embedded DRAM in the Xbox 360, the eSRAM in the One can hold more than just a render target or Z-buffer. Virtually any GPU-accessible surface/buffer can now be stored in eSRAM (e.g. Z-buffer, G-buffer, stencil buffer, shadow buffer, etc.). Developers can also choose to store things like important textures in the eSRAM; nothing dictates that it hold one of these buffer types, just whatever the developer finds important. It’s also possible for a single surface to be split between main memory and eSRAM.
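
To make the address-mapped (non-cache) behavior concrete, here’s a minimal sketch of the kind of placement decision a developer might make each frame. The function, heuristics and surface sizes are purely hypothetical illustrations, not the actual Xbox One SDK API:

```python
# Hypothetical sketch of eSRAM placement planning (not the real Xbox One API).
# Because the eSRAM is a fixed 32MB address range rather than a cache,
# whatever the developer allocates there simply lives at those addresses.

ESRAM_BUDGET_MB = 32

def plan_esram_placement(surfaces):
    """Greedily place the most bandwidth-hungry surfaces into eSRAM.

    surfaces: list of (name, size_mb, estimated_bandwidth_gbs) tuples.
    Returns (in_esram, in_ddr3). A surface that doesn't fully fit could
    also be split between eSRAM and DDR3, which the hardware allows.
    """
    remaining = ESRAM_BUDGET_MB
    in_esram, in_ddr3 = [], []
    for name, size_mb, bw in sorted(surfaces, key=lambda s: -s[2]):
        if size_mb <= remaining:
            in_esram.append(name)
            remaining -= size_mb
        else:
            in_ddr3.append(name)
    return in_esram, in_ddr3

# Example frame (sizes and bandwidth estimates are illustrative only).
frame_surfaces = [
    ("g-buffer",          24, 90),   # read/written heavily every frame
    ("depth/stencil",      8, 60),
    ("shadow map",        16, 30),
    ("streamed textures",  4,  2),
]
print(plan_esram_placement(frame_surfaces))
```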

Obviously sticking important buffers and other frequently used data here can definitely reduce demands on the memory interface, which should help Microsoft get by with only ~68GB/s of system memory bandwidth. Microsoft has claimed publicly that actual bandwidth to the eSRAM is somewhere in the 140 - 150GB/s range, which is likely about equal to the effective memory bandwidth (after overhead/efficiency losses) of the PS4’s GDDR5 memory interface. The difference is that on the Xbox One you only get that bandwidth to your most frequently used data. It’s still not clear to me what effective memory bandwidth looks like on the Xbox One; I suspect it’s still a bit lower than on the PS4, but after talking with Ryan Smith (AT’s Senior GPU Editor) I’m now wondering if memory bandwidth isn’t really the issue here.

Microsoft Xbox One vs. Sony PlayStation 4 Spec Comparison
                           Xbox 360              Xbox One                                 PlayStation 4
CPU Cores/Threads          3/6                   8/8                                      8/8
CPU Frequency              3.2GHz                1.75GHz                                  1.6GHz
CPU µArch                  IBM PowerPC           AMD Jaguar                               AMD Jaguar
Shared L2 Cache            1MB                   2 x 2MB                                  2 x 2MB
GPU Cores                  -                     768                                      1152
GCN Geometry Engines       -                     2                                        2
GCN ROPs                   -                     16                                       32
GPU Frequency              -                     853MHz                                   800MHz
Peak Shader Throughput     0.24 TFLOPS           1.31 TFLOPS                              1.84 TFLOPS
Embedded Memory            10MB eDRAM            32MB eSRAM                               -
Embedded Memory Bandwidth  32GB/s                102GB/s bi-directional (204GB/s total)   -
System Memory              512MB 1400MHz GDDR3   8GB 2133MHz DDR3                         8GB 5500MHz GDDR5
System Memory Bus          128-bits              256-bits                                 256-bits
System Memory Bandwidth    22.4 GB/s             68.3 GB/s                                176.0 GB/s
Manufacturing Process      -                     28nm                                     28nm
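
The peak shader throughput figures in the table fall straight out of the GCN core counts and clocks (each GCN ALU can retire a fused multiply-add, i.e. two FLOPs, per clock). A quick check:

```python
# Verifying the peak shader throughput figures in the table above.
# GCN: each shader core (ALU) retires one FMA = 2 FLOPs per clock.

def peak_tflops(shader_cores, clock_ghz, flops_per_clock=2):
    return shader_cores * flops_per_clock * clock_ghz / 1000

xbox_one = peak_tflops(768, 0.853)    # 12 CUs x 64 ALUs
ps4      = peak_tflops(1152, 0.800)   # 18 CUs x 64 ALUs

print(f"Xbox One: {xbox_one:.2f} TFLOPS")   # ~1.31
print(f"PS4:      {ps4:.2f} TFLOPS")        # ~1.84
```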

In order to accommodate the eSRAM on die, Microsoft not only had to move to a 12 CU GPU configuration, it also had to cut back to 16 ROPs (half that of the PS4). The ROPs (render output units/raster operations pipes) are responsible for final pixel output, and at the resolutions these consoles are targeting, 16 ROPs makes the Xbox One the odd man out in comparison to PC GPUs. AMD’s GPUs targeting 1080p typically come with 32 ROPs, which is where the PS4 sits; the Xbox One ships with half that. The difference in raw shader performance (12 CUs vs. 18 CUs) can definitely show up in games that run more complex lighting routines and other long shader programs on each pixel, but the more recent reports of resolution differences between Xbox One and PS4 games at launch are likely the result of being ROP bound on the One. This is probably why Microsoft claimed it saw a bigger increase in realized performance from raising the GPU clock from 800MHz to 853MHz than from adding two extra CUs. The ROPs operate at the GPU clock, so in a ROP bound scenario an increase in GPU clock buys more performance than adding more compute hardware.
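
Rough peak-rate arithmetic illustrates Microsoft’s reasoning (peak numbers only; real workloads behave less cleanly):

```python
# Comparing Microsoft's two options: a 53MHz GPU clock bump vs. two extra CUs.
# ROP pixel fill rate scales only with clock; shader throughput scales with both.

ALUS_PER_CU = 64
ROPS = 16

def fill_rate_gpix(rops, clock_ghz):
    return rops * clock_ghz                      # one pixel per ROP per clock

def shader_tflops(cus, clock_ghz):
    return cus * ALUS_PER_CU * 2 * clock_ghz / 1000

options = [
    ("12 CUs @ 800MHz (baseline)",   12, 0.800),
    ("12 CUs @ 853MHz (clock bump)", 12, 0.853),
    ("14 CUs @ 800MHz (extra CUs)",  14, 0.800),
]
for label, cus, clk in options:
    print(f"{label}: {fill_rate_gpix(ROPS, clk):.1f} Gpixels/s, "
          f"{shader_tflops(cus, clk):.2f} TFLOPS")

# The clock bump lifts fill rate (and everything else) by ~6.6%, while the
# extra CUs only lift shader throughput - no help if the ROPs are the bottleneck.
```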

The PS4's APU - Courtesy Chipworks

Microsoft’s admission that the Xbox One dev kits have 14 CUs does make me wonder what the Xbox One die looks like. Chipworks found that the PS4’s APU actually features 20 CUs, despite only exposing 18 to game developers. I suspect those last two are there for defect mitigation/to increase effective yields in the case of bad CUs; I wonder if the same isn’t true for the Xbox One.
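
A toy binomial yield model shows why shipping 20 physical CUs but only exposing 18 can pay off. The 2% per-CU defect probability below is a made-up number purely for illustration:

```python
# Toy yield model: probability a die has enough working CUs.
# The 2% per-CU defect probability is an assumed figure for illustration only.
from math import comb

def yield_with_spares(total_cus, needed_cus, p_cu_defect):
    """P(at least `needed_cus` of `total_cus` CUs are defect-free)."""
    p_good = 1 - p_cu_defect
    return sum(comb(total_cus, k) * p_good**k * p_cu_defect**(total_cus - k)
               for k in range(needed_cus, total_cus + 1))

p = 0.02
print(f"18 good out of 18: {yield_with_spares(18, 18, p):.1%}")   # ~69.5%
print(f"18 good out of 20: {yield_with_spares(20, 18, p):.1%}")   # ~99.3%
```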

At the end of the day Microsoft appears to have ended up with its GPU configuration not for silicon cost reasons, but for platform power/cost and component availability reasons. Sourcing DDR3 is much easier than sourcing high density GDDR5. Sony obviously managed to launch with a ton of GDDR5 just fine, but I can definitely understand why Microsoft would be hesitant to go down that route in the planning stages of the Xbox One. To put some numbers in perspective, Sony has shipped 1 million PS4s thus far. That's 16 million GDDR5 chips, or 7.6 Petabytes of RAM. Had both Sony and Microsoft tried to do this, I do wonder if GDDR5 supply would've become a problem. That's a ton of RAM in a very short period of time. The only other major consumer of GDDR5 is video cards, and the list of cards sold in the last couple of months that would ever use that much RAM is a narrow one.
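
The RAM math behind that claim, assuming 16 x 4Gb (512MB) GDDR5 chips per console, which is how Sony reaches 8GB:

```python
# Back-of-the-envelope GDDR5 supply math for 1 million PS4s.
consoles      = 1_000_000
chips_per_ps4 = 16                  # 16 x 4Gb (512MB) GDDR5 chips = 8GB
gib_per_ps4   = 8

total_chips = consoles * chips_per_ps4            # 16 million chips
total_pib   = consoles * gib_per_ps4 / 1024**2    # GiB -> PiB

print(f"{total_chips / 1e6:.0f} million GDDR5 chips")
print(f"{total_pib:.1f} PiB of GDDR5")            # ~7.6
```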

Microsoft will obviously have an easier time scaling its platform down over the years (eSRAM should shrink nicely at smaller geometry processes), but that’s not a concern to the end user unless Microsoft chooses to aggressively pass along cost savings.

Comments

  • elerick - Wednesday, November 20, 2013 - link

    Thanks for the power consumption measurements. Could the xbox one standby power be higher due to power on / off voice commands?
  • althaz - Wednesday, November 20, 2013 - link

    I suspect this is almost certainly the case. I wonder if it drops down below 10w if you turn off the Kinect (which I would never do myself)?

    I also hope Sony update their software - the Xbox stays in the 16-18w range when downloading updates, whereas the PS4 jumps up to 70 (70w when in standby and downloading an update, but still takes 30 seconds to start up!).
  • mutantsushi - Saturday, November 23, 2013 - link

    It seems that the PS4's extremely high standby/download power draw is due to the ARM co-processor not being up to the task. It was supposed to handle basic I/O tasks and other needed features, but apparently it wasn't quite spec'd sufficiently for the job, forcing Sony to keep the main CPU powered on to handle it. The rumor is that they will "soon" release a new revision with a more powerful ARM core that is up to the task, which should allow powering down the x86 CPU completely, as per the original plan. (Either that, or managing to rework the "standby" software functions so that the existing ARM core can handle them would also do the trick.)

    I believe MS is now also rumored to "soon" release a revision of the Xbone, although what that might entail is unknown. An SKU without the Kinect could allow them to drop the price $100 to better compete with PS4.

    Incidentally, the Xbone seems to be running HOTTER than the PS4, so MS' design certainly cannot be said to be a more efficient cooling design; it's more that they have more open space which isn't being used as efficiently as in the PS4's design. The temp differential is also probably down to MS' last-minute decision to give a 5% clockspeed bump to the APU.

    I'm looking forward to the 'in depth' article covering each. As far as performance is applicable in actual use scenarios, i.e. games, I'm interested to get the real low-down... The vast similarity in most aspects really narrows the number of factors to consider, so the actual differentiating factors really should be able to be comprehensively addressed in their implications.

    Like Anand says, I don't think memory thru-put is a gross differentiator per se, or at least we could say that Xbone's ESRAM can be equivalent under certain plausible scenarios, even if it is less flexible than GDDR and thus restricts possible development paths. For cross-platform titles at least, that isn't really a factor IMHO.

    The ROP difference is probably the major factor for any delta in frame buffer resolution, but PS4's +50% compute unit advantage still remains as a factor in future exploitability... And if one wants to look at future exploitability then addressing GPU and PS4's features applicable to that is certainly necessary. I have seen discussion of GPGPU approaches which essentially can more efficiently achieve 'traditional graphics' tasks than a conventional pure GPU approach, so this is directly applicable to 'pure graphics' itself, as well as the other applications of GPGPU - game logic/controls like collisions, physics, audio raycasting, etc.

    When assessing both platforms' implications for future developments, I just don't see anything on Xbone's side that presents much advantage re: unique architectural advantages that isn't portable to PS4 without serious degradation, while the reverse does very much present that situation. While crossplatform games of course will not truly leverage architectural advantages which allow for 'game changing' differences, PS4's CU, ROP, and GPGPU queue advantages should pretty consistently allow for 'turning up the quality dial' on any cross-platform title... And to the extent that their exploitation techniques become widely used, we could in fact see some 'standardized' design approaches which exploit e.g. the GPGPU techniques in ways easily 'bolted on' to a generic cross platform design... Again that's not going to change the ultimate game experience, but it is another vector to increase the qualitative experience. Certainly even in the first release games there are differences in occlusion techniques, and this is almost certainly without significant exploitation of GPGPU.
  • mutantsushi - Saturday, November 23, 2013 - link

    If Xbone's resolution is considered satisfactory, I do wonder what PS4 can achieve at the same resolution but utilizing the extra CU and GPGPU capacity to achieve actually unique difference, not just a similar experience at higher resolution (i.e. 1080 vs. 900). If 900 upscaled is considered fine, what can be done if that extra horsepower is allocated elsewhere instead of increasing the pixel count?
  • errorr - Wednesday, November 20, 2013 - link

    That is still ridiculous considering the Moto X and some other future phones can do the same thing at orders of magnitude less power draw.
  • Teknobug - Wednesday, November 20, 2013 - link

    I love my Moto X, X8 8-core processor means each core has its own job, 1 core for always listening and 1 core for active notifications. Very easy on the battery, which is why it is one of the best battery life phones right now.
  • uditrana - Thursday, November 21, 2013 - link

    Do you even know what you are talking about?
  • blzd - Thursday, November 21, 2013 - link

    Moto X is dual core. X8 is a co-processor, not eight cores.
  • errzone - Monday, November 25, 2013 - link

    That's not entirely true either. The Moto X uses the Qualcomm MSM8960. The SOC is a dual core processor with an Adreno 320 GPU, which has 4 cores. Adding the 2 co-processors equals 8; hence Motorola marketing speak of X8.
  • Roy2001 - Wednesday, December 11, 2013 - link

    Kinect?
