Ask the Experts: ARM's Cortex A53 Lead Architect, Peter Greenhalgh

Author: Anand Lal Shimpi

by Anand Lal Shimpi on December 10, 2013 9:00 AM EST

Given the timing of yesterday's Cortex A53 based Snapdragon 410 announcement, our latest Ask the Experts installment couldn't be better. Peter Greenhalgh, lead architect of the Cortex A53, has agreed to spend some time with us and answer any burning questions you might have on your mind about ARM, directly.

Peter has worked in ARM's processor division for 13 years and worked on the Cortex R4, Cortex A8 and Cortex A5 (as well as the ARM1176JZF-S and ARM1136JF-S). He was lead architect of the Cortex A7 and ARM's big.LITTLE technology as well.

Later this month I'll be doing a live discussion with Peter via Google Hangouts, but you guys get first crack at him. If you have any questions about Cortex A7, Cortex A53, big.LITTLE or pretty much anything else ARM-related, fire away in the comments below. Peter will be answering your questions personally over the next week.

Please help make Peter feel at home here on AnandTech by impressing him with your questions. Do a good job here and I might be able to even convince him to give away some ARM powered goodies...

Peter Greenhalgh - Wednesday, December 11, 2013 - link
Hi pigfetta,

ARM has an active architecture research team and, as I'm sure you would expect, it looks at all new architectural developments.

It would be possible to design a CPU with an on-chip FPGA (after all, most things in design are possible), but the key to a processor architecture is code compatibility, so that any application can run on any device. If a specific instruction can only run on one device, it is unlikely to be taken advantage of by software, since the code is no longer portable. If you look at the history of the ARM architecture, it has constantly evolved, with new instructions added to support changes in software models. These instructions are only introduced after consultation with the ARM silicon and software partners.
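To illustrate the portability point: software that wants an optional, newer instruction usually checks for it at run time and keeps a portable fallback, so the same binary still runs on every device. The sketch below is a hypothetical Python illustration of that pattern, not ARM's or any operating system's API. `parse_features`, `crc32_portable` and `pick_crc32` are invented names, the feature list is assumed to come from a Linux-style `/proc/cpuinfo` "Features" line (passed in as text), and `zlib.crc32` merely stands in for a routine that would use the ARMv8 CRC32 instructions.

```python
import zlib

# Hypothetical sketch of run-time feature dispatch: use an optional instruction
# when the CPU reports it, otherwise fall back to portable code.


def parse_features(cpuinfo_text: str) -> set[str]:
    """Return the flags on a Linux-style 'Features : ...' line (empty set if absent)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def crc32_portable(data: bytes) -> int:
    """Bitwise CRC-32 (reflected polynomial 0xEDB88320); runs on any CPU."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def pick_crc32(features: set[str]):
    """Choose an implementation once, based on the reported CPU features."""
    if "crc32" in features:
        # Stand-in for a routine built on the ARMv8 CRC32 instructions.
        return zlib.crc32
    return crc32_portable


if __name__ == "__main__":
    sample = "processor\t: 0\nFeatures\t: fp asimd aes pmull sha1 sha2 crc32\n"
    crc = pick_crc32(parse_features(sample))
    print(hex(crc(b"123456789")))
```

Both paths return the same value (the standard CRC-32 check value for `b"123456789"` is 0xCBF43926), which is what keeps the choice invisible to the rest of the program.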

You may also be interested in recent announcements concerning Cortex-A53 implemented on an FPGA. This allows standard software to run on the processor, but provides flexibility around the other blocks in the system.
mpjesse - Wednesday, December 11, 2013 - link
I'm pretty sure no one asked you and that the question was meant to be answered by the ARM engineer, should he choose to answer it. Instead of trolling perhaps you should come up with your own question for our guest.

If you don't have anything nice to say, don't say it at all.
kenyee - Wednesday, December 11, 2013 - link
How low a speed can the ARM chips be underclocked?
i.e., what limits the lowest speed?
Peter Greenhalgh - Wednesday, December 11, 2013 - link
Hi Kenyee,

If you wished to clock an ARM processor at a few kHz, you could. Going slower is always possible!
BoyBawang - Wednesday, December 11, 2013 - link
Hi Peter,

ARM is one of the key founders of the HSA Foundation organized by AMD. What is the current state of progress on ARM's implementation?
Hulk - Wednesday, December 11, 2013 - link
Is this move to 64-bit driven by a need from the hardware and/or software, or by pressure from competitors? If the former, can you indicate some of the improvements users will see and feel with 64-bit?
AgreedSA - Wednesday, December 11, 2013 - link
Why aren't you helping to make Terminator 2 (specifically, not the first one; that one's robot was just scary, while Terminator 2 had a friendly robot and a scary robot as well) a reality in our world? Do you have something against robots? Seems vaguely speciesist, to be honest...
Alpha21264 - Wednesday, December 11, 2013 - link
Can you talk a bit about your personal philosophy regarding pipeline lengths, as the A53 and A57 diverge significantly on the subject? Too short and it's difficult to implement goodness like a scheduler, but as you increase the length you also contribute to design bloat: you need large branch target arrays with both global and local history to avoid stalls, more complicated redirects in the decoder and execution units to avoid bubbles, and generally just more difficult loops to converge in your design. Are you pleased with the pipeline in the A53, and what do you see happening with the pipeline in both the big cores and the little ones going forward (I anticipate a vague answer on this one, but that's not going to stop me from asking)?
Peter Greenhalgh - Thursday, December 12, 2013 - link
Hi Alpha,

I'd expect my view of pipeline lengths to be similar to most other micro-architects. The design team have to balance the shortest possible pipeline to minimise branch mis-prediction penalty and wasted pipelining of control/data against the gates-per-cycle needed to hit the frequency target. Balance being the operative word as the aim is to have a similar amount of timing pressure on each pipeline stage since there's no point in having stages which are near empty (unless necessary due to driving long wires across the floorplan) and others which are full to bursting.
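A toy model of the balance described above, with hypothetical numbers throughout: the slowest stage (plus a fixed latch overhead) sets the clock period, and the branch misprediction penalty is assumed to grow with pipeline depth. The function names, the 0.05 ns latch overhead, the 20% branch fraction and the 5% misprediction rate are illustrative assumptions, not Cortex-A53 data.

```python
# Toy model of the pipeline-depth trade-off: deeper pipelines shorten the clock
# period but raise the branch-misprediction penalty. All numbers are hypothetical.


def clock_period_ns(stage_delays_ns, latch_overhead_ns=0.05):
    """The slowest stage sets the clock; each stage also pays flip-flop overhead."""
    return max(stage_delays_ns) + latch_overhead_ns


def cpi(mispredict_penalty_cycles, branch_fraction=0.2, mispredict_rate=0.05):
    """Ideal CPI of 1 plus the average stall cycles caused by mispredicted branches."""
    return 1.0 + branch_fraction * mispredict_rate * mispredict_penalty_cycles


def ns_per_instruction(total_logic_ns, stages, latch_overhead_ns=0.05):
    """Split a fixed amount of logic evenly (perfect balance) across `stages`.

    Assumes the misprediction penalty is roughly `stages - 1` cycles,
    i.e. branches resolve near the end of the pipeline.
    """
    delays = [total_logic_ns / stages] * stages
    period = clock_period_ns(delays, latch_overhead_ns)
    return cpi(stages - 1) * period


if __name__ == "__main__":
    for depth in (5, 8, 13, 20):
        print(f"{depth:2d} stages: {ns_per_instruction(4.0, depth):.3f} ns/instr")

    # Same 4.0 ns of logic over 8 stages, but badly balanced: one stage is
    # near empty and another full to bursting, so the clock gets slower.
    balanced = [0.5] * 8
    unbalanced = [0.1, 0.9] + [0.5] * 6
    print(clock_period_ns(balanced), clock_period_ns(unbalanced))
```

Which depth comes out ahead in this model depends entirely on the assumed rates and overheads; the useful part is the shape of the trade-off, and that an unbalanced split of the same 4.0 ns of logic needs a 0.95 ns clock period instead of 0.55 ns.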

Typically a pipeline is built around certain structures taking a specific amount of time. For example you don't want an ALU to be pipelined across two cycles due to the IPC impact. Another example would be the instruction scheduler where you want the pick->update path to have a single-cycle turnaround. And L1 data cache access latency is important, particularly in pointer chasing code, so getting a good trade-off against frequency & the micro-architecture is required (a 4-cycle latency may be tolerable on a wide OOO micro-architecture which can scavenge IPC from across a large window, but an in-order pipeline wants 1-cycle/2-cycle).
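The pointer-chasing point can be sketched with an equally simplified cost model. Each node of a hypothetical linked-list walk needs one dependent load plus a few cycles of independent work. The in-order model assumes none of that work can be placed between the load and its first use, while the out-of-order model assumes an ideal window that overlaps the two completely; neither models a real core, and the function names are invented for illustration.

```python
# Toy cost model: L1 load-to-use latency in pointer-chasing code.
# Hypothetical workload: each list node needs one dependent load (to find the
# next node) plus `work` cycles of ALU work that does not depend on that load.


def inorder_cycles(nodes: int, load_latency: int, work: int) -> int:
    """In-order core, assuming nothing useful can be scheduled between the
    load and its first use: it stalls for the full latency every node."""
    return nodes * (load_latency + work)


def ooo_cycles(nodes: int, load_latency: int, work: int) -> int:
    """Idealised wide out-of-order core: the independent work overlaps the
    outstanding load, so only the slower of the two costs time per node."""
    return nodes * max(load_latency, work)


if __name__ == "__main__":
    for latency in (1, 2, 4):
        print(
            f"L1 latency {latency}: "
            f"in-order {inorder_cycles(1000, latency, 4)} cycles, "
            f"out-of-order {ooo_cycles(1000, latency, 4)} cycles"
        )
```

With 4 cycles of work per node, moving from a 2-cycle to a 4-cycle L1 latency makes the in-order walk a third slower (6 to 8 cycles per node), while the idealised out-of-order core stays at 4 cycles per node.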

We're pretty happy with the 8-stage (integer) Cortex-A53 pipeline and it has served us well across the Cortex-A53, Cortex-A7 and Cortex-A5 family. So far it's scaled nicely from 65nm to 16nm and frequencies approaching 2GHz so there's no reason to think this won't hold true in the future.
