Seems like Android has Windows' number as far as multi-threading is concerned. Kudos to Google for this, and it seems like the tired old argument of developers getting a free pass (for poor MT implementations on desktops) needs to change ASAP!
Ehh, I think you're ignoring some key differences in clock speed and single-threaded performance, not to mention how easily Intel can ramp clock speed up and back down. And then there's Hyper-Threading, which lets each core run two threads.
Laptops might be the outlier, but I dunno what benefit a desktop (which has commonly run quad cores for years) would see from a lower-powered core cluster. Development just works very differently by nature of the environment.
Also things that benefit a ton from parallelization on the desktop often end up using the GPU instead... And/or specialized instructions that aren't available at all on mobile. It's not even apples and oranges IMO, it's apples and watermelons.
You're missing the point, which is that Google & Android have shown (even with the vast number of SoCs Android runs on) that MT & load management, when implemented properly on supported hardware with complementing software, make great use of however many cores are available, even in a highly constrained environment like a smartphone.
On desktops we ought to have had affordable octa-cores available for the masses by now, but since Intel has no real competition & prices their products through the roof, we're seeing how Windows & the x86 platform have stagnated. Granted, more people are moving to small, portable computing devices, but there's no reason the OS & the platform as a whole have to slow down, and the clock speed/IPC argument is getting old. If anything, DX12, Mantle, Vulkan et al. have shown us that if there's good hardware & the willingness to push it to its limits, developers with the right tools at hand will make use of it. Not to mention that giving them a free pass for badly coded programs (remember the "ST performance is king" argument) is the wrong way to go, as it not only wastes the (great) potential of desktops but also slows down the progress of the PC as a platform.
Now I know MT isn't a cakewalk, especially on modern systems, but if anything it should be more widespread, because desktops & notebooks give a lot of thermal headroom compared to tablets & smartphones, and the 30+ years of history behind this particular industry should make the task easier. Also, not all compute tasks can be offloaded to the GPU, which is why it's even more imperative that users push developers to make use of more cores & not take the free ride that GPGPU has been giving them over the last few years. As it is, the GPU industry is also slowing down massively, & then we'll eventually be back to square one & zero growth.
Yes and no. Google and Android are able to show that things like app updates, web page loads and general system upkeep are able to take advantage of multiple threads. But that's been true for a while. In a smartphone, those happen to be the performance-dominating tasks. On a desktop, those tasks are noise.
Desktop workloads that actually stress the CPU (and users care about performing well) are very different. That's not to say they're not threadable, but they may not be as threadable as Chrome, which basically eats RAM and processes.
That being said, heterogeneous MT could make a lot of sense for laptop processors as well. Having threadable workloads run on smaller Atom cores instead of big Skylake cores would probably improve efficiency. But it may not be as dramatic, depending on the perf/W of Skylake at lower frequencies.
OK, can we talk about this for a bit? I for one found the webpage CPU usage extremely disturbing. I'm running an old phone, a Galaxy Nexus, and browsing has become by far the task my phone struggles with the most. Why is that? What is it about modern websites that causes them to be so CPU-heavy? Is that acceptable? It does seem that much of the internet is filled with websites running shady scripts in the background and automatically playing video or sound, which is annoying at the very least and always detrimental to performance. Whatever happened to optimizing websites to minimize data usage and actually making them accessible?
Secondly, what is the actual throughput of CPUs in desktops compared to the latest top-of-the-line ARM APUs? Just because desktop workloads might be different, does that mean a mobile APU cannot handle them, or is that simply due to the usage mode of the device in question? What I'm seeing out of mobile/phone chips is that they are extremely capable, to the point I'm starting to wonder if I'll ever need another desktop rig to replace my old Phenom X2 machine.
I would guess that websites are just more complicated nowadays. Think about a dynamic website like Twitter, which has to have live menus and notifications/updates. That's basically a program more than just a web page. We've slowly migrated what used to be stand-alone programs to load-on-demand web programs. And added many many inefficient layers of script interpreters in between.
Somewhat ironically, the more modern a web page, the *less* friendly it is likely to be to multithreading. After all, modern features tend to include heavy JavaScript usage (which is almost purely single-threaded), and CPU usage that is bottlenecked by a path through JavaScript (typically the layout, not the JS itself, but that layout affects the JS and hence needs fine-grained interaction).
It is the more extensive use of client-side processing; in a nutshell, JavaScript and JSON. On older websites, the dynamic stuff was processed server-side and the client simply did page reloads. Modern sites require less bandwidth, but at the expense of increased CPU usage.
Also, modern sites are higher-res and more image-intensive, or in other words more GPU-heavy as well. Some of the Galaxy Nexus's struggle can be attributed to GPU load.
Most of it has to do with using multiple JavaScript libraries. It's not strange to need to download over 50 different files on a website today. Anandtech.com took 123 requests over four seconds to load. Mostly fonts, ads, and Twitter stuff, but it adds up.
You are totally misinterpreting these results. The mere existence of a large number of runnable threads does not mean that the cores are being usefully used. Knowing that there are frequently four runnable threads is NOT the same thing as knowing that four cores are useful, because it is quite possible that those threads are low priority, and that sliding them so as to run consecutively rather than simultaneously would have no effect on perceived user performance.
There is plenty of evidence to suggest that this interpretation is correct. Within the AnandTech data, the fact that these threads are usually on the LITTLE cores, and running those at low frequency, suggests they are not high priority threads.
Now there is a whole lot of tribalism going on in this thread. I'm not interested in that; I'm interested in the facts. What the MS paper states (and these AnandTech results confirm, IMHO) is that there is a reasonable (around 20%) throughput improvement in going from one to two threads, along with a small (around 10%) energy drop, and that going from two to three or four cores buys you only very slight further energy and performance boosts. In one sense this means there's no harm in having octa-cores around: they don't seem to be burning energy, and in principle they could deliver extra snappiness (though the lousiness of the scheduling in these AnandTech results suggests that's more a hope than a reality). But there's a world of difference between the claim "doesn't hurt energy, may occasionally be slightly useful" and the claim "pretty much always useful because apps are so deeply threaded these days".
"we're seeing what or how windows & the x86 platform has stagnated"
Your argument is highly inaccurate and extremely dated. This isn't the Windows XP era anymore... Windows 10 and 10 Mobile might well be better than Android at the very thing you're giving kudos to Google for (which they've only somewhat managed after YEARS of promises). There's still a huge chunk of overhead in Android's rendering pipeline that needs serious attention. Android has made huge improvements, yes, but there's still lots of work that needs to be done.
@Impulses has a good point too; It's extremely difficult to get a fair apples-to-apples comparison when it comes to optimal handling of workloads for varying thermal limits. CPUs at ~2W TDP behave VERY differently from those at 15W, and both behave yet differently from those running at 37W+. This becomes evident when middle ground ~5W mobile CPUs are in the picture, like Intel's Core M, where devices running those are showing no better battery life than their 15W counterparts running the same OS. (Windows 10 is changing that, however, and is showing extreme battery savings in these lower TDPs, more so than the improvements in higher TDP parts, which tells a lot about W10).
If that isn't clear enough already, read the article again. The author CLEARLY mentions in the first page not to make the mistake of applying the aforementioned metrics to other platforms and operating systems, and to strictly stick with Android and big.LITTLE.
Thank you lilmoe and name99! I read his comment and I was like really? These results don't support his claims and were never intended to compare platforms - as specifically stated by the author.
XP to Win10 took what, a decade & a half? Vista was the last major change after XP (DX10 & UAC), with Win7, then Win8 & now Win10 bringing only incremental updates. Yeah, I call that slow. And we've had quad cores for what, nearly a decade now? Even then, a vast majority of systems (desktops + notebooks) are dual core or 2 cores + HT; surely that must make you cringe! Then we have programs that don't make efficient use of multiple cores &/or the latest instruction sets like AVX; there's just a single web browser that I know of which uses the latter on PC! Call it whatever you may, or twist it however you like, but this is one of the major reasons PC sales are declining, not just the "everyone owns one & so they don't need it" excuse that's thrown around far too often. So far as the "extrapolating this article to my observations" argument is concerned, there's no need to do that, since there's historical precedent & a copious amount of evidence to support pretty much every word of what I've said.
Ugh dude, you have no idea what you are talking about. 4+4 architectures on a phone are a desperate attempt to reduce power usage. I am a programmer, compile times matter to me, and threading helps. Even so, going from 8 threads on my desktop CPU to 12 threads on the -E CPU a year later only reduced a 26-minute total recompile by 2-3 minutes. But that -E part cannot clock as high, so in the regular incremental-compile case it is slower. Do you get this? You are factually wrong for an actual core-dependent use case.
Now I can stick my head in the sand like you and pretend that more cores are automatically better but it just isn't for my workload. You may as well bitch that I should be running on multi thousand dollar server CPUs with 16 cores. Again no. They have their place in a server, but no place in my desktop.
If "Google and Android" have 'nailed' MT then why do $600+ Android phones feel more sluggish, have a choppier UI, and launch programs slower than a 3 year old iPhone 5 or Lumia 800?
Perhaps because the kernel and underlying architecture are so bloated because they need to support so many SoCs. They've resorted to heavy compression just to keep distribution sizes down, which also hits performance.
Android only has one place, on cheap phones. You're an idiot if you buy a $600+ Android phone when you get the same crappy experience on a $50 Kyocera.
I've tried so hard to like Android over the years, but every device I've had completely disappointed me compared to older BlackBerry and modern iPhone devices, where you don't need to find hacked distributions when manufacturers drop the ball supporting the phone or just make a crappy ROM in general. Even Nexus devices aren't immune to this, and historically they haven't been very good phones; although, admittedly, the only high-end Android phone worth buying is a Nexus, but now they cost so much it isn't justifiable.
Basically I recommend two phones to people. If they want a cheap phone, get a OnePlus One or some other sub-$300 Android device. If your budget is higher, get an iPhone, or if you are adventurous, a WinMo device. At least the iPhone will receive support for 4-5 years and holds its value during that time.
I'm calling BS on most of your claims. Your experience with a Moto E (not saying it's a bad phone) will be vastly different from that of a Note 5, and those differences can start as obviously as how often you need to refresh your Chrome pages as you run out of RAM. What "$600+" Android phone are you talking about that feels "more sluggish and slower" than a 3-year-old iPhone? If you want people to take your claim seriously, then at least provide some examples rather than this generic BS that anyone can easily come up with.

The way Android is designed makes it kind of difficult to bring updates, as surprising as you may find that. Every time the OS updates, there are changes to the HAL (hardware abstraction layer), and those changes can be minor or significant. It is then up to the SoC provider to supply the proper drivers needed after the HAL change, and they certainly won't provide them for free. At the same time, OEMs also have to decide how much the new update will impede performance. For example, my first-gen Moto X got an update to 5.1.1 a few months ago, and despite the new features, there are still performance hits in places. Older devices will probably do better on Jelly Bean and KitKat anyway, since Google Play services can be updated independently of the OS version. Here's some useful info on why Android is as fragmented as it is: http://www.xda-developers.com/opinion-android-is-i...

The biggest reason Apple updated all those 4S units isn't because they loved their users, but rather to purposely slow down their devices to force an upgrade. Just ask the 4S users around you what iOS 8 really meant for them. I do agree, however, that people should try more $300-400 devices that are near flagship level with compromises that are more tolerable, and this $600+ smartphone pricing should really tone itself down a bit.
Yeah, I have to call bullshit on his claims too. I know it's anecdotal, but my buddies and I have had literally dozens of Android phones over the years, as well as various iPhones, and none of us have seen any kind of performance difference between the two. I'm thinking he just had a shit experience with one Android phone and, like most people, wrote the platform off at that point.
I had a bad experience with an HTC Rezound, but every phone I've had before or since has been fantastic. I absolutely LOVE my LG G3; it's extremely responsive and fast, and I've never had issues with slowdowns on it. That being said, I don't do any "gaming" (and I put gaming in quotes for a reason) on the phone, so I can't speak to that. But as far as browser, YouTube, other apps, etc., it couldn't be more perfect.
If you say so. As an IT director, you should know that 99% of the time there's a problem, it's user-related and not hardware-related. One thing I will give Apple is that they lock their products down so hard that it's much harder for the user to F them up. Whereas on more open platforms like Android or Windows, the user has much more control and thus much more ability to F things up royally.
Whether that's a plus or a minus really just depends on what you're looking for. For people who want or need control over their hardware, it's a plus; for people who just want something that "works", so to speak, it's a minus.
They are both clearly Android fans who haven't ever given anything else a chance. The fact that they ignore that Apple has consistently had superior single-threaded performance in their SoCs for years, and that this has translated to better UX, just goes to show that Android targeting multithreaded performance is a solution looking for a problem. There are so many underlying issues to address first, specifically making efficient use of the Linux scheduler, and perhaps setting a compatibility list for hardware instead of saying "just make anything and we'll find a way to run on it", no matter how crappy it runs.
Apple has not consistently had better performance per core. That's fairly recent (since Cyclone, IIRC). There are myriad issues at play. In the end, the market is best served by an open option, like Android, with customers choosing what works best for them and letting the rest fade away.
"Apple had not consistently had better performance per core. That's fairly recent (since cyclone, iirc). " Since Swift. That's iPhone 5, 5S, 6 (2012, 2013, 2014) and likely to be 6S and 2015 at least. Even the late-stage pre-Apple cores were substantially above average (in part because of Apple's custom SoC). The 4S was above the competition at the time: http://www.anandtech.com/show/4971/apple-iphone-4s...
Most people would consider "consistent enough" for "long enough" to make the statement reasonable.
And it's not like Apple doesn't resort to moar cores. When they run into walls, they also have no choice but to take whatever routes are available. Listening to some of the zealous Apple fans, one would think iPhones have been rocking a single core all these years.
They have moved to dual cores on the phones, and 3 cores on tablets. Moar cores on iDevices are only a matter of time. Those specialized ASICs with the fancy names Apple gives them ("Motion Processor" for one) are also a concession by Apple that there are cases where big cores are not always the best route to take when efficiency matters.
"They are both clearly android fans and haven't ever given anything else a chance." uhh my first smart device ever is a 2nd gen iPod touch...
So just because I proved you wrong, I have to be an Android fanboy? You said you tried all these Android phones "every week" and had "shit experiences". Again, you didn't bring up any names. What phones have you even tried? Who's the fanboy here, the one who can only make claims without backing them up with facts?
I don't understand why you are arguing about superior ST performance when it's irrelevant to this article. What this article simply shows is that Android does make use of extra threads, and that you get a benefit in power efficiency from running MT threads; it says nothing about peak performance. In fact, in most scenarios shown in the test, the little cores aren't even saturated, which means the workload isn't heavy at all.
"Apple has consistently had superior single threaded performance in their SOC's years and this has translated to better UX" any evidence that leads to this conclusion? also like tuxRoller said Apple only have IPC advantages in recent years with Cyclone series.
"There are so many underlying issues to address first, specifically making efficient use the Linux scheduler and perhaps setting a compatibility list for hardware instead of saying just make anything and we'll find a way to run on it no matter how crappy it runs." Where did you get the concept of make anything and find a way to work? All OEMs and SoC manufacturers optimize for Android just like how they optimize for Windows in desktop. Like I said before, SoC manufacturers have to provide driver update every time there's a HAL change in Android. How well they can do to optimize is up to themselves but the fact is that they do have to make their hardware compatible for Android
Did I suddenly log onto the PC Gamer forums? The instant someone expresses any level of dismay or concern about an Apple product, or says they've had good experiences with Android phones, it automatically means they're a nutswinging fanboy?
You can argue whether Apple did it intentionally or not, but the end result is that 4S users are getting a more sluggish experience after updating to iOS 8.
Linux isn't great about niceness. There are a few ways to fix this. One is to use cgroups (which Android uses). This works pretty well, but it's still ultimately subject to the scheduler. The other way is to run the RT kernel. That obeys priorities nicely (heh), but it would be a bear to wrestle into Android, and you'd lose some power efficiency. Also, the rendering framework of Android may have some issues.
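To illustrate the "priorities are only hints" point: here's a minimal plain-JVM sketch (not Android-specific, and not using cgroups) showing that Java thread priorities, which map onto the scheduler's niceness, are merely advisory. The class name and the trivial work inside the threads are made up for illustration.

```java
// Plain-JVM sketch: thread priorities are hints to the OS scheduler,
// which is free to ignore them -- exactly the "niceness" complaint above.
public class PrioritySketch {
    public static void main(String[] args) throws InterruptedException {
        Thread background = new Thread(() -> {
            // low-priority housekeeping work would go here
        });
        background.setPriority(Thread.MIN_PRIORITY); // maps to a high nice value

        Thread ui = new Thread(() -> {
            // latency-sensitive work would go here
        });
        ui.setPriority(Thread.MAX_PRIORITY);

        background.start();
        ui.start();
        background.join();
        ui.join();

        // Java exposes priorities 1..10; the kernel may still interleave
        // both threads however it likes.
        System.out.println("background=" + background.getPriority()
                + " ui=" + ui.getPriority());
    }
}
```

Running it prints `background=1 ui=10`, but nothing guarantees the high-priority thread actually got more CPU time; with cgroups you get firmer guarantees, and with the RT kernel firmer still.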
Very interesting article, much more favourable to multi-core designs than I would have thought.
Each article page must have cost an insane amount of time. However, I still feel some more information could be useful. This article is geared towards real-world use cases, but I think it would be interesting to repeat the analysis on a few commonly used benchmarking apps; comparing them to the real-world uses may help in understanding the results.
Every single synthetic I have ever seen vastly exaggerates the benefit. I would be interested in an actual real world use case that actually matches a synthetic. It would blow my mind if there are any.
I'm not sure of the practicality, but I would love to see a follow-up with Denver k1 and the A8X to see how lower core count out of order and in order SoCs are handled.
Heck yes. And of course I'm interested if anything like this is even remotely possible for Apple hardware, though likely it would require jailbreaks, at least.
Add one more vote for the follow-up with synthetics. I would also want to see how the multitasking compares on Snapdragons, as they use different frequency and voltage planes per core instead of big.LITTLE's per-cluster planes. But I guess that would be better to see with the SD 820, as the 810 uses big.LITTLE. Consider it a request for when it comes!
big.LITTLE can use different planes for each cluster, but the same plane for all cores within a cluster; Qualcomm SoCs can use a different plane for each core. That's the difference, and it's a big one. https://www.qualcomm.com/news/onq/2013/10/25/power... I'm not sure that can be done in big.LITTLE.
To balance everything out: meh, that doesn't interest me. Most of the time I'm concerned with battery life and everyday performance. Android isn't a huge gaming platform, so absolute performance doesn't interest me.
Very interesting article. Seems like the mantra of "more cores on mobile are just marketing" was wrong in terms of Android; it seems to dip into both the four-core big and little clusters pretty well. That puts the single-thread performance having lagged behind the Apple A series (up until the S6 at least) in a new light, since Android can in fact use the full multicore performance.
In gaming there is a big advantage. By using mostly the small cores, you leave more TDP for the GPU. Another relevant thing will be console ports in the next couple of years, when mobile GPUs catch up with consoles. The current consoles have 8 small cores, and that fits just right with the many small cores in Android.
For one, it was my mantra. I liked having 4 cores because 2 weren't enough, but according to my hotplugging experiments, I only really need 3 for an optimal experience most of the time.
In fact you are in the right place to ask that question, as one of the prophets of the mantra was Anand Lal Shimpi himself: http://www.anandtech.com/show/7335/the-iphone-5s-r... Quoting from the article: "two faster cores are still better for most uses than four cores running at lower frequencies". You can read the rest if you are interested, but that's how much of the mantra started.
I won't hold that against Anand; he was lobbying for a job at Apple ;)
But seriously, it was 2 years ago. At that time, "two faster cores are still better for most uses than four cores running at lower frequencies" may well have been the case. Also, no matter how you slice it, an 8-core big.LITTLE is not a true 8-core CPU. It's really still 4 cores.
/edit: I do remember a lot of people crying "you don't need 8 cores", but again, that was people misunderstanding ARM's big.LITTLE architecture, made worse by marketing calling it "8 cores" in the first place.
I agree with you, and he may not have been THAT wrong at the time. But with the current implementations of power gating and turbo, most of what he said has been rendered false. AFAIK, big.LITTLE can be a true 8-core; it actually depends on the implementation.
"Also, no matter how you slice it, an 8 core big.little is not a true 8 core CPU. It's really still 4 cores."
An 8-core big.LITTLE chip running in HMP mode (like the Exynos 5422 onward) is in fact a "true" 8-core chip in which all 8 cores can be running at the same time. You're thinking of the core-migration and cluster-migration setups, in which only 4 cores (or a combination of 4) can be running simultaneously.
If the option is really four weak cores or two powerful cores, I think the two powerful ones would make a better system. If we could have two powerful cores AND four weak cores, that would be even better.
Just everyone who's easily influenced, really. I heard it from pretty much everyone. Someone I was talking to apparently "knew someone who designed a Galaxy phone." He claimed they wanted to design it with two cores, or something, but the marketers wanted eight.
You cannot make comparisons between different SoCs even if they have the same CPU IP and the same manufacturing process. The S808 is different from the S810 which are again different from Nvidia's X1 even if all 3 have A57 cores on TSMC 20nm.
The S808 and S810 should be fairly similar though. That's not to say you can say that the only difference is the CPU configuration but a similar study on what the behavior is like on a different SoC with fewer cores would be helpful.
Threading isn't 100% free and neither is thread migration. It might be good to take a look at just what the S810 is doing over time compared to the S808 in terms of CPU activity.
As an ex-Android developer, I remember that the SDK not only encourages but sometimes outright enforces extensive usage of threads. For example, around API level 14/15, making a network request on the main thread started throwing an exception, which may seem obvious to experienced developers but wasn't enforced in earlier versions. This is a simple example, but having the API itself push towards multi-threaded coding has a positive effect on the way Android developers build their apps. I'm not sure, then, why Google's own browser would be surprising for its high thread counts; even a very basic app is likely to spawn many more than 4 threads nowadays.
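The pattern the SDK enforces can be sketched in plain Java (no Android framework, so it stays runnable here): blocking work goes to a worker thread, never the main thread. On Android you'd post the result back via a `Handler`; this sketch just blocks on a `Future`, and `fakeNetworkRequest` is a made-up stand-in for a real HTTP call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java sketch of the pattern Android enforces: blocking work runs
// on a worker thread. Doing real network I/O on Android's main thread
// throws NetworkOnMainThreadException (strict since roughly API 14/15).
public class OffMainThread {
    static String fakeNetworkRequest() {
        // stand-in for an HTTP request (no real I/O, to keep this runnable)
        return "response-body";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        // submit the blocking work to the worker thread
        Future<String> result = worker.submit(OffMainThread::fakeNetworkRequest);
        // the "main" thread stays free; here we simply wait for the result
        System.out.println("got: " + result.get());
        worker.shutdown();
    }
}
```

Multiply this by image decoding, disk I/O, and parsing, and it's easy to see how even a basic app ends up with well over 4 threads.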
Just wanted to say that it's a great article. Well done and very interesting: the use of 4+4 cores on a mobile platform, while on PC we still have plenty of 2-core CPUs, seemed quite ridiculous. But no, clearly, it makes sense.
Very interesting article. These tests were done on Android 5, I take it. I know this analysis is geared toward current hardware, but most of the "4 cores are only marketing" discussion was quite a while back, when most devices ran some version of Android 4. I wonder if the benefits of more cores showed up back then. The second thing I'm interested in is how much of this is applicable to other SoCs. Not much, I gather. And related to that: how much of this is limited to Samsung devices, since they made both the CPU and the firmware/software layer of the tested device?
It is kind of a misleading analysis. A single Haswell core could juggle all of these processes and still have plenty of time to sleep. So you're not really telling us anything here. Is a wider, fatter core better than all these narrow, underpowered cores? Given the performance and power consumption of the Apple SoCs, I would still have to say yes.
This! When developing for iOS I usually have to spawn several threads (queues, in Apple's world) for things that would otherwise block the main queue and cause the UI to "freeze", and the dual-core SoCs inside the devices I'm targeting munch through my threads absolutely fine. Just by showing that the several extra cores found in Android phones aren't sleeping, you're not coming to any definitive conclusion about a clear advantage of having them.
The thing is that when you have 4 threads, 4 cores can potentially do the job more efficiently, with performance equal to a single core with 4 times the execution speed.
Potentially, but not necessarily. Threading and thread migration aren't free. It depends on how much performance you really need. The A57(R3), for instance, at very low frequencies is actually slightly more power efficient than the A53 at its peak frequency (surprising, I know).
If you have 4 threads that need absolutely-bare-minimum performance that a min-frequency single-core could handle, waking up 4 cores (even if they're smaller) and loading the code/data into the caches of each of those cores isn't necessarily a clear win. Especially if they share the same code.
"The A57(R3), for instance, at very low frequencies is actually slightly more power efficient than the A53 at its peak frequency (surprising, I know)."
Cool story. Except that, in most of the smaller multithreaded workload cases, the little cores usually aren't near their saturation levels. Also, in most cases, when they _do_ get saturated, the workload is transferred to and dealt with by a big core or two in short bursts.
Even if it isn't a "clear win", in *some* workloads mind you, saying that there isn't any apparent merit in these configurations is really irresponsible.
I don't think I said there's no merit to such configurations. I simply said parallelizing a workload isn't always a clear win over using a single core. It depends on the required performance level and the efficiency curve of the small core and big core.
If 4 threads running on 4 small cores at 50% FMax can be done by one big core at FMin without wasting any cycles, the advantage actually goes to the big core configuration. The small core configuration works if there's a thread that requires so little performance, it'd be wasteful to run it on the big core even at FMin.
The conclusion of which is best for the given workload isn't as clear cut as saying "look, the small cores are being used by a lot of threads!". But rather, by measuring power and perf using the two configurations.
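The energy trade-off described here reduces to E = P × t for each configuration. Here's a back-of-the-envelope sketch; all power figures are hypothetical, chosen only to show that "more, smaller cores" is not automatically the winner.

```java
// Back-of-the-envelope energy comparison for the scenario above:
// 4 threads on 4 little cores at 50% Fmax vs. the same work serialized
// on 1 big core at Fmin. Power figures are HYPOTHETICAL placeholders.
public class EnergySketch {
    public static void main(String[] args) {
        double seconds = 1.0;        // assume both configs finish in the same time
        double smallCoreMw = 60.0;   // hypothetical: one little core at 50% Fmax
        double bigCoreMw = 200.0;    // hypothetical: one big core at Fmin

        double fourSmall = 4 * smallCoreMw * seconds; // energy in mJ (mW * s)
        double oneBig = 1 * bigCoreMw * seconds;

        System.out.println("4x small: " + fourSmall + " mJ, 1x big: " + oneBig + " mJ");
        // With these numbers the single big core wins (200 < 240 mJ);
        // flip the figures and the little cluster wins. The answer hangs
        // entirely on the measured perf/W curves, not on thread counts.
    }
}
```

The point of the sketch is exactly the comment's conclusion: you can't read the winner off a run-queue plot; you have to measure power and performance for both configurations.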
"If 4 threads running on 4 small cores at 50% FMax can be done by one big core at FMin without wasting any cycles, the advantage actually goes to the big core configuration."
That's hardly a real-world or even valid comparison. Things aren't measured that way.
On the chip level, it all boils down to a direct comparison, which by itself isn't telling much, because the core configuration of two different chips usually isn't the only difference; other metrics start to kick in. Those arguing for dual wide cores are thinking of iOS, which by itself invalidates the comparison. We're talking Android here.
On the software side, real life scenarios aren't easy to quantify.
This article simply states the following:
- Android currently has relatively good parallelism capabilities for common workloads.
- Therefore, there is merit in 8 small core and 4+4 big.LITTLE configurations from an efficiency perspective, the latter being beneficial when performance comparable to custom core designs is needed.
Most users are either browsing, texting, or on social media. Most of the games played, BY FAR, are the less demanding ones that usually don't trigger the big cores.
I've said this before in reply to someone else. When QC and Samsung release their custom quad-core designs, which do you honestly believe would be more power efficient: those chips as-is, or the same chips with little cores added in big.LITTLE (provided they can be properly configured that way)?
You guys are deliberately stretching the scope to which the findings of this article apply.
Just stop.
It has been clearly stated that this only applies to how Android (and Android Apps) manage to benefit from more cores in terms of efficiency. It was clearly stated that this doesn't apply to other operating systems "iOS in particular".
Especially since any app designed for performance will launch as many threads as needed to use available cores. So looking if "there are more than 4 threads active on 4+4 core CPU" can be misleading. If you run those tests on 2 core CPU, would number of threads remain same or be reduced? How about 10 core CPU?
In other words, only comparing performance and power usage (and not number of threads) would tell us if 4+4 is better than 4 or than 2 cores. Problem with that is finding different CPUs on same technology platform (to ensure only number of cores is different, and not 20nm vs 28nm vs different process etc).
Barring that, comparison of power performance per cost among 4+4 vs 4 vs 2 is also better indicator than comparing number of threads.
TL;DR: it is 'easy' to have more threads than CPU cores, but that indicates neither performance nor power usage.
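As a concrete sketch of that point (Python purely for illustration), a very common pattern is to size a worker pool to the number of visible cores, so the number of active threads tracks the hardware rather than the workload:

```python
# A common pattern: size the worker pool to the number of visible cores.
# On a 2-core CPU this spawns 2 workers, on a 10-core CPU it spawns 10 --
# so counting active threads mostly reflects the hardware, not the workload.
import os
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    """Stand-in for a chunk of real work."""
    return sum(i * i for i in range(n))

cores = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=cores) as pool:
    results = list(pool.map(busy, [10_000] * cores))

print(f"{cores} workers, {len(results)} results")
```

Run the same program on a 2-core and a 10-core machine and the run-queue depth will differ even though the work is identical, which is why raw thread counts alone can mislead.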
The question being answered was this, "On the following pages we’ll have a look at about 20 different real-world often encountered use-cases where we monitor CPU frequency, power states and scheduler run-queues. What we are looking for specifically is the run-queue depth spikes for each scenario to see just how many threads are spawned during the various scenarios."
It was simply assessing if multiple cores were actually used in the little-big design. Not a comparison of different designs.
Yes, that was what the article was trying to find out, but it didn't answer the question of whether 4-core and 8-core designs are better than 2 core or 3 core designs. That's been the contention of the "can't use that many cores" mantra.
All this article has explained is that the OS scheduler can distribute threads across a lot of cores, something hardly anyone has a problem with.
What I'd like to see is the performance or user experience difference between 2-core, 4-core, and 8-core designs, all using the same SoC. There's nothing magic about this. In PCs today, we've largely settled on 2-core and 4-core designs for consumer systems. 6-core and 8-core systems for gaming rigs, but that's largely an artifact of Intel's SKUs.
So, if I believe the marketing, these smartphones really need 8-core designs when my laptop or desktop, capable of handling an order of magnitude more computation, gets by with just 2 or 4 cores?
You're missing the point. Your desktop is faster because it uses much more power. Mobile phones have more cores because it's more efficient to use more lower power cores than fewer high power cores.
My desktop and laptop are power limited at their respective TDPs, and it's been this way for a very long time. If more cores were the answer, why are we sitting at mostly 2-core and 4-core CPUs in the PC space?
All this stuff isn't new whatsoever; the PC space went through the same core-count race 10 years ago. There has to be something systematically different for Intel to have abandoned that path while ARM smartphones are in the midst of a core-count race.
I've read that it could be an economics thing as ARMH gets money on a per-core basis while spending money on complicated DVFS schemas and high IPC cores isn't worth it for them. Maybe in the PC space, we don't need the performance anymore.
It's because we're always fighting for quicker single-thread work. A lot of things can be parallelized, but there are also a lot of things today that aren't. I agree that Intel should try out some kind of big.LITTLE thing with a couple Atom cores and a Core M, just to see how it runs.
It's Intel's backward looking strategy. They're competing in high power/high single thread performance because they can win that with legacy desktop software and a legacy CPU architecture.
Meanwhile, the rest of the world is going low power multithreaded, because that's the future. Going forward it's the only way to increase performance with low power. Google are correctly pushing an aggressively multithreaded software architecture.
Intel have already hit the wall with single threaded performance. They can win the present but not the future. Desktops aren't moving forward because no-one cares about them except gamers, and gamers largely don't care about power usage because they don't run on batteries.
On the desktop, the cost of making a chip matters much more, as the bigger chips are sometimes more expensive than a whole mobile device. The cost of putting more cores on the chip counts there.
There are several faces to this problem. The inherent issue with multi core designs is that it is not trivial to develop your application so it uses several cores efficiently. The potential gain of multi core designs is that they can do the same job for less power than a single core design. The article answers the question "can today's typical software environment use several cores efficiently?" with a pretty objective yes. It does not necessarily state that this is superior to fewer, more simple cores; it even states that it is not relevant for other environments.
Your computer can handle an order of magnitude more computation needs, but don't forget it is using two orders of magnitude more power. The switch to dual core processors in the first place (in desktops) was motivated by the fact the industry hit a wall where they could not raise frequency anymore (heat and power consumption being an issue), where two, more efficient cores could increase performance significantly while still using less power. Of course, the sweet spot does vary depending on the targeted power as well as the environment you're working in.
Do the android phones hit this sweet spot? maybe. But, at least, they are capable of hitting this sweet spot, for that power target and this given environment. That's what this article says.
"Your computer can handle an order of magnitude more computation needs, but don't forget it is using two orders of magnitude more power."
This is simply not true. Apple (and others) are shipping laptops running Broadwell at, what, 4.5W? IF there were massive value in adding smaller cores to the Broadwell package (eg it could drop the average power to 2.5W) wouldn't Intel do that? They could, eg, add 2 or 4 Silvermont cores and have their own big.LITTLE system. They could even automatically switch threads in the HW if they don't want the OS to be involved, the way they handle DVFS automatically on their newest cores.
What I see when I read through these comments is a collection of people not very familiar with OS scheduling who are happy to interpret "OS can schedule multiple threads" as "app requires multiple cores to function well", and a much smaller collection of professionals who understand that the two have little relationship to each other for very short duration threads. There's also a whole lot of claims being made here about power savings on the basis of absolutely fsckall evidence --- Andrei shows absolutely no graphs of the power being used during these runs, and it is HYPOTHESIS, not fact, that running say four lightweight threads on four A53 cores would use less energy than aggregating those four threads on a single A57. Maybe it's true, maybe it isn't --- I don't see any reason to simply assert that it's true.
Well, when Intel ships Broadwell processors at 4.5W, they do consume only a little bit less than an order of magnitude more than your average Cortex-A53 cluster. Using big.LITTLE configurations requires a lot of precautions at the very beginning of the design of both cores. You don't just take lower-end cores and add them to the SoC. Both the higher end and the lower end cores must be designed to be big.LITTLE compatible.
And, however impressive those processors from Intel are, keep in mind that if they didn't put some lower power SKUs out, it's probably because they can't. To get to lower power figures, they are still forced to resort to Atom processors. And Atom-branded processors today are... overwhelmingly 4-core models.
Once again, I'm not pretending this article is the final proof that 8 core designs are a must. But it shows that, at least, typical use cases are able to use all cores. Not that this is efficient. But there is potential for efficiency.
You need to be a little more careful with slinging around the term "order of magnitude". A 4 core A53 cluster running FP on all CPUs (Exynos 5433, 1300MHz) uses about 865mW. That's a factor of 5 from Broadwell's 4.5W, not a factor of ten.
I'm no fan of much of Intel's work and behavior, but I don't think we are well served when we ignore details, when the details hold most of the interesting facts.
All you're saying is that currently software needs better single thread performance. Duh!
What everyone else is saying is that you can't get increased performance, nor better power usage, going forward with a single thread performance strategy. Physics has spoken!
It's nothing to do with OS scheduling and everything to do with software architecture. Everything is moving towards increasing parallelism and that will continue.
That's why mobile phones now have 8 cores and will have more because they are not weighed down by legacy architectures.
It is worse than that on the development side. Yes, it is non-trivial to develop an app that uses multiple cores efficiently, but it is actually impossible to develop an app that uses multiple cores efficiently on all platforms. Maintaining many different versions optimized for particular platforms is just not plausible when there are so many different platforms.
"I should start with a disclaimer that because the tools required for such an analysis rely heavily on the Linux kernel, that this analysis is constrained to the behaviour of Android devices and doesn't necessarily represent the behaviour of devices on other operating systems, in particular Apple's iOS. As such, any comparisons between such SoCs should be limited purely to theoretical scenarios where a given CPU configuration would be running Android."
The world does not revolve around Apple. This article has nothing whatsoever to do with Apple products. Furthermore, the article neither claims nor implies that wider fat cores are better or worse than big.LITTLE.
I have yet to read the full article (just looked at the more relevant graphs), but I do wish you had tested some heavier web pages (since AT and BBC are not that heavy), some SoCs with only small cores, and looked at power too. Would be very curious about power for a quad A53 vs an octa A53 at the same clocks. Testing GPGPU on the midrange SoCs that actually do that would be interesting too. Really hope next year we get 2xA72 (plus some A53s) in $20 SoCs for $150-200 phones with very nice CPU perf. Anyway, will read the full article as soon as I find the free time.
Love the article, but after reading it, I feel like the articles you write comparing phone CPU performance & battery life are far more applicable. You lose access to so much of the information in this article that at the end of the day testing the actual phone & OS usage of the CPU makes more sense.
What I'm sincerely missing in this article is the differentiation between multi-processing and multi-threading. The difference is that multi-processing partitions the workload across multiple processes, whereas multi-threading spawns threads which are then either run in the OS directly or mapped to processes in different ways depending on the OS; in Linux they're actually mapped onto processes. Threads share context with their creator, so shared information requires locking, which wastes performance and increases waiting times. The solution in the threading-happy world is to throw more threads at a problem in the hope that locking contention doesn't go through the roof and there's always enough work to keep the cores busy.
So the optimum way to utilise resources to the maximum is actually not MT but MP for the heavy lifting, making sure the heavy work is split evenly across the available number of to-be-utilised cores.
For me it would actually be interesting to know whether some apps are actually clever enough to do MP for the real work or are just stupidly creating threads (and also how many).
Since someone mentioned iOS: if you're using queues, this is actually not a traditional threading model but more akin to an MP model, where different queues handled by workers (IMNSHO confusingly called threads) are used to dispatch work, usually in a lock-free manner. Those workers (although they can be managed manually) are managed by the system and adjust automatically to the available resources to always deliver the best possible performance.
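The lock-free, even split across cores suggested above can be sketched quickly. Only the partitioning logic is shown here (in Python purely for illustration); in the MP model each chunk would then be handed to a separate process, so workers share no state and never contend on a lock:

```python
# The MP-style split described above: divide the heavy work into N even,
# independent chunks up front, one per core, so workers never share state
# and never need a lock. Each chunk would then go to its own process.

def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-even chunks."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder evenly
        chunks.append(range(start, start + size))
        start += size
    return chunks

chunks = partition(10, 4)
print([list(c) for c in chunks])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```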
Don't forget, most of it is in Java, so it's probably one java process with several threads, not multiple java processes. The native apps, could go either way.
One interesting question here is: What does Google do? Chrome on regular desktop OS uses one process per view to properly isolate views from one another; does anybody know whether Chrome on Android does the same? I couldn't figure it out from the available documentation...
Next time can the colour legend below the graphs have their little squares enlarged to the height of the text? For those who are colour-challenged, it would make it a lot easier to match even when the image is blown-up. There doesn't seem to be a reason to have them so small.
I would rather make the colors more intuitive. For instance by using a colormap like the jet colormap from Octave/Matlab. Low clock frequencies should be mapped to cool colors (blue to green), while high clock frequencies should be mapped to warm colors (yellow to red). By doing that you just have to look at the legend only once. After that, the colors speak for themselves.
The plots are really hard to read now, when you have green at both low and high frequency (700 and 1400) and four shades of blue evenly distributed over the frequency range (500, 900, 1100, 1500). When reading such a plot, I don't care whether the frequency is 600 or 700, so those two colors don't have to be very different. But 500 and 1500 should be vastly different. The plots in this article are made the opposite way: all the small steps have big color differences in order to distinguish every small step, but at some point the map ran out of major colors and started repeating the spectrum with only slightly different colors.
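A minimal version of the suggested cool-to-warm mapping (the frequency endpoints below are arbitrary examples, not the article's actual clock range): linearly interpolate from blue at the lowest clock to red at the highest, so the color alone tells you roughly where in the range a sample sits.

```python
# Cool-to-warm colormap sketch: blue at the lowest frequency, red at the
# highest, linear interpolation in between. Endpoint frequencies are
# arbitrary example values.

def freq_to_rgb(freq_mhz, f_min=400, f_max=1500):
    """Map a clock frequency to an (r, g, b) tuple: blue at f_min, red at f_max."""
    t = (freq_mhz - f_min) / (f_max - f_min)
    t = min(1.0, max(0.0, t))  # clamp frequencies outside the range
    return (int(255 * t), 0, int(255 * (1 - t)))

print(freq_to_rgb(400))   # pure blue: (0, 0, 255)
print(freq_to_rgb(1500))  # pure red:  (255, 0, 0)
```

With a monotonic map like this, nearby frequencies get similar colors and distant ones get very different colors, which is the opposite of the article's current legend.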
Andrei, after so much work on your part it seems uncouth to complain! But this is the internet, so here goes...
If you ever have the energy to revise this topic, allow me to suggest two changes to substantially improve the value of the results:
With respect to how results are displayed:
- Might I suggest you change the stacking order of the Power State Distribution graphs so that we see Power Gated (ie the most power saving state) at the bottom, Clock Gated (slightly less power saving) in the middle, and Active on top.
- The frequency distribution graphs make it really difficult to distinguish certain color pairs, and to see the big picture. Might I suggest that a plot using just grey scale (eg black at the lowest frequency to white at the highest) would actually be easier to parse and would show the general structural pattern?
As a larger point, while this data is interesting in many ways, it doesn't (IMHO) answer the real question of interest. Knowing that there are frequently four runnable threads is NOT the same thing as knowing that four cores are useful, because it is quite possible that those threads are low priority, and that sliding them so as to run consecutively rather than simultaneously would have no effect on perceived user performance.
The only way, I think, that one can REALLY answer this particular question ("are four cores valuable, and if so how") is an elimination study. (Alternatives like trying to figure out the average run duration of short term threads is really tough, especially given the granularity at which data is reported).
So the question is: does Android provide facilities for knocking out certain cores so that the scheduler just ignores them? If so, I can suggest a few very interesting experiments one might run to see the effects of certain knockout patterns. In each case, ideally, one would want to learn:
- "throughput" style performance (how fast the system scores on various benchmarks)
- effect on battery usage
- "snappiness" (which is difficult to measure objectively, but maybe is obvious enough for subjective results to be noticed)
So, for example, what if we knock out all the .LITTLE cores? How much faster does the system seem to run, with what effect on battery? Likewise if we knockout all the big cores? What if we have just two big cores (vaguely equivalent to an iPhone 6)? What if we have two big and two LITTLE cores?
I don't have any axe to grind here --- I've no idea what these experiments would show. But it would certainly be interesting to know, for example, whether a system consisting of only 4 big cores feels noticeably snappier than a big.LITTLE system while battery life is 95% as long. That might be a tradeoff many people are willing to make. Or maybe it goes the other way --- a system with only one big core and 2 little cores feels just as fast as an octocore system, but the battery lasts 50% longer.
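For what it's worth, Linux does expose the knockout facilities such experiments would need: cores can be taken offline system-wide via /sys/devices/system/cpu/cpuN/online (root required), and a single benchmark process can be confined to a subset of cores via CPU affinity. A sketch of the per-process variant, using the Linux-only `os.sched_setaffinity` API (the two-core subset is an arbitrary example):

```python
# Per-process core knockout via CPU affinity. os.sched_setaffinity exists
# only on Linux, so guard for it; taking cores offline system-wide would
# instead write 0 to /sys/devices/system/cpu/cpuN/online as root.
import os

if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)            # cores we may run on now
    subset = set(list(allowed)[:2])              # e.g. keep only two cores
    os.sched_setaffinity(0, subset)              # benchmark would run here
    os.sched_setaffinity(0, allowed)             # restore the original mask
    print(f"restricted to {len(subset)} of {len(allowed)} cores")
else:
    print("CPU affinity API not available on this OS")
```

Repeating a benchmark under different masks (big-only, LITTLE-only, 2+2, and so on) would give exactly the elimination study proposed above, at least for throughput and snappiness; battery impact would still need external power measurement.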
This was a seriously fascinating read. It points to a few things...
First, Android has some serious ability to take advantage of multiple cores or ILP has improved dramatically. I remember when the Moto X (1st Gen) came out with a dual core CPU engineers at Moto said that even opening many websites didn't use more than two cores on most phones. [http://www.cnet.com/news/top-motorola-engineer-def...] Does this mean that Android has stepped up its game dramatically or was that information not true to begin with?
Second, It seems like there are two related components to the question that I have about multi-core performance. First, do extra cores get used? (You show that they do. Question answered.) Secondly, do extra cores matter from a performance perspective (if clock speed is compromised or otherwise)? (This is probably harder to answer because cores and clock are confounded - better CPU -> more cores, faster clock and complicated by the heterogeneous nature of these CPUs core setups.)
I suppose the second question could be (mostly) answered by taking a homogeneous core CPU, disabling cores sequentially, and looking at the changes in user experienced performance and power consumption. I'm sure some people will buy something with the maximum number of cores, but I'm just curious about whether it'll make a difference in real-world situations.
A question that DOESN'T get answered, however, is: does the fact that all cores get used contribute to a better/faster user experience?
If there was only 2 or 4 cores present, would they complete the tasks just as fast?
In other words, is there a gain from all 8 cores being used, or do all 8 cores get used just because they are there? (By low priority threads, which on a quad/dual core CPU would have been run sequentially, in just as fast a time?)
Since Apple's dual core iPhones always outperform Android quad and octa core phones, I would think that the latter is closer to the truth.
Read up on what some of the other posters here have written about low priority threads, and Microsoft's research on the matter.
And ignore anyone who tries to over-interpret this article!
> Does the fact that all cores get used, contribute to a better/faster user experience?

It does not, as long as your CPU can process all the threads in a timely manner. It contributes to lower power usage though, as power grows with the square of voltage, and voltage grows with frequency, while parallelization scales linearly. Basically, if 2 A53 @ 800MHz can do the same amount of work as 1 A53 @ 1.6GHz, the 2 slower cores will do it for less power (refer to the perf/W curve on the conclusion page).
This was the goal of ARM when they designed big.LITTLE, and this article shows that the S6 uses it correctly (by using the small cores predominantly and keeping frequencies low). It is one more trick to deliver strong immediate computation, good perf/W at moderate usage, and very low power draw while idling. I would not extrapolate beyond that, as too many variables are in play (kernel/governor/HW/apps...).
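The arithmetic behind that perf/W claim, with voltages invented purely for illustration (not measured A53 values): dynamic power goes roughly as P = C·f·V², and the higher clock also demands a higher voltage, so two slow cores doing the same total work come out ahead.

```python
# Why two slow cores can beat one fast core on power: dynamic power goes as
# P = C * f * V^2, and the higher clock also requires a higher voltage.
# The voltages below are invented for illustration, not measured A53 values.

def power_watts(f_hz, v, c=1.0e-9):
    """Simplified dynamic power model: P = C * f * V^2."""
    return c * f_hz * v * v

one_fast = power_watts(1.6e9, v=1.10)       # 1 core at 1.6 GHz, higher voltage
two_slow = 2 * power_watts(0.8e9, v=0.80)   # 2 cores at 0.8 GHz, lower voltage

print(f"1x1.6GHz: {one_fast:.3f} W, 2x0.8GHz: {two_slow:.3f} W")
```

With these toy numbers the two slower cores deliver the same nominal throughput for roughly half the power, which is the shape of the perf/W curve the article's conclusion page shows.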
"When I started out this piece the goals I set out to reach was to either confirm or debunk on how useful homogeneous 8-core designs would be in the real world"
You mean heterogeneous above rather than homogeneous.
I've been waiting for this piece since the GS6 came out. I can't even imagine the amount of time and work you've put into it. THANK YOU Andrei.
Now I hope we can put to rest the argument that Android would do better with only 2 high performance cores VS more core configurations. Google has been promising this for years and they're finally _starting_ to deliver. They're not there yet, lots of work needs to be done to exterminate all that ridiculous overhead (evident in the charts).
I'm also glad that it's finally evident that Chrome on Android VS SBrowser has significant impact on performance and battery life. It should only be fair to ask that Anandtech starts using the built-in browser for each respective device when benchmarking.
We're _just_ reaping the benefits of properly implemented big.LITTLE configurations, in both hardware and software, after 2 years of waiting. What's funny is that both Qualcomm and Samsung are moving away from these implementations back to Quad-core CPUs with Kryo and Mongoose respectively... I personally hope we get the best of both worlds in the form of Mediatek's 10 core big.LITTLE implementation, except the 2 high perf cores being either Kryo or Mongoose for their relatively insane single-threaded performance.
You're coming up with conclusions that aren't supported by the article.
Can we put 2 vs 8 core argument to rest? Nope.
This test only shows that when there are 4 (or 8) cores available, Android occasionally uses them all.
It says NOTHING about whether an 8 core CPU would be faster than with 2 wide cores. (Samsung and Qualcomm are moving towards Apple-like wide dual core designs. I doubt they'd do that, if 8 cores were really always faster/economical than 2)
In fact, the article doesn't really tell us whether 8 small cores are faster/more economical than 2 or 4 small cores. Keep in mind what people have brought up about the priority of threads. Some of the threads you see occupying all 8 cores, are low priority threads, that could just as quickly be completed in sequence if there were only 2 or 4 low power cores available.
Are you sure we're on the same page here? We're talking efficiency, right?
"This test only shows, that when there are 4 (or 8) cores available, Android occasionally uses them all."
No it doesn't. Android is capable of utilizing all cores, yes, but it only allocates threads to the amount of cores *needed*, which is much, MUCH more power efficient than elevating a smaller number of high performance cores to their max performance/freq states.
"It says NOTHING about whether an 8 core CPU would be faster than with 2 wide cores. (Samsung and Qualcomm are moving towards Apple-like wide dual core designs. I doubt they'd do that, if 8 cores were really always faster/economical than 2)".
True, it doesn't show direct comparisons with modern wide cores running Android, because there aren't any. But even taking MT overhead and core switching overhead into account, I believe it's safe to say that performance should be comparable (since the small cluster is rarely saturated), except (again) much more efficient. And no, QC and Samsung aren't moving to any dual core configuration; they're both moving to quad-core configurations (i.e. the optimal for Android), which further supports the argument that, for MOBILE DEVICES, more cores running at a lower frequency (and lower power draw) are more efficient than fewer cores running at their relative max.
The problem isn't the premise, it's the means. ARM's reference core designs aren't optimal in comparison to custom designs, neither in performance nor in power consumption. Theoretically speaking, if Qualcomm or Samsung used little versions of their custom cores in 8-core configurations, or in 4x4 big.LITTLE, we might see tremendous power savings in comparison. Again, this applies to Android based on this article.
"In fact, the article doesn't really tell us whether 8 small cores are faster/more economical than 2 or 4 small cores."
This article STRICTLY talks about the impact a 4x4 big.LITTLE configuration has on ANDROID if you want BOTH performance and maximum efficiency. It clearly displays how Android (and its apps) is capable of dividing the load into multiple threads; therefore having more cores has its benefits. Also, you can clearly see that there is noticeable overhead here and there, and throwing more cores at the problem, running at lower frequencies, is a better brute force solution to, AGAIN, maximize efficiency while maintaining high performance WHEN NEEDED, which usually comes in relatively short bursts. Android still has a ways to go with optimization, but its current incarnation proves that more cores are more efficient.
You are making the wrong comparisons here. What you should be asking for is comparisons between a quad-core A57 chip, an 8-core A53 chip, and a 4x4 A57/A53 big.LITTLE chip. That, and only that, would be a valid apples-to-apples comparison, which in this case is only valid when tested on Android. Unfortunately, good luck finding these chips from the same manufacturer built on the same process...
Qualcomm's next custom core is 2+2, but Samsung's is 4+4. I agree with the gist of your argument, though. Different core counts, but they all aim at the same goal - performance and efficiency.
Wow, excellent article. Colour me impressed that the developers use 4 cores effectively more often than not. It was not what I was expecting. Nor did I realise how much of the video processing task was offloaded to the GPU. In fact it's so good I suspect there will be more than a few electrical engineers poring over this in order to understand how well their software brethren make use of the hardware they provide.
A technical note regarding "...scaling up higher in frequency has a quadratically detrimental effect on power efficiency as we need higher operating voltages..." - note that power consumption *already* goes up quadratically as voltage squared, BEFORE including the frequency change (i.e. P = k*f*v*v). So if you're also scaling up voltage while increasing frequency, you get a horrific blowing-up-in-your-face CUBIC relationship between power and frequency.
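Spelled out, the relationship in the note above (with k absorbing switching capacitance and activity factor, and the rough assumption that voltage must scale about linearly with frequency, V ≈ a·f):

```latex
P = k \, f \, V^2, \qquad V \approx a f
  \;\Longrightarrow\; P \approx k \, a^2 f^3
```

So a 20% clock bump that also needs a proportional voltage bump costs roughly 1.2³ ≈ 1.73x the power, which is why racing frequency upward gets expensive so quickly.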
Being in the Apple camp, I do know Apple also highly encourages developers to use multithreading as much as possible with their Grand Central Dispatch API, and has implemented things like App Nap and timer coalescing to help with the "race-to-idle" in OS X. I'm guessing Apple is likely taking this into account when designing their ARM CPUs as well. The thing is, unlike OS X, iOS and their A-series CPUs are mostly a black box of unknowns, other than whatever APIs they let developers use.
For web browsing I do wish you would look at heavier sites, worst case scenarios (since that's when the device stumbles), and at desktop versions of pages.

Would be nice to have a total run-queue depth graph normalized for core perf and clocks (so converted into total perf, expressed in whatever unit you like) to see what total perf, and what mix of cores, would be needed with an ideal scheduler. Pretty hard to do in a reasonable way, but it would be an interesting metric. After all, the current total is a bit misleading by combining small and big; it shows facts, but people can draw the wrong conclusions, like 4 cores being enough or 8 not being needed. Many commenters seem to jump to certain conclusions because of it too.

Would be nice to see the tests for each cluster with the other cluster shut down, ideally with perf and power measured too. It would help address some objections in the comments.

In the Hangouts launch test conclusion you say that more than 4 cores wouldn't be needed, but that doesn't seem accurate outside the high end if all the cores were small: assuming the small cores have 2-3 times lower perf, then a run-queue depth above 1.5 on the big cores might require more than 4 small cores if we had no big ones. Same goes for some other tests.

A SoC with 3 types of cores, 2 of them big (even bigger than A72) plus a bunch of medium and small ones, does seem to make sense, with a proper scheduler and thermal management of course. For midrange, 2+4 should do, and it wouldn't increase the cost too much vs 8 small ones, depending a bit on cache size. Let's say on 16FF an A53 is below 0.5mm2, an A72 about 1.15mm2, and cache maybe close to 1.7mm2 per 1MB, so a very rough approximation would be a 2-3mm2 penalty depending on whether the dual A72 has 1 or 2MB of L2.

A lot more if the dual A72 forces them to add a second memory channel, but even then it's worth the cost; $1-2 more for the OEM would be worth it given the gain in single threaded perf and the marketing benefits.

When looking at perf and battery, in the real world multitasking is always present in some way; in benchmarks, it never is. So why not try that too, something you encounter in daily usage? A couple of extra threads from other things should matter enough. Maybe on Samsung devices you could test in split screen mode too, since it's something people do use and actually like.

For games it would be interesting to plot GPU clocks and power or temps, as well as maybe FPS. I was expecting games to use the small cores more to allow more TDP to go to the GPU, and the games you tested do seem to do just that. Maybe you could look at a bunch of games from that perspective. Then again, it would be nice if AT would just start testing real games instead of synthetic nonsense that has minimal value and relevance.

A look at image processing done on CPU+GPU would be interesting. The way Android scales on many cores is encouraging for glasses, where every cubic mm matters and batteries have to be tiny. I do hope the rumored Mercury core for wearables at 50-150mW is real and shows up soon.

Oh, and I do support looking at how AT's battery of benchmarks behaves, but a better solution would be to transition away from synthetics. No idea why that takes so long in mobile when we had the PC precedent, and nobody needs an explanation as to why synthetic benchmarks are far from ideal.

Anyway, great to see some effort in actually understanding mobile, as opposed to the dubious synthetic benchmarks and empty assumptions that have been dominating this scene. AT should hire a few more people to help you out and increase the frequency of such articles, since there are lots of things to explore and nobody else is doing it.
Linux has largely been guided towards massively multiprocess workloads; if it didn't do this well, it wouldn't do anything well. The scheduler should be getting a lot better soon. It APPEARS that, after a long, long time, things are moving forward on the combined scheduler (CFS), cpuidle, and cpufreq (DVFS) front. That's necessary for proper scheduling of tasks, especially across an aSMP SoC. One thing to keep in mind is that these OEMs often carry out-of-tree patches that they believe help their hardware. Often these patches are of, ahem, suspect quality, and pretty much always perform their task with some noticeable drawbacks. The upstream solution is (almost?) always better. In other words, things should only get better.
Great piece from AT as usual. Very interesting to say the least.
I remember reading another good piece about multiple core usage in Android in one of the android themed websites (Androidcentral?). It was a much simpler analysis and the premise was to debunk the myth that more cores are pointless at best and counter productive at worst.
The conclusions from the tests were unequivocal. Android DOES make use of multiple cores, both via multi-threaded programs and discrete system tasks. So core count DOES matter, at least to an extent.
No-one is denying that "Android can make use of multiple cores". What they are denying (in this article and similar ones) is that the extra cores are either a useful way to save power, or a way to make the phone feel/behave faster.
This article, you will notice, does not answer either of those issues. It doesn't touch power, and there've been plenty of comments above (by myself and others) about why what is being shown here has no relevance to the issue of performance.
Do you really want to insist on claiming that articles prove what they manifestly do not, and insist on ignoring the (well-explained) concerns of engineering-minded people about the sorts of tests that ARE necessary to answer these questions? Wouldn't you rather be on the side of the engineers trying to understand these issues properly, than on the side of the fanboys, uninterested in the truth as long as you can shout your tribal slogans?
You make no sense. Ever heard of benchmarks? If all cores are used (which this article proves as a fact), and the benchmark shows the chip scoring higher than a chip with fewer cores - then yes, more cores mean better performance. It is a matter of simple logic.
And the massive myth that this analysis dispelled was exactly the following - that more cores are a useless gimmick BECAUSE ANDROID APPS CANNOT make use of them.
Absolutely fascinating stuff. I was seriously not expecting to see Mediatek's ultra high core count strategy vindicated by real world measurements. That's the great thing about taking measurements instead of just speculating.
As a follow-up, it would be fascinating to see how selectively disabling different numbers of cores affects timed tests.
For instance, select an extremely CPU heavy web site like Forbes and see if allowing half as many cores makes rendering the home page take twice as long.
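For anyone wanting to sanity-check that expectation: under Amdahl's law, halving the core count only doubles the runtime if the work is perfectly parallel, which page rendering rarely is. A quick sketch (the 60% parallel fraction below is purely illustrative, not a measured value):

```python
def amdahl_time(parallel_fraction, cores):
    """Relative wall time for a task whose given fraction parallelizes
    perfectly across `cores` (Amdahl's law, normalized to 1 core = 1.0)."""
    return (1 - parallel_fraction) + parallel_fraction / cores

# Perfectly parallel rendering: halving the cores doubles the parallel part.
print(amdahl_time(1.0, 4) / amdahl_time(1.0, 8))   # 2.0

# A render that is only 60% parallelizable barely notices losing 4 cores.
print(round(amdahl_time(0.6, 4) / amdahl_time(0.6, 8), 2))  # 1.16
```

So even a strongly threaded renderer would be expected to slow down far less than 2x when going from eight cores to four.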
Excellent article as usual by Andrei. As an owner of a phone with the MT6592 MediaTek 8-core A7 chip, I was also skeptical about the point of having so many small cores. I only got the phone because it was cheap :) I've seen all 8 cores spike to max frequency when loading complex web pages or playing games. For common tasks, only 2 or 4 cores are used. I've also found that downclocking it doesn't slow things down much and yields longer battery life; modifying the single-core up-frequency and additional-core activation thresholds could be key to optimizing these chips to one's usual workload.
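The per-core clocks described above can be inspected on a rooted device through the standard Linux cpufreq sysfs interface; a minimal sketch (the MT6592's hotplug/up-threshold knobs live in vendor-specific files that vary by ROM, so only the generic files are read here):

```python
from pathlib import Path

def core_freqs(base="/sys/devices/system/cpu"):
    """Collect {core: current kHz} from the standard cpufreq sysfs files;
    hot-unplugged cores expose no readable scaling_cur_freq and are skipped."""
    freqs = {}
    for cpu in sorted(Path(base).glob("cpu[0-9]*")):
        f = cpu / "cpufreq" / "scaling_cur_freq"
        if f.exists():
            freqs[cpu.name] = int(f.read_text())
    return freqs

print(core_freqs())
# Capping a core's max clock works through the same interface (root required):
# Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq").write_text("1000000")
```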
Good comment - I've been pondering this all morning, hence why I'm back on this article. Looking at an A9 Pro right here, in a 4+4 configuration.
It seems that the little cores have a min frequency of 400MHz, which I find interesting, as they seem to sit at 691MHz most of the time. Two of the big four sit in sleep, with the other two at 833MHz.
I wonder how adjustment to the larger cores may improve battery life.
Anandtech does it again. You are my entertainment and my knowledge at the same time. My thoughts: not all that surprising after seeing benchmarks of various SoCs over the past year or two. I think the question here is performance versus more cores. More but smaller cores are best for efficiency and probably better for marketing. The only problem with these smaller cores is performance, which is why we often see them in cheaper devices that don't feel as fast. We still need more frequency for some big tasks, and I believe a fast dual core will answer that. So, I can't wait to see the X20.
An interesting and thorough analysis, although I'm concerned about some of the assumptions behind some of the conclusions. Just because a queue of 4 threads makes all 8 big.LITTLE cores active doesn't mean that the architecture is effective. For all we know, the threads are thrashing back and forth, draining precious performance per watt.
I'm still not convinced. The fact that it's doing what it does on these chips doesn't mean that their performance is as good as it could be, or that power efficiency is as good. We really need to see two to four core designs, with cores that are really more powerful, to make a proper comparison. We don't have that with the chips tested.
Exactly. It should at least show a design with a small number of powerful cores. Obviously with Apple's A series chips you have the issue of dealing with a different operating system underneath, but can't they use a Tegra K1 or something?
The stacked frequency distribution graphs would be a *lot* easier to read if you used a consistent range of different saturations/intensities of a single colour (e.g. go from bright=fast to dark=slow), or a single pass from red to blue through the ROYGBIV colour spectrum (e.g. red=fast, blue=slow), to represent the range of frequencies.
By going around the colour wheel multiple times in the colour coding it's *really* hard to tell whether a given area of the graph is high or low frequency. The difference in colour between 1400/800, 1296/700, and 1200/600 are very subtle to say the least.
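One way to implement that suggestion: derive each frequency's color from a single monotonic ramp instead of cycling around the color wheel. A minimal pure-Python sketch (the 400-2000MHz range is just an example; any sequential colormap, e.g. matplotlib's viridis, would do the same job):

```python
def freq_to_color(freq_mhz, f_min=400, f_max=2000):
    """Map a core frequency onto one blue->red ramp so that higher
    clocks are always visibly 'hotter' - no ambiguous repeated hues."""
    t = (freq_mhz - f_min) / (f_max - f_min)
    t = max(0.0, min(1.0, t))            # clamp out-of-range frequencies
    return (round(255 * t), 0, round(255 * (1 - t)))  # (R, G, B)

print(freq_to_color(400))    # (0, 0, 255)  slow = blue
print(freq_to_color(2000))   # (255, 0, 0)  fast = red
```

With this scheme, 1400MHz always reads as warmer than 800MHz, which resolves exactly the 1400/800-style ambiguity described above.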
anandtech always uses weird, unpopular words on its own site, like 'heterogeneous' - never heard it in my life, and even US or UK people have to search the Cambridge/Oxford dictionary :DDD Immediately you can say it is DEFO NOT a US or UK website.. they do not use such difficult words AT ALL :)
And mainly they use them when it comes to Chinese products.. like MediaTek or Kirin or the big.LITTLE topic etc.. This site is DEVOURED, or we could say powered, by Apple Inc. :)
R0H1T - Tuesday, September 1, 2015 - link
Seems like Android has Windows' number as far as "multi-threading" is concerned, kudos to Google for this & seems like the tired old argument of developers getting a free pass (for poor MT implementation on desktops) needs to change asap!
Impulses - Tuesday, September 1, 2015 - link
Ehh, I think you're ignoring some key differences in clock speed and single threaded performance, not to mention how easily Intel can ramp clock speed up and back down, and then there's Hyper-Threading which allows you to span more threads per core.
Laptops might be the outlier, but I dunno what benefit a desktop (which have commonly run quads for years) would see from a lower powered core cluster. Development just works very differently by nature of the environment.
Also things that benefit a ton from parallelization on the desktop often end up using the GPU instead... And/or specialized instructions that aren't available at all on mobile. It's not even apples and oranges IMO, it's apples and watermelons.
R0H1T - Tuesday, September 1, 2015 - link
You're missing the point, which is that Google & Android have shown (even with the vast number of SoC's it runs on) that MT & load management, when implemented properly, on the supported hardware & complementing software, makes great use of x number of cores even in a highly constrained environment like a smartphone.
On desktops we ought to have had affordable octa cores available for the masses by now, but since Intel has no real competition & they price their products through the roof, we're seeing what or how windows & the x86 platform has stagnated. Granted that more people are moving to small, portable computing devices but there's no reason why the OS & the platform as a whole has to slow down, also the clock speed, IPC argument is getting old now. If anything DX12, Mantle, Vulkan et al have shown us is that if there's good hardware & the willingness to push it to its limits developers, with the right tools at hand, will make use of it. Not to mention giving them a free pass for badly coded programs, remember the "ST performance is king" argument, is the wrong way to go as it not only wastes the (great) potential of desktops but it also slows down the progress of PC as a platform.
Now I know MT isn't a cakewalk, especially on modern systems, but if anything it should be more widespread, because desktops & notebooks give a lot of thermal headroom compared to tablets & smartphones; besides, the 30+ years of history behind this particular industry should make the task easier. Also, not all compute tasks can be offloaded to the GPU, which is why it's even more imperative that users push developers to make use of more cores & not get the free ride that GPGPU has been giving them over the last few years. As it is, the GPU industry is also slowing down massively, & then we'll eventually be back to square one & zero growth.
metafor - Tuesday, September 1, 2015 - link
Yes and no. Google and Android are able to show that things like app updates, web page loads and general system upkeep are able to take advantage of multiple threads. But that's been true for a while. In a smartphone, those happen to be the performance-dominating tasks. On a desktop, those tasks are noise.
Desktop workloads that actually stress the CPU (and that users care about performing well) are very different. That's not to say they're not threadable, but they may not be as threadable as Chrome, which basically eats RAM and processes.
That being said, heterogeneous MT could make a lot of sense for laptop processors as well. Having threadable workloads run on smaller Atoms instead of big Skylakes would probably improve efficiency. But it may not be as dramatic depending on the perf/W of Skylake at lower frequencies.
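The intuition behind that efficiency claim can be sketched with the standard dynamic-power model P ≈ C·V²·f: since execution time scales as 1/f, the frequency cancels out and energy per task is driven by voltage (and per-core capacitance), which is exactly where small cores win. The capacitance and voltage figures below are made up purely for illustration:

```python
def energy_per_task(cap_farads, volts, freq_ghz, work_cycles=1e9):
    """Dynamic energy for a fixed amount of work: E = P * t, with
    P = C * V^2 * f and t = cycles / f, so E = C * V^2 * cycles."""
    power_watts = cap_farads * volts**2 * (freq_ghz * 1e9)
    seconds = work_cycles / (freq_ghz * 1e9)
    return power_watts * seconds

# Hypothetical big core at 1.1V/2.5GHz vs small core at 0.8V/1.4GHz,
# assuming the same IPC-adjusted cycle count for the task.
big = energy_per_task(1e-9, 1.1, 2.5)
small = energy_per_task(1e-9, 0.8, 1.4)
print(round(big / small, 2))  # ~1.89x more energy on the big core
```

Real cores also differ in leakage, IPC and race-to-sleep behavior, which is why the Skylake-at-low-frequency caveat above matters.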
niva - Tuesday, September 1, 2015 - link
OK, can we talk about this for a bit? I for one found the webpage CPU usage extremely disturbing. I'm running an old phone, a Galaxy Nexus, and browsing has become by far the task my phone struggles with the most. Why is that? What is it about modern websites that causes them to be so CPU heavy? Is that acceptable? It does seem that much of the internet is filled with websites running shady scripts in the background and automatically playing video or sound, which is annoying at the very least and always detrimental to performance. Whatever happened to optimizing websites to minimize data usage and actually making websites accessible?
Secondly, what is the actual throughput of desktop CPUs compared to the latest state-of-the-art ARM APUs? Just because desktop workloads might be different, does that mean a mobile APU cannot handle them, or is that simply due to the usage mode of the device in question? What I'm seeing out of mobile/phone chips is that they are extremely capable, to the point I'm starting to wonder if I'll ever need another desktop rig to replace my old Phenom X2 machine.
metafor - Tuesday, September 1, 2015 - link
I would guess that websites are just more complicated nowadays. Think about a dynamic website like Twitter, which has to have live menus and notifications/updates. That's basically a program more than just a web page. We've slowly migrated what used to be stand-alone programs to load-on-demand web programs. And added many many inefficient layers of script interpreters in between.
emn13 - Thursday, September 3, 2015 - link
Somewhat ironically, the more modern a web page, the *less* friendly it is likely to be to multithreading. After all, modern features tend to include heavy JavaScript usage (which is almost purely single-threaded), and CPU usage that is bottlenecked by a path through JavaScript (typically the layout rather than the JS itself, but that layout affects the JS and hence needs fine-grained interaction).
Jaybus - Tuesday, September 1, 2015 - link
It is the more extensive use of client-side processing; in a nutshell, JavaScript and JSON. On older websites, the dynamic stuff was processed server-side and the client simply did page reloads. Modern sites require less bandwidth, but at the expense of increased CPU usage.
Also, modern sites are higher-res and more image intensive, or in other words more GPU heavy as well. Some of the Galaxy Nexus's struggles can be attributed to GPU load.
mkozakewich - Wednesday, September 2, 2015 - link
Most of it has to do with using multiple JavaScript libraries. It's not strange to need to download over 50 different files on a website today. Anandtech.com took 123 requests over four seconds to load. Mostly fonts, ads, and Twitter stuff, but it adds up.
name99 - Tuesday, September 1, 2015 - link
You are totally misinterpreting these results.
The mere existence of a large number of runnable threads does not mean that the cores are being usefully used. Knowing that there are frequently four runnable threads is NOT the same thing as knowing that four cores are useful, because it is quite possible that those threads are low priority, and that sliding them so as to run consecutively rather than simultaneously would have no effect on perceived user performance.
There is plenty of evidence to suggest that this interpretation is correct.
Within the AnandTech data, the fact that these threads are usually on the LITTLE cores, and running those at low frequency, suggests they are not high priority threads.
This paper from MS research confirms the hypothesis:
http://research.microsoft.com:8082/en-us/um/people...
Now there is a whole lot of tribalism going on in this thread. I'm not interested in that; I'm interested in the facts. What the MS paper states (confirmed, IMHO, by these AnandTech results) is that there is a reasonable (around 20%) throughput improvement in going from one to two threads, along with a small (around 10%) energy drop, and that going from two to three or four cores buys you only very slight further energy and performance boosts.
In one sense this means there's no harm in having octa-cores around --- they don't seem to be burning energy, and in principle they could deliver extra snappiness (though the lousiness of the scheduling in these AnandTech results suggests that's more a hope than a reality). But there's a world of difference between the claim "doesn't hurt energy, may occasionally be slightly useful" and the claim "pretty much always useful because apps are so deeply threaded these days".
lilmoe - Tuesday, September 1, 2015 - link
"we're seeing what or how windows & the x86 platform has stagnated"Your argument is highly inaccurate and extremely dated. This isn't Windows XP era anymore... Windows 10 and 10 Mobile might as well be better than Android in what you're giving kudos to Google for (which they've somewhat managed after YEARS of promises). There's still a huge chunk of overhead in Android's rendering pipeline that needs serious attention. Android has made huge improvements, yes, but there still lots of work that needs to be done.
@Impulses has a good point too; it's extremely difficult to get a fair apples-to-apples comparison when it comes to optimal handling of workloads at varying thermal limits. CPUs at ~2W TDP behave VERY differently from those at 15W, and both behave differently again from those running at 37W+. This becomes evident when middle-ground ~5W mobile CPUs are in the picture, like Intel's Core M, where devices running those show no better battery life than their 15W counterparts running the same OS. (Windows 10 is changing that, however, and is showing extreme battery savings at these lower TDPs, more so than the improvements in higher-TDP parts, which says a lot about W10.)
If that isn't clear enough already, read the article again. The author CLEARLY mentions on the first page not to make the mistake of applying the aforementioned metrics to other platforms and operating systems, and to strictly stick with Android and big.LITTLE.
Alexvrb - Tuesday, September 1, 2015 - link
Thank you lilmoe and name99! I read his comment and I was like, really? These results don't support his claims and were never intended to compare platforms - as specifically stated by the author.
R0H1T - Thursday, September 3, 2015 - link
XP to Win10 took what, a decade & a half? Vista was the last major change after XP (DX10 & UAC), with Win7, then Win8 & now Win10 bringing only incremental updates. Yeah, I call that slow, & we've had quad cores for what, nearly a decade now? Even so, a vast majority of systems (desktops + notebooks) are dual core or 2 cores + HT; surely that must make you cringe! Then we have programs that don't make use of multiple cores efficiently &/or the latest instruction sets like AVX. There's just a single web browser, that I know of, which uses the latter on PC! Call it whatever you may, or twist it however you like, but this is one of the major reasons that PC sales are declining, not just the "everyone owns one & so they don't need it" excuse that's thrown around far too often. So far as the "extrapolating this article to my observations" argument is concerned, there's no need to do that, since there's historical precedence & a copious amount of evidence to support pretty much every word of what I've said.
Azethoth - Thursday, September 3, 2015 - link
Ugh dude, you have no idea what you are talking about. 4+4 architectures on a phone are a desperate attempt to reduce power usage. I am a programmer and compile times matter to me, and threading helps. Even so, going from 8 threads on my desktop CPU to 12 threads on the -E CPU a year later only reduced a total recompile of 26 minutes by 2-3 minutes. But that -E part cannot clock as high, so in the regular incremental-compile case it is slower. Do you get this? You are factually wrong for an actual core-dependent use case.
Now I can stick my head in the sand like you and pretend that more cores are automatically better, but that just isn't so for my workload. You may as well bitch that I should be running on multi-thousand-dollar server CPUs with 16 cores. Again, no. They have their place in a server, but no place in my desktop.
Samus - Tuesday, September 1, 2015 - link
If "Google and Android" have 'nailed' MT then why do $600+ Android phones feel more sluggish, have a choppier UI, and launch programs slower than a 3 year old iPhone 5 or Lumia 800?Perhaps because the kernel and underlying architecture are so bloated because they need to support so many SOC's. They've resorted to heavy compression just to keep distribution sizes down, which also hits performance.
Android only has one place, on cheap phones. You're an idiot if you buy a $600+ Android phone when you get the same crappy experience on a $50 Kyocera.
I've tried so hard to like Android over the years, but every device I've had completely disappointed me compared to older BlackBerry and modern iPhone devices, where you don't need to find hacked distributions when manufacturers drop the ball supporting the phone, or just make a crappy ROM in general. Even Nexus devices aren't immune to this, and historically they haven't been very good phones; although admittedly the only high-end Android phone worth buying is a Nexus, but now they cost so much it isn't justifiable.
Basically I recommend two phones to people. If they want a cheap phone, get a OnePlus One or some other sub-$300 Android device. If your budget is higher, get an iPhone, or if you are adventurous, a WinMo device. At least the iPhone will receive support for 4-5 years and holds its value during that time.
Buk Lau - Tuesday, September 1, 2015 - link
I'm calling BS on most of your claims. Your experience with a Moto E (not saying it's a bad phone) will be vastly different from that of a Note 5, and those differences can start as obvious as how often you need to refresh your Chrome pages as you run out of RAM.
What "$600+" Android phone are you talking about that feels "more sluggish and slower" than a 3 year old iPhone? If you want people to take your claim seriously, then at least provide some examples rather than this generic BS that anyone can easily come up with.
The way Android is designed makes it kind of difficult to bring updates, as surprising as you may find that. Every time the OS updates, there are changes to the HAL (hardware abstraction layer), and those changes can be minor or significant. It is then up to the SoC provider to supply the proper drivers after the HAL change, and they certainly won't provide them for free. At the same time, the OEM also has to decide how much the new update will impede performance. For example, my first-gen Moto X got an update to 5.1.1 a few months ago, and despite the new features there are still performance hits in places. Older devices will probably do better on Jelly Bean and KitKat anyway, since Google Play services can be updated independently of the OS version.
Here’s some useful info on why Android is as fragmented as it is
http://www.xda-developers.com/opinion-android-is-i...
The biggest reason Apple updated all those 4S units isn't because of how much they love their users, but rather to purposely slow down their devices and force an upgrade. Just ask the 4S users around you what iOS 8 really meant for them.
I do agree however that people should try more $300-400 devices that are near flagship level with compromises that are more tolerable, and this $600+ smartphone price should really tone itself down a bit.
Kutark - Tuesday, September 1, 2015 - link
Yeah, I have to call bullshit on his claims too. I know it's anecdotal, but my buddies and I have had literally dozens of Android phones over the years, as well as various iPhones, and none of us have seen any kind of performance difference between the two. I'm thinking he just had a shit experience with one Android phone and, like most people, just wrote it off at that point.
I did have a bad experience with an HTC Rezound, but every phone I've had before or after that has been fantastic. I absolutely LOVE my LG G3; it's extremely responsive and fast, and I've never had issues with slowdowns on it. That being said, I don't do any "gaming" (and I put gaming in quotes for a reason) on the phone, so I can't speak to that. But as far as the browser, YouTube, other apps, etc., it couldn't be more perfect.
Samus - Wednesday, September 2, 2015 - link
I'm an IT director and I have a "shit experience" with Android phones people bring to me every week.
Defending Android is like defending your Kia Rio. It's a low cost tool to fit a low cost niche. The experience is the same no matter who is driving.
Kutark - Wednesday, September 2, 2015 - link
If you say so. As an IT director you should know that 99% of the time there is a problem, it's user related and not hardware related. One thing I will give Apple is that they lock their products down so hard that it's much harder for the user to F them up. Whereas on more open platforms like Android or Windows, the user has much more control and thus much more ability to F things up royally.
Whether that's a plus or a minus really just depends on what you're looking for. For people who want or need control over their hardware, it's a plus; for people who just want something that "just works", so to speak, it's a minus.
mkozakewich - Wednesday, September 2, 2015 - link
Your claim that Apple is trying to slow down devices throws off your entire argument, really.
Samus - Wednesday, September 2, 2015 - link
They are both clearly Android fans who haven't ever given anything else a chance. The fact that they ignore that Apple has consistently had superior single threaded performance in their SoCs for years, and that this has translated to better UX, just goes to show that Android targeting multithreaded performance is a solution looking for a problem. There are so many underlying issues to address first, specifically making efficient use of the Linux scheduler, and perhaps setting a compatibility list for hardware instead of saying "just make anything and we'll find a way to run on it", no matter how crappy it runs.
tuxRoller - Wednesday, September 2, 2015 - link
Apple has not consistently had better performance per core. That's fairly recent (since Cyclone, iirc). There are myriad issues at play.
In the end, the market is best served by an open option, like Android, and customers choosing what works best for them and letting the rest fade away.
name99 - Wednesday, September 2, 2015 - link
"Apple had not consistently had better performance per core. That's fairly recent (since cyclone, iirc). "Since Swift. That's iPhone 5, 5S, 6 (2012, 2013, 2014) and likely to be 6S and 2015 at least.
Even the late-stage pre-Apple cores were substantially above average (in part because of Apple's custom SoC). The 4S was above the competition at the time:
http://www.anandtech.com/show/4971/apple-iphone-4s...
Most people would consider "consistent enough" for "long enough" to make the statement reasonable.
lopri - Wednesday, September 2, 2015 - link
And it's not like Apple doesn't resort to moar cores. When they run into walls, they also have no choice but to take whatever routes are available. Listening to some of the zealous Apple fans, one would think that iPhones have been rocking a single core all these years.
They have moved to dual cores on the phones, and 3 cores on tablets. Moar cores on iDevices are only a matter of time. Those specialized ASICs with the fancy names Apple gives them ("Motion Processor", for one) are also a concession by Apple that there are cases where big cores are not the best route to take when efficiency matters.
Buk Lau - Wednesday, September 2, 2015 - link
"They are both clearly android fans and haven't ever given anything else a chance."uhh my first smart device ever is a 2nd gen iPod touch...
So just because I proved you wrong, I have to be an Android fanboy? You said you see all these Android phones "every week" and have "shit experiences". Again, you didn't bring up any names. What phones have you even tried? Who's being a fanboy here, and who can only make claims without backing them up with facts?
I don't understand why you are arguing about superior ST performance when it's irrelevant to this article. What this article simply proves is that Android does make use of extra threads, and that you get a benefit in power efficiency from running multiple threads; nothing about performance. In fact, in most scenarios shown in the test, most of the little cores aren't even saturated, which means the workload isn't heavy at all.
"Apple has consistently had superior single threaded performance in their SOC's years and this has translated to better UX"
Any evidence that leads to this conclusion? Also, like tuxRoller said, Apple has only had IPC advantages in recent years, with the Cyclone series.
"There are so many underlying issues to address first, specifically making efficient use the Linux scheduler and perhaps setting a compatibility list for hardware instead of saying just make anything and we'll find a way to run on it no matter how crappy it runs."
Where did you get the idea of "make anything and find a way to make it work"? All OEMs and SoC manufacturers optimize for Android, just like they optimize for Windows on the desktop. Like I said before, SoC manufacturers have to provide a driver update every time there's a HAL change in Android. How well they optimize is up to them, but the fact is that they do have to make their hardware compatible with Android.
Kutark - Wednesday, September 2, 2015 - link
Did I suddenly log onto the PC Gamer forums? The instant someone expresses any level of dismay or concern about an Apple product, or says they have good experiences with Android phones, it automatically means they're a nutswinging fanboy?
Buk Lau - Wednesday, September 2, 2015 - link
You can argue whether Apple did it intentionally or not, but the end result is that 4S users are getting a more sluggish experience after updating to iOS 8.
tuxRoller - Wednesday, September 2, 2015 - link
Linux isn't great about niceness. There are a few ways to fix this. One is to use cgroups (which Android uses). This works pretty well but is still subject, ultimately, to the scheduler. The other way is to run the RT kernel. That obeys priorities nicely (heh), but it would be a bear to wrestle into Android and you'd lose some power efficiency. Also, the rendering framework of Android may have some issues with it.
darkich - Friday, September 4, 2015 - link
I'm calling not only BS, but a truckload of it. Just so full of ignorance and prejudice that it's probably not worth a thorough reply... if you do want one though, let me know and you will be served.
nightbringer57 - Tuesday, September 1, 2015 - link
Very interesting article, much more favourable to multi-core designs than I would have thought. Each page of this article must have cost an insane amount of time.
However, I still feel like some more information could be useful. This article is geared towards real-world use cases, but I think it would be interesting to repeat this analysis on a few commonly-used benchmarking apps, to compare them to the real-world uses; it may help in understanding the results.
ingwe - Tuesday, September 1, 2015 - link
Yes that would be very interesting. I am always curious about how synthetics actually compare to more real world applications.
Azethoth - Thursday, September 3, 2015 - link
Every single synthetic I have ever seen vastly exaggerates the benefit. I would be interested in a real world use case that actually matches a synthetic. It would blow my mind if there are any.
Andrei Frumusanu - Tuesday, September 1, 2015 - link
I'll do a follow-up pipeline on this if the interest is high enough.
bug77 - Tuesday, September 1, 2015 - link
High enough +1.
Please do the follow-up.
tipoo - Tuesday, September 1, 2015 - link
I'd definitely be interested.
Drumsticks - Tuesday, September 1, 2015 - link
Yes! This would be neat. Also, great article!
ThisIsChrisKim - Tuesday, September 1, 2015 - link
Yes, would love a follow-up.
HanakoIkezawa - Tuesday, September 1, 2015 - link
I'm not sure of the practicality, but I would love to see a follow-up with the Denver K1 and the A8X, to see how lower-core-count out-of-order and in-order SoCs are handled.
This seriously was a fantastic article, Andrei!
kspirit - Tuesday, September 1, 2015 - link
Yes please! +1
modulusshift - Tuesday, September 1, 2015 - link
Heck yes. And of course I'm interested if anything like this is even remotely possible for Apple hardware, though likely it would require jailbreaks, at least.
Andrei Frumusanu - Tuesday, September 1, 2015 - link
Unfortunately basically none of the metrics measured here would be possible to extract from an iOS device.
TylerGrunter - Tuesday, September 1, 2015 - link
Add one more vote for the follow-up with synthetics.
I would also want to see how the multitasking compares with the Snapdragons, as they use different frequency and voltage planes per core instead of big.LITTLE.
But I guess that would be better to see with the SD 820, as the 810 uses big.LITTLE. Consider it a request for when it comes!
tuxRoller - Wednesday, September 2, 2015 - link
big.LITTLE can use multiple planes for either cluster. The issue is purely one of implementation, tmk.
TylerGrunter - Wednesday, September 2, 2015 - link
big.LITTLE can use different planes for each cluster, but the same one for all cores within a cluster; Qualcomm SoCs can use a different plane for each core. That's the difference, and it's a big one.
https://www.qualcomm.com/news/onq/2013/10/25/power...
I'm not sure that can be done in big.LITTLE.
tuxRoller - Friday, September 4, 2015 - link
I remember that, but it doesn't say that big.LITTLE can't keep each core on its own power plane, just that the implementations haven't.
soccerballtux - Tuesday, September 1, 2015 - link
To balance everything out -- meh, that doesn't interest me. Most of the time I'm concerned with battery life and everyday performance. Android isn't a huge gaming platform, so absolute performance doesn't interest me.
porphyr - Tuesday, September 1, 2015 - link
Please do!
ppi - Tuesday, September 1, 2015 - link
Go ahead. This is one of the most interesting bits of performance digging on this site since the random-write speeds on SSDs.
jospoortvliet - Friday, September 4, 2015 - link
Yes, this was an awesome and interesting read.lilmoe - Tuesday, September 1, 2015 - link
"if the interest is high enough":/ Really?
zaza - Saturday, September 5, 2015 - link
Yes please. It would be nice to see if the same or similar tests work on the Snapdragon 810, 801, and 615, MediaTek chips, and Intel SoCs.
erchni - Thursday, September 17, 2015 - link
A follow-up with synthetics would be quite interesting.
aryonoco - Saturday, September 5, 2015 - link
I just wanted to reiterate the point here and thank the author for this great piece of technical investigative journalism. Andrei, thank you for this work. It is hugely valuable and insightful.
tipoo - Tuesday, September 1, 2015 - link
Very interesting article. Seems like the mantra of "more cores on mobile are just marketing" was wrong in terms of Android; it seems to dip into both the four-core big and little clusters pretty well. That puts the single-thread performance having lagged behind the Apple A series (up until the S6 at least) in a new light, since it can in fact use the full multicore performance.
tipoo - Tuesday, September 1, 2015 - link
*That is, barring gaming. Most core Android functions do well with multithreading though.
jjj - Tuesday, September 1, 2015 - link
In gaming there is a big advantage. By using mostly the small cores you allow more TDP to go to the GPU. One more relevant thing would be console ports in the next couple of years, when mobile GPUs catch up with consoles. The current consoles have 8 small cores, and that fits just right with many small cores in Android.
retrospooty - Tuesday, September 1, 2015 - link
Not really sure whose "mantra" that was. People that don't understand what the big.LITTLE architecture is, like some angry Apple fans?
tipoo - Tuesday, September 1, 2015 - link
Well sure, whoever they were, but it was a pretty common refrain for every 8-core SoC.
soccerballtux - Tuesday, September 1, 2015 - link
for one, it was my mantra. I liked having 4 cores because 2 wasn't enough, but according to my hotplugging times, I only really need 3 for an optimal experience most of the time.
TylerGrunter - Tuesday, September 1, 2015 - link
In fact you are in the right place to ask that question, as one of the prophets of the mantra was Anand Lal Shimpi himself:
http://www.anandtech.com/show/7335/the-iphone-5s-r...
Quoting from the article:
"two faster cores are still better for most uses than four cores running at lower frequencies"
You can read the rest if you are interested, but that's how much of the mantra started.
retrospooty - Tuesday, September 1, 2015 - link
I won't hold that against Anand, he was lobbying toward a job at Apple ;) But seriously, it was 2 years ago. At that time "two faster cores are still better for most uses than four cores running at lower frequencies" may well have been the case. Also, no matter how you slice it, an 8-core big.LITTLE is not a true 8-core CPU. It's really still 4 cores.
retrospooty - Tuesday, September 1, 2015 - link
/edit. I do remember a lot of people crying "you don't need 8 cores", but again, that was people misunderstanding ARM's big.LITTLE architecture, made worse by marketing calling it "8 cores" in the first place.
TylerGrunter - Tuesday, September 1, 2015 - link
I agree with you, and he may not have been THAT wrong at the time. But with the current implementations of power gating and turbos, most of what he said has been rendered false. AFAIK, big.LITTLE can be a true 8-core; it actually depends on the implementation.
lilmoe - Sunday, September 6, 2015 - link
"Also, no matter how you slice it, an 8 core big.little is not a true 8 core CPU. It's really still 4 cores."
An 8 core big.LITTLE chip running in HMP mode (like the Exynos 5422 onward) is in fact a "true" 8 core chip in which all 8 cores can be running at the same time. You're thinking of core migration and cluster migration setups in which only 4 cores (or a combination of 4) can be running at the simultaneously.
lilmoe - Sunday, September 6, 2015 - link
"can be running at the simultaneously."*corrected: can be running simultaneously.
osxandwindows - Friday, September 25, 2015 - link
If I run all 8 cores at the same time, would it affect battery life?
mkozakewich - Wednesday, September 2, 2015 - link
If the option is really four weak cores or two powerful cores, I think the two powerful ones would make a better system. If we could have two powerful cores AND four weak cores, that would be even better. So I think he was probably justified.
mkozakewich - Wednesday, September 2, 2015 - link
Just everyone who's easily influenced, really. I heard it from pretty much everyone. Someone I was talking to apparently "knew someone who designed a Galaxy phone." He claimed they wanted to design it with two cores, or something, but the marketers wanted eight.
StormyParis - Tuesday, September 1, 2015 - link
Very interesting, thank you.
Hrobertgar - Tuesday, September 1, 2015 - link
Your spikes on the video recording appear to be every ~4 secs of video, could the CPU spikes be app / memory related?
badchris - Tuesday, September 1, 2015 - link
Thank you for this exciting article. And one problem: how do we explain that the 2-big-core Snapdragon 808 is more efficient than the 4-big-core Snapdragon 810?
Andrei Frumusanu - Tuesday, September 1, 2015 - link
You cannot make comparisons between different SoCs even if they have the same CPU IP and the same manufacturing process. The S808 is different from the S810, which is again different from Nvidia's X1, even if all 3 have A57 cores on TSMC 20nm.
badchris - Tuesday, September 1, 2015 - link
nvm, I should have realized this comparison is not scientific.
metafor - Tuesday, September 1, 2015 - link
The S808 and S810 should be fairly similar though. That's not to say the only difference is the CPU configuration, but a similar study of what the behavior is like on a different SoC with fewer cores would be helpful. Threading isn't 100% free, and neither is thread migration. It might be good to take a look at just what the S810 is doing over time compared to the S808 in terms of CPU activity.
Andrei Frumusanu - Tuesday, September 1, 2015 - link
I have data on all of that... It's just in need of being published in an orderly fashion.
kpkp - Tuesday, September 1, 2015 - link
There are quite a few other differences besides the 2 cores, starting with the memory controller.
badchris - Tuesday, September 1, 2015 - link
thx for your notice. there's something I forgot
npp - Tuesday, September 1, 2015 - link
As an ex-Android developer I can remember that the SDK not only encourages, but sometimes straight out enforces extensive usage of threads. For example, around API level 14/15, making a network request on the main thread would throw an exception, which may seem obvious to experienced developers but wasn't enforced in earlier versions. This is a simple example, but having the API itself push towards multi-threaded coding has a positive effect on the way Android developers build their apps. I'm not sure then why Google's own browser would be surprising for its usage of high thread counts - even a very basic app would be very likely to spawn much more than 4 threads nowadays.
Arbie - Tuesday, September 1, 2015 - link
"I was weary of creating this table..."
That's not surprising, after all your work ;-).
Terrific article BTW which is up to Anandtech's long-time standards. Seems like a mini master's thesis.
yankeeDDL - Tuesday, September 1, 2015 - link
Just wanted to say that it's a great article. Well done and very interesting: the use of 4+4 cores on a mobile platform, while on a PC we still have plenty of 2-core CPUs, seemed quite ridiculous. But no, clearly, it makes sense.
Tolwyns - Tuesday, September 1, 2015 - link
Very interesting article. These tests were done on Android 5, I take it. I know that this analysis is geared toward current hardware, but most of the "4 cores are only marketing" discussion was quite a while back, when most devices had some version of Android 4. I wonder if the benefits of more cores did show up then. The second thing I'm interested in is "How much of this is applicable to other SoCs?" Not much, I gather. And related to that, "How much of this is limited to Samsung devices?", because they made the CPU and the firmware/software layer of the tested device.
SunLord - Tuesday, September 1, 2015 - link
I'm kinda curious how an 8-core version of the X20 with 2 low-power, 4 mid-power, and 2 high-power cores would perform.
Shadowmaster625 - Tuesday, September 1, 2015 - link
It is kind of a misleading analysis. One single Haswell core could juggle all of these processes and still have plenty of time to sleep. So you're not really telling us anything here. Is a wider, fatter core better than all these narrow underpowered cores? Given the performance and power consumption of the Apple SoCs, I would still have to say yes.
IanHagen - Tuesday, September 1, 2015 - link
This! When developing for iOS I usually have to spawn several threads (queues in Apple's world) for things that would otherwise block the main queue, which would cause the UI to "freeze", and the dual-core SoCs inside the devices I'm targeting are munching my threads absolutely fine. Just by saying that the several extra cores found in Android phones aren't sleeping, you're not coming to any definitive conclusion about any clear advantage of having them.
nightbringer57 - Tuesday, September 1, 2015 - link
The thing is that when you have 4 threads, 4 cores can potentially do the job more efficiently, with performance equal to a single core with 4 times the execution speed.
nightbringer57 - Tuesday, September 1, 2015 - link
*by efficiently, I mean, using less power*
metafor - Tuesday, September 1, 2015 - link
Potentially, but not necessarily. Threading and thread migration aren't free. It depends on how much performance you really need. The A57(R3), for instance, at very low frequencies is actually slightly more power efficient than the A53 at its peak frequency (surprising, I know). If you have 4 threads that need absolutely-bare-minimum performance that a min-frequency single core could handle, waking up 4 cores (even if they're smaller) and loading the code/data into the caches of each of those cores isn't necessarily a clear win. Especially if they share the same code.
lilmoe - Tuesday, September 1, 2015 - link
"The A57(R3), for instance, at very low frequencies is actually slightly more power efficient than the A53 at its peak frequency (surprising, I know)."
Cool story. Except that, in most of the smaller multithreaded workload cases, the little cores usually aren't near their saturation levels. Also, in most cases, when they _do_ get saturated, the workload is transferred and dealt with by a big core or two in short bursts.
Even if it isn't a "clear win", in *some* workloads mind you, saying that there isn't any apparent merit in these configurations is really irresponsible.
metafor - Tuesday, September 1, 2015 - link
I don't think I said there's no merit to such configurations. I simply said parallelizing a workload isn't always a clear win over using a single core. It depends on the required performance level and the efficiency curve of the small core and big core. If 4 threads running on 4 small cores at 50% FMax can be done by one big core at FMin without wasting any cycles, the advantage actually goes to the big core configuration. The small core configuration works if there's a thread that requires so little performance, it'd be wasteful to run it on the big core even at FMin.
The conclusion of which is best for the given workload isn't as clear cut as saying "look, the small cores are being used by a lot of threads!". But rather, by measuring power and perf using the two configurations.
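The trade-off described above can be put into a toy model. Everything numeric below is an illustrative assumption (the per-core performance and power figures are invented, not measurements from any SoC); with these particular numbers the small cluster wins, but shifting the assumed figures flips the winner, which is exactly why actual power/perf measurement is needed:

```java
public class CoreEnergyModel {
    // All figures below are assumptions for illustration, not measurements.
    static final double WORK = 4.0;          // total work: 4 "units", one per thread
    static final double SMALL_PERF = 0.5;    // units/s per small core at 50% FMax (assumed)
    static final double SMALL_POWER = 0.10;  // W per small core (assumed)
    static final double BIG_PERF = 2.0;      // units/s for one big core at FMin (assumed)
    static final double BIG_POWER = 0.45;    // W for that big core (assumed)

    // 4 small cores run the 4 threads in parallel.
    static double energySmallCluster() {
        double t = (WORK / 4) / SMALL_PERF;  // seconds until all threads finish
        return 4 * SMALL_POWER * t;          // joules = power of 4 cores * time
    }

    // One big core runs the same 4 threads back to back.
    static double energyBigCore() {
        double t = WORK / BIG_PERF;
        return BIG_POWER * t;
    }

    public static void main(String[] args) {
        System.out.printf("small cluster: %.2f J%n", energySmallCluster());
        System.out.printf("big core:      %.2f J%n", energyBigCore());
    }
}
```

With these assumed numbers the small cluster uses 0.8 J against the big core's 0.9 J; raise the assumed small-core power slightly and the conclusion reverses.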
lilmoe - Tuesday, September 1, 2015 - link
"If 4 threads running on 4 small cores at 50% FMax can be done by one big core at FMin without wasting any cycles, the advantage actually goes to the big core configuration."
That's hardly a real-world or even valid comparison. Things aren't measured that way.
On the chip level, it all boils down to a direct comparison, which by itself isn't telling much because the core configuration of two different chips isn't usually the only difference. Other metrics start to kick in. Those arguing dual-core wide cores are thinking iOS, which by itself invalidates the comparison. We're talking Android here.
On the software side, real life scenarios aren't easy to quantify.
This article simply states the following:
- Android currently has relatively good parallelism capabilities for common workloads,
- Therefore, there is merit in 8 small core and 4x4 big.LITTLE configurations from an efficiency perspective. The latter being beneficial for comparable performance with custom core designs when needed.
Most users are either browsing, texting, or on social media. Most of the games played, BY FAR, are the less demanding ones that usually don't trigger the big cores.
I've said this before in reply to someone else. When QC and Samsung release their custom quad-core designs, which do you honestly believe would be more power efficient: those chips as-is, or the same chips with little cores added in big.LITTLE (provided they can be properly configured that way)?
A wise man once said: "efficiency is king".
lilmoe - Tuesday, September 1, 2015 - link
You guys are deliberately stretching the scope to which the findings of this article apply. Just stop.
It has been clearly stated that this only applies to how Android (and Android apps) manage to benefit from more cores in terms of efficiency. It was clearly stated that this doesn't apply to other operating systems, iOS in particular.
Nenad - Tuesday, September 1, 2015 - link
I agree. Especially since any app designed for performance will launch as many threads as needed to use the available cores. So looking at whether "there are more than 4 threads active" on a 4+4 core CPU can be misleading. If you ran those tests on a 2-core CPU, would the number of threads remain the same or be reduced? How about a 10-core CPU?
In other words, only comparing performance and power usage (and not the number of threads) would tell us if 4+4 is better than 4 or 2 cores. The problem with that is finding different CPUs on the same technology platform (to ensure only the number of cores is different, and not 20nm vs 28nm vs a different process, etc.).
Barring that, comparison of power performance per cost among 4+4 vs 4 vs 2 is also a better indicator than comparing the number of threads.
TL;DR: it is 'easy' to have more threads than CPU cores, but it indicates neither performance nor power usage.
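The TL;DR above is easy to demonstrate: a process can hold many more live threads than there are cores while nearly all of them sit parked, so a high thread count by itself says nothing about CPU demand. A minimal plain-Java sketch (JVM rather than Android, purely illustrative):

```java
import java.util.concurrent.CountDownLatch;

public class ThreadsVsCores {
    // Spawn n threads that all exist simultaneously but are parked on a
    // latch, i.e. runnable-count != CPU demand.
    public static int countSpawned(int n) {
        CountDownLatch started = new CountDownLatch(n);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                started.countDown();                 // announce we exist
                try { release.await(); }             // ...then just wait
                catch (InterruptedException ignored) { }
            });
            t.setDaemon(true);
            t.start();
        }
        try { started.await(); }                     // all n threads alive at once
        catch (InterruptedException ignored) { }
        release.countDown();                         // let them all exit
        return n;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = countSpawned(cores * 8);       // 8x oversubscription, no load
        System.out.println(threads + " threads coexisted on " + cores + " cores");
    }
}
```

The program happily holds eight threads per core while consuming almost no CPU time, which is Nenad's point: thread counts alone settle nothing about performance or power.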
ThisIsChrisKim - Tuesday, September 1, 2015 - link
The question being answered was this, "On the following pages we’ll have a look at about 20 different real-world often encountered use-cases where we monitor CPU frequency, power states and scheduler run-queues. What we are looking for specifically is the run-queue depth spikes for each scenario to see just how many threads are spawned during the various scenarios."It was simply assessing if multiple cores were actually used in the little-big design. Not a comparison of different designs.
Aenean144 - Tuesday, September 1, 2015 - link
Yes, that was what the article was trying to find out, but it didn't answer the question of whether 4-core and 8-core designs are better than 2-core or 3-core designs. That's been the contention of the "can't use that many cores" mantra. All this article has explained is that the OS scheduler can distribute threads across a lot of cores, something hardly anyone has a problem with.
What I'd like to see is the performance or user experience difference between 2-core, 4-core, and 8-core designs, all using the same SoC. There's nothing magic about this. In PCs today, we've largely settled on 2-core and 4-core designs for consumer systems. 6-core and 8-core systems for gaming rigs, but that's largely an artifact of Intel's SKUs.
So, if I believe the marketing, these smartphones really need to have 8-core designs when my laptop or desktop, capable of handling an order of magnitude more computational load, gets by with just 2 or 4 cores?
prisonerX - Tuesday, September 1, 2015 - link
You're missing the point. Your desktop is faster because it uses much more power. Mobile phones have more cores because it's more efficient to use more lower-power cores than fewer high-power cores.
Aenean144 - Tuesday, September 1, 2015 - link
My desktop and laptop are power limited at their respective TDPs, and it's been this way for a very long time. If more cores were the answer, why are we sitting at mostly 2-core and 4-core CPUs in the PC space? All this stuff isn't new whatsoever, and the PC space went through the same core-count race 10 years ago. There has to be something systematically different such that Intel went down this path while ARM smartphones are in the midst of a core-count race.
I've read that it could be an economics thing: ARMH gets money on a per-core basis, while spending money on complicated DVFS schemes and high-IPC cores isn't worth it for them. Maybe in the PC space, we don't need the performance anymore.
mkozakewich - Wednesday, September 2, 2015 - link
It's because we're always fighting for quicker single-thread work. A lot of things can be parallelized, but there are also a lot of things today that aren't. I agree that Intel should try out some kind of big.LITTLE thing with a couple of Atom cores and a Core M, just to see how it runs.
prisonerX - Wednesday, September 2, 2015 - link
It's Intel's backward-looking strategy. They're competing in high power/high single-thread performance because they can win that with legacy desktop software and a legacy CPU architecture. Meanwhile, the rest of the world is going low-power multithreaded, because that's the future. Going forward it's the only way to increase performance with low power. Google are correctly pushing an aggressively multithreaded software architecture.
Intel have already hit the wall with single threaded performance. They can win the present but not the future. Desktops aren't moving forward because no-one cares about them except gamers, and gamers largely don't care about power usage because they don't run on batteries.
Frihed - Friday, September 4, 2015 - link
On the desktop, the cost of making a chip matters much more, as the bigger chips are sometimes more expensive than a whole mobile device. The cost of putting more cores in the chip counts there.
nightbringer57 - Tuesday, September 1, 2015 - link
There are several faces to this problem. The inherent issue with multi-core designs is that it is not trivial to develop your application so it uses several cores efficiently. The potential gain of multi-core designs is that they can do the same job for less power than a single-core design. The article answers the question "can today's typical software environment use several cores efficiently?" with a pretty objective yes. It does not necessarily state that this is superior to more simple cores; it even states that it is not relevant for other environments. Your computer can handle an order of magnitude more computation needs, but don't forget it is using two orders of magnitude more power. The switch to dual-core processors in the first place (in desktops) was motivated by the fact the industry hit a wall where it could not raise frequency anymore (heat and power consumption being an issue), where two, more efficient cores could increase performance significantly while still using less power.
Of course, the sweet spot does vary depending on the targeted power as well as the environment you're working in.
Do Android phones hit this sweet spot? Maybe. But, at least, they are capable of hitting this sweet spot, for that power target and this given environment. That's what this article says.
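For reference, the efficiency reasoning in this subthread rests on the standard first-order CMOS dynamic-power relation (a textbook model, not data from the article):

```latex
P_{\mathrm{dyn}} \approx C\,V^{2}\,f, \qquad E = P_{\mathrm{dyn}} \cdot t
```

Because a core running at a lower frequency f can usually also run at a lower voltage V, four cores at roughly a quarter of the frequency can finish the same parallel job in the same wall time for less total energy. Whether they actually do depends on the real voltage/frequency curves of the cores in question, which is why thread counts alone can't settle the argument.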
name99 - Tuesday, September 1, 2015 - link
"Your computer can handle an order of magnitude more computation needs, but don't forget it is using two orders of magnitude more power."
This is simply not true. Apple (and others) are shipping laptops running Broadwell at, what, 4.5W?
IF there were massive value in adding smaller cores to the Broadwell package (eg it could drop the average power to 2.5W) wouldn't Intel do that? They could, eg, add 2 or 4 Silvermont cores and have their own big.LITTLE system. They could even automatically switch threads in the HW if they don't want the OS to be involved, the way they handle DVFS automatically on their newest cores.
What I see when I read through these comments is a collection of people not very familiar with OS scheduling who are happy to interpret "OS can schedule multiple threads" as "app requires multiple cores to function well", and a much smaller collection of professionals who understand that the two have little relationship to each other for very short duration threads.
There's also a whole lot of claims being made here about power savings on the basis of absolutely fsckall evidence --- Andrei shows absolutely no graphs of the power being used during these runs, and it is HYPOTHESIS, not fact, that running say four lightweight threads on four A53 cores would use less energy than aggregating those four threads on a single A57. Maybe it's true, maybe it isn't --- I don't see any reason to simply assert that it's true.
nightbringer57 - Wednesday, September 2, 2015 - link
Well, when Intel ships Broadwell processors at 4.5W, they consume only a little less than an order of magnitude more than your average Cortex-A53 cluster. Using big.LITTLE configurations requires a lot of precautions at the very beginning of the design of both cores. You don't just take lower-end cores and add them to the SoC. Both the higher-end and the lower-end core must be designed to be big.LITTLE compatible.
And, however impressive those processors from Intel are, keep in mind that if they didn't put some lower-power SKUs out, it's probably because they can't. To get into lower power figures, they are still forced to resort to Atom processors. And Atom-branded processors today are... overwhelmingly 4-core models.
Once again, I'm not pretending this article is the final proof that 8 core designs are a must. But it shows that, at least, typical use cases are able to use all cores. Not that this is efficient. But there is potential for efficiency.
name99 - Wednesday, September 2, 2015 - link
You need to be a little more careful with slinging around the term "order of magnitude". A 4-core A53 cluster running FP on all CPUs (Exynos 5433, 1300MHz) uses about 865mW. That's a factor of 5 from Broadwell's 4.5W, not a factor of ten.
I'm no fan of much of Intel's work and behavior, but I don't think we are well served when we ignore details, when the details hold most of the interesting facts.
prisonerX - Wednesday, September 2, 2015 - link
All you're saying is that currently software needs better single-thread performance. Duh! What everyone else is saying is that you can't get increased performance, nor better power usage, going forward with a single-thread performance strategy. Physics has spoken!
It's nothing to do with OS scheduling and everything to do with software architecture. Everything is moving towards increasing parallelism and that will continue.
That's why mobile phones now have 8 cores and will have more because they are not weighed down by legacy architectures.
Jaybus - Tuesday, September 1, 2015 - link
It is worse than that on the development side. Yes, it is non-trivial to develop an app that uses multiple cores efficiently, but it is actually impossible to develop an app that uses multiple cores efficiently on all platforms. Maintaining many different versions optimized for particular platforms is just not plausible when there are so many different platforms.
prisonerX - Tuesday, September 1, 2015 - link
What are you basing that on? Your own bias? In reality your single Haswell core is going to be slower and use a lot more power in the process.
lopri - Wednesday, September 2, 2015 - link
Do people read?
"I should start with a disclaimer that because the tools required for such an analysis rely heavily on the Linux kernel, that this analysis is constrained to the behaviour of Android devices and doesn't necessarily represent the behaviour of devices on other operating systems, in particular Apple's iOS. As such, any comparisons between such SoCs should be limited to purely to theoretical scenarios where a given CPU configuration would be running Android."
The world does not revolve around Apple. This article has nothing whatsoever with Apple products. Furthermore, the article does neither claim nor imply that wider fat cores are better or worse than big.LITTLE.
jjj - Tuesday, September 1, 2015 - link
Yet to read the full article, just looked at the more relevant graphs, but I do wish you would have tested some heavier web pages (since AT and BBC are not that heavy), some SoCs with only small cores, and looked at power too. Would be very curious about power for a quad A53 vs octa A53 at the same clocks. Testing GPGPU on the midrange SoCs that actually do that would be interesting too. Really hope next year we get 2xA72 (+ some A53s) in $20 SoCs for $150-200 phones with very nice CPU perf. Anyway, will read the full article as soon as I find the free time.
Pissedoffyouth - Tuesday, September 1, 2015 - link
>Yet to read the full article just looked at the more relevant graphs but i do wish you would have tested some heavier web pages
Daily Mail UK comes to mind as the heaviest website ever
jjj - Wednesday, September 2, 2015 - link
Fortune seems way heavy for example, but even Amazon's home page (desktop version) seems not too friendly.
djscrew - Tuesday, September 1, 2015 - link
Love the article, but after reading it, I feel like the articles you write comparing phone CPU performance & battery life are far more applicable. You lose access to so much of the information in this article that at the end of the day testing the actual phone & OS usage of the CPU makes more sense.
Daniel Egger - Tuesday, September 1, 2015 - link
What I'm sincerely missing in this article is the differentiation between multi-processing and multi-threading. The difference is that multi-processing partitions the workload across multiple processes, whereas multi-threading spawns threads which are then either run by the OS directly or mapped to processes in different ways, depending on the OS; in Linux they're actually mapped onto processes. Threads share context with their creator, so shared information requires locking, which wastes performance and increases waiting times; the solution to which, in the threading-happy world, is to throw more threads at a problem in the hope that locking contention doesn't go through the roof and there's always enough work to keep the cores busy. So the optimum way to utilise resources to the maximum is actually not to use MT but MP for the heavy lifting, and to make sure the heavy work is split evenly across the available number of to-be-utilised cores.
For me it would actually be interesting to know whether some apps are actually clever enough to do MP for the real work or are just stupidly creating threads (and also how many).
Since someone mentioned iOS: actually, if you're using queues this is not a traditional threading model but more akin to an MP model, where different queues handled by workers (IMNSHO confusingly called threads) are used to dispatch work in a usually lock-free manner. Those workers (although they can be managed manually) are managed by the system and adjust automatically to the available resources to always deliver the best possible performance.
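For readers less familiar with the queue model described here: a rough JVM analogue of a serial dispatch queue is a single-threaded executor. This is only an analogy to illustrate the idea, built from standard java.util.concurrent classes, not Apple's GCD API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerialQueueDemo {
    // A single-threaded executor behaves much like a serial dispatch queue:
    // submitters enqueue work without blocking on a lock, and all tasks run
    // strictly in submission order on one worker, so the shared list needs
    // no synchronization of its own.
    public static List<Integer> run() {
        ExecutorService queue = Executors.newSingleThreadExecutor();
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            final int task = i;
            queue.execute(() -> order.add(task)); // enqueued, not run inline
        }
        queue.shutdown();                          // no new work accepted
        try {
            queue.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;                              // tasks ran in order: 0..4
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The callers never hold a lock; serialization comes from the queue itself, which is the "lock-free dispatch" idea the comment describes.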
extide - Tuesday, September 1, 2015 - link
Don't forget, most of it is in Java, so it's probably one Java process with several threads, not multiple Java processes. The native apps could go either way.
Daniel Egger - Tuesday, September 1, 2015 - link
One interesting question here is: what does Google do? Chrome on a regular desktop OS uses one process per view to properly isolate views from one another; does anybody know whether Chrome on Android does the same? I couldn't figure it out from the available documentation...
praeses - Tuesday, September 1, 2015 - link
Next time can the colour legend below the graphs have their little squares enlarged to the height of the text? For those who are colour-challenged, it would make it a lot easier to match even when the image is blown up. There doesn't seem to be a reason to have them so small.
endrebjorsvik - Thursday, September 3, 2015 - link
I would rather make the colors more intuitive, for instance by using a colormap like the jet colormap from Octave/Matlab. Low clock frequencies should be mapped to cool colors (blue to green), while high clock frequencies should be mapped to warm colors (yellow to red). By doing that you have to look at the legend only once. After that, the colors speak for themselves. The plots are really hard to read now when you have green at both low and high frequencies (700 and 1400), and four shades of blue evenly distributed over the frequency range (500, 900, 1100, 1500). When I read such a plot, I don't care whether the frequency is 600 or 700, so these two colors don't have to be very different. But 500 and 1500 should be vastly different. The plots in this article are made the opposite way: all the small steps have big color differences in order to distinguish every small step from its neighbors, but at some point the map ran out of major colors and started repeating the spectrum with only slightly different colors.
qlum - Tuesday, September 1, 2015 - link
It would be interesting to see how desktop systems hold up in these tests, especially with AMD's 2-cores-per-module design.
name99 - Tuesday, September 1, 2015 - link
Andrei,
After so much work on your part it seems uncouth to complain! But this is the internet, so here goes...
If you ever have the energy to revise this topic, allow me to suggest two changes to substantially improve the value of the results:
With respect to how results are displayed:
- Might I suggest you change the stacking order of the Power State Distribution graphs so that we see Power Gated (ie the most power saving state) at the bottom, with Clock Gated (slightly less power saving) in the middle, and Active on top.
- The frequency distribution graphs make it really difficult to distinguish certain color pairs, and to see the big picture. Might I suggest that a plot using just grey scale (eg black at lowest frequency to white at highest frequency) would actually be easier to parse and to show the general structural pattern?
As a larger point, while this data is interesting in many ways, it doesn't (IMHO) answer the real question of interest. Knowing that there are frequently four runnable threads is NOT the same thing as knowing that four cores are useful, because it is quite possible that those threads are low priority, and that sliding them so as to run consecutively rather than simultaneously would have no effect on perceived user performance.
The only way, I think, that one can REALLY answer this particular question ("are four cores valuable, and if so how") is an elimination study. (Alternatives like trying to figure out the average run duration of short term threads is really tough, especially given the granularity at which data is reported).
So the question is: does Android provide facilities for knocking out certain cores so that the scheduler just ignores them? If so, I can suggest a few very interesting experiments one might run to see the effects of certain knockout patterns. In each case, ideally, one would want to learn
- "throughput" style performance (how fast the system scores on various benchmarks)
- effect on battery usage
- "snappiness" (which is difficult to measure objectively, but maybe is obvious enough for subjective results to be noticed).
So, for example, what if we knock out all the .LITTLE cores? How much faster does the system seem to run, with what effect on battery? Likewise if we knockout all the big cores? What if we have just two big cores (vaguely equivalent to an iPhone 6)? What if we have two big and two LITTLE cores?
I don't have any axe to grind here --- I've no idea what these experiments will show. But it would certainly be interesting to know, for example, if a system consisting of only 4 big cores feels noticeably snappier than a big.LITTLE system while battery life is 95% as long. That might be a tradeoff many people are willing to make. Or, maybe it goes the other way --- a system with only one big core and 2 little cores feels just as fast as an octa-core system, but the battery lasts 50% longer?
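For what it's worth, Linux (and therefore Android) does let root take cores offline via sysfs (`/sys/devices/system/cpu/cpuN/online`), and a rough userspace analogue of a knockout study is pinning the workload to a subset of cores with an affinity mask. A minimal sketch, assuming a Linux device where the core IDs below are purely illustrative (on a big.LITTLE SoC, 0-3 might be the LITTLE cluster and 4-7 the big one):

```python
import os

# Record the full set of cores this process may run on.
all_cores = os.sched_getaffinity(0)

# "Knock out" cores by pinning the process to a chosen subset;
# here we keep only core 0 (IDs are illustrative).
os.sched_setaffinity(0, {0})
print(os.sched_getaffinity(0))  # {0}

# Restore the original mask once the timed run is done.
os.sched_setaffinity(0, all_cores)
```

A knockout run would pin the benchmark process this way before measuring throughput, battery, and perceived snappiness.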
justinoes - Tuesday, September 1, 2015 - link
This was a seriously fascinating read. It points to a few things. First, either Android has some serious ability to take advantage of multiple cores, or ILP has improved dramatically. I remember when the Moto X (1st Gen) came out with a dual core CPU, engineers at Moto said that even opening many websites didn't use more than two cores on most phones. [http://www.cnet.com/news/top-motorola-engineer-def...] Does this mean that Android has stepped up its game dramatically or was that information not true to begin with?
Second, it seems like there are two related components to the question that I have about multi-core performance. First, do extra cores get used? (You show that they do. Question answered.) Second, do extra cores matter from a performance perspective (if clock speed is compromised or otherwise)? (This is probably harder to answer because cores and clock are confounded - better CPU -> more cores, faster clock - and complicated by the heterogeneous nature of these CPUs' core setups.)
I suppose the second question could be (mostly) answered by taking a homogeneous-core CPU, disabling cores sequentially, and looking at the changes in user-experienced performance and power consumption. I'm sure some people will buy something with the maximum number of cores, but I'm just curious about whether it'll make a difference in real-world situations.
V900 - Tuesday, September 1, 2015 - link
A question that DOESN'T get answered, however, is: does the fact that all cores get used contribute to a better/faster user experience? If there were only 2 or 4 cores present, would they complete the tasks just as fast?
In other words, is there a gain from all 8 cores being used, or do all 8 cores get used just because they are there? (By low priority threads which, on a quad/dual core CPU, would have been run sequentially in just as fast a time?)
Since Apple's dual core iPhones always outperform Android quad and octa core phones, I would think that the latter is closer to the truth.
Read up on what some of the other posters here have written about low priority threads, and Microsoft's research on the matter.
And ignore anyone who tries to over-interpret this article!
frenchy_2001 - Wednesday, September 2, 2015 - link
> Does the fact that all cores get used, contribute to a better/faster user experience?
It does not, as long as your CPU can process all the threads in a timely manner.
It contributes to lower power usage though: power grows with the square of voltage, and voltage grows with frequency, while parallelization grows only linearly.
Basically, if 2 A53 @ 800MHz can do the same amount of work as 1 A53 @ 1.6GHz, the 2 slower cores will do it for less power (refer to the perf/W curve on the conclusion page).
This was the goal of ARM when they designed big.LITTLE, and this article shows that the S6 uses it correctly (by using small cores predominantly and keeping frequencies low). It is one more trick to deliver strong immediate computation, good perf/W at moderate usage, and great idle power. I would not extrapolate beyond that as too many variables are in play (kernel/governor/HW/apps...)
name99 - Tuesday, September 1, 2015 - link
"When I started out this piece the goals I set out to reach was to either confirm or debunk on how useful homogeneous 8-core designs would be in the real world"
You mean heterogeneous above rather than homogeneous.
Andrei Frumusanu - Tuesday, September 1, 2015 - link
No, I meant specifically 8x A53 SoCs.
lilmoe - Tuesday, September 1, 2015 - link
I've been waiting for this piece since the GS6 came out. I can't even imagine the amount of time and work you've put into it. THANK YOU Andrei.
Now I hope we can put to rest the argument that Android would do better with only 2 high performance cores VS more core configurations. Google has been promising this for years and they're finally _starting_ to deliver. They're not there yet; lots of work needs to be done to exterminate all that ridiculous overhead (evident in the charts).
I'm also glad that it's finally evident that Chrome on Android VS SBrowser has significant impact on performance and battery life. It should only be fair to ask that Anandtech starts using the built-in browser for each respective device when benchmarking.
We're _just_ reaping the benefits of properly implemented big.LITTLE configurations, in both hardware and software, after 2 years of waiting. What's funny is that both Qualcomm and Samsung are moving away from these implementations back to Quad-core CPUs with Kryo and Mongoose respectively... I personally hope we get the best of both worlds in the form of Mediatek's 10 core big.LITTLE implementation, except the 2 high perf cores being either Kryo or Mongoose for their relatively insane single-threaded performance.
V900 - Tuesday, September 1, 2015 - link
You're coming up with conclusions that aren't supported by the article.
Can we put the 2 vs 8 core argument to rest? Nope.
This test only shows, that when there are 4 (or 8) cores available, Android occasionally uses them all.
It says NOTHING about whether an 8 core CPU would be faster than one with 2 wide cores. (Samsung and Qualcomm are moving towards Apple-like wide dual core designs. I doubt they'd do that if 8 cores were really always faster/more economical than 2.)
In fact, the article doesn't really tell us whether 8 small cores are faster/more economical than 2 or 4 small cores. Keep in mind what people have brought up about the priority of threads. Some of the threads you see occupying all 8 cores are low priority threads that could just as quickly be completed in sequence if there were only 2 or 4 low power cores available.
lilmoe - Tuesday, September 1, 2015 - link
"Can we put 2 vs 8 core argument to rest? Nope."
Are you sure we're on the same page here? We're talking efficiency, right?
"This test only shows, that when there are 4 (or 8) cores available, Android occasionally uses them all."
No it doesn't. Android is capable of utilizing all cores, yes, but it only allocates threads to the number of cores *needed*, which is much, MUCH more power efficient than elevating a smaller number of high performance cores to their max performance/freq states.
"It says NOTHING about whether an 8 core CPU would be faster than with 2 wide cores. (Samsung and Qualcomm are moving towards Apple-like wide dual core designs. I doubt they'd do that, if 8 cores were really always faster/economical than 2)".
True, it doesn't show direct comparisons with modern wide cores running Android, because there isn't any. But even taking MT overhead and core switching overhead into account, I believe it's safe to say that things should be comparable (since the small cluster is rarely saturated), except (again) much more efficient. And no, QC and Samsung aren't moving to any dual core configuration; they're both moving to Quad-core configuration (ie: the most optimal for Android), which further proves the argument that more cores running at a lower frequency (and lower power draw) is more efficient than having fewer cores running at their relative max for MOBILE DEVICES.
The problem isn't the premise, it's the means. ARM's reference core designs aren't optimal in comparison to custom designs, in either performance or power consumption. If Qualcomm or Samsung were to use little versions of their custom cores in 8-core configurations, or 4x4 big.LITTLE, we might theoretically see tremendous power savings in comparison. Again, this applies to Android based on this article.
"In fact, the article doesn't really tell us whether 8 small cores are faster/more economical than 2 or 4 small cores."
This article STRICTLY talks about the impact a 4x4 big.LITTLE configuration has on ANDROID if you want BOTH performance and maximum efficiency. It clearly displays how Android (and its apps) is capable of dividing the load into multiple threads; therefore having more cores has its benefits. Also, you can clearly see that there is noticeable overhead here and there, and throwing more cores at the problem, running at lower frequency, is a better brute force solution to, AGAIN, maximize efficiency while maintaining high performance WHEN NEEDED, which is usually in relatively short bursts. Android still has a ways to go with optimization, but its current incarnation proves that more cores are more efficient.
You are making the wrong comparisons here. What you should be asking for is comparisons between a quad-core A57 chip, VS an 8-core A53 chip, VS a 4x4 A57/A53 big.LITTLE chip. That, and only that, would be a valid apples-to-apples comparison, which in this case is only valid when tested with Android. Unfortunately, good luck finding these chips from the same manufacturer built on the same process...
lilmoe - Tuesday, September 1, 2015 - link
"they're both moving to Quad-core configuration (ie: the most optimal for Android)"
In regard to large/wide core non-big.LITTLE designs, that is.
lefty2 - Tuesday, September 1, 2015 - link
You are right. The article is deeply flawed. Nowhere is there any evidence of a 4 core device rendering a web page faster than a 2 core one.
lopri - Wednesday, September 2, 2015 - link
Qualcomm's next custom core is 2+2 but Samsung's is 4+4. But I agree with the gist of your argument. Different core counts, but they all aim at the same goal - performance and efficiency.
rstuart - Tuesday, September 1, 2015 - link
Wow, excellent article. Colour me impressed that the developers use 4 cores effectively more often than not. It was not what I was expecting. Nor did I realise how much of the video processing task was offloaded to the GPU. In fact it's so good I suspect there will be more than a few electrical engineers poring over this in order to understand how well their software brethren make use of the hardware they provide.
Filiprino - Tuesday, September 1, 2015 - link
Are you sure the Galaxy S6 employs the CFS scheduler? Shouldn't it be the GTS scheduler?
Andrei Frumusanu - Tuesday, September 1, 2015 - link
GTS is just an extension on top of CFS.
Filiprino - Wednesday, September 2, 2015 - link
Well, yes. But it's not the same saying CFS or GTS. I think it should be noted that the phone is using GTS, whose run queues work like in CFS.
Andrei Frumusanu - Saturday, September 5, 2015 - link
GTS doesn't touch runqueues. GTS's modifications to the CFS scheduler are relatively minor; it's still very much CFS at the core.
AySz88 - Tuesday, September 1, 2015 - link
A technical note regarding "...scaling up higher in frequency has a quadratically detrimental effect on power efficiency as we need higher operating voltages..." - note that power consumption *already* goes up quadratically with voltage, BEFORE including the frequency change (i.e. P = k*f*v*v). So if you're also scaling up voltage while increasing frequency, you get a horrific blowing-up-in-your-face CUBIC relationship between power and frequency.
ThreeDee912 - Tuesday, September 1, 2015 - link
Being in the Apple camp, I do know Apple also highly encourages developers to use multithreading as much as possible with their Grand Central Dispatch API, and has implemented things like App Nap and timer coalescing to help with the "race-to-idle" in OS X. I'm guessing Apple is likely taking this into account when designing their ARM CPUs as well. The thing is, unlike OS X, iOS and their A-series CPUs are mostly a black box of unknowns, other than whatever APIs they let developers use.
jjj - Wednesday, September 2, 2015 - link
For web browsing I do wish you would look at heavier sites (worst case scenarios, since that's when the device stumbles) and look at desktop versions.
Would be nice to have a total run-queue depth graph normalized for core perf and clocks (so, converted into total perf expressed in whatever unit you like) to see what total perf would be needed (and what mix of cores) with an ideal scheduler. Pretty hard to do in a reasonable way, but it would be an interesting metric. After all, the current total is a bit misleading by combining small and big; it shows facts, but people can draw the wrong conclusions, like 4 cores being enough or 8 not being enough. Many commenters seem to jump to certain conclusions because of it too.
Would be nice to see the tests for each cluster with the other cluster shut down, ideally with perf and power measured too. Would help address some objections in the comments.
In the Hangouts launch test conclusion you say that more than 4 cores wouldn't be needed, but that doesn't seem accurate outside the high end. If all the cores were small, and assuming the small cores have 2-3 times lower perf, then anything above a 1.5 run-queue depth on the big cores might require more than 4 small cores if we had no big ones. The same goes for some other tests.
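A back-of-the-envelope version of that conversion, under the comment's own assumed 2-3x big-to-small performance ratio (illustrative numbers, not measured):

```python
def small_core_equivalent(big_runqueue_depth, perf_ratio):
    """Convert big-core run-queue depth into small-core equivalents.

    perf_ratio is the assumed big/small per-core performance ratio;
    the 2-3x range is the comment's guess, not a measured value.
    """
    return big_runqueue_depth * perf_ratio

for ratio in (2.0, 2.5, 3.0):
    print(ratio, small_core_equivalent(1.5, ratio))
# At a 3x ratio, a big-core depth of 1.5 already maps to 4.5 small
# cores' worth of throughput, i.e. more than a 4-core small cluster.
```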
A SoC with 3 types of cores, 2 of them big (even bigger than A72) and a bunch of medium and small ones, does seem to make sense, with a proper scheduler and thermal management of course. For midrange, 2+4 should do, and it wouldn't increase the cost too much vs 8 small ones, depending a bit on cache size. Let's say on 16FF an A53 is below 0.5mm2, an A72 1.15mm2, and cache maybe close to 1.7mm2 per 1MB, so a very rough approximation would be a 2-3mm2 penalty depending on whether the dual A72 has 1 or 2MB of L2; a lot more if the dual A72 forces them to add a second memory channel, but even then it's worth the cost. 1-2$ more for the OEM would be worth it given the gain in single-threaded perf and the marketing benefits.
When looking at perf and battery in the real world, multitasking is always present in some way; in benchmarks, it never is. So why not try that too: something you encounter in daily usage. A couple of extra threads from other things should matter enough. Maybe on Samsung devices you could test in split-screen mode too, since it's something people do use and actually like.
For games it would be interesting to plot GPU clocks and power or temps, as well as maybe FPS. I was expecting games to use the small cores more to allow more TDP to go to the GPU, and the games you tested do seem to do just that. Maybe you could look at a bunch of games from that perspective. Then again, it would be nice if AT would just start testing real games instead of synthetic nonsense that has minimal value and relevance.
A look at image processing done on CPU+GPU would be interesting.
The way Android scales on many cores is encouraging for glasses, where every cubic mm matters and batteries have got to be tiny. I do hope the rumored Mercury core for wearables at 50-150mW is real and shows up soon.
Oh, and I do support looking at how AT's battery of benchmarks is behaving, but a better solution would be to transition away from synthetics; no idea why it takes so long in mobile when we had the PC precedent, and nobody needs an explanation as to why synthetic benchmarks are far from ideal.
Anyway, great to see some effort in actually understanding mobile, as opposed to the dubious synthetic benchmarks and empty assumptions that have been dominating this scene.
AT should hire a few more people to help you out and increase the frequency of such articles, since there are lots of things to explore and nobody is doing it.
tuxRoller - Wednesday, September 2, 2015 - link
Linux has largely been guided towards massively multiprocess workloads. If it didn't do this well, then it wouldn't do anything well.
The scheduler should be getting a lot better soon. It APPEARS that, after a long long long long time, things are moving forward on the combined scheduler (CFS), cpuidle, and cpufreq (DVFS) front. That's necessary for proper scheduling of tasks, especially across an aSMP SoC.
One thing to keep in mind is that these oems often carry out of tree patches that they believe help their hardware. Often these patches are of, ahem, suspect quality, and pretty much always perform their task with some noticeable drawbacks. The upstream solution is (almost?) always "better".
Iow, things should only get better.
toyotabedzrock - Wednesday, September 2, 2015 - link
Is Chrome rendering pages it expects you to visit on the little cores?
You should test with Chrome Dev as well.
toyotabedzrock - Wednesday, September 2, 2015 - link
This article's author is partly color blind; reading the Hangouts app launch section makes it obvious he can't see the color difference between 1800 and 2100.
LiverpoolFC5903 - Wednesday, September 2, 2015 - link
Great piece from AT as usual. Very interesting to say the least.
I remember reading another good piece about multiple core usage in Android on one of the Android-themed websites (AndroidCentral?). It was a much simpler analysis, and the premise was to debunk the myth that more cores are pointless at best and counterproductive at worst.
The conclusion from the tests was unequivocal. Android DOES make use of multiple cores, both via multi-threaded programs and discrete system tasks. So core count DOES matter, at least to an extent.
name99 - Wednesday, September 2, 2015 - link
No-one is denying that "Android can make use of multiple cores".
What they are denying (for this article and similar) is that the cores are
- a useful way to save power OR
- a way to make the phone feel/behave faster
This article, you will notice, does not answer either of those issues. It doesn't touch power, and there have been plenty of comments above (by myself and others) on why what is being shown here has no relevance to the issue of performance.
Do you really want to insist on claiming that articles prove what they manifestly do not, and insist on ignoring the (well-explained) concerns of engineering-minded people about the sorts of tests that ARE necessary to answer these questions? Wouldn't you rather be on the side of the engineers trying to understand these issues properly, than on the side of the fanboys, uninterested in the truth as long as you can shout your tribal slogans?
darkich - Friday, September 4, 2015 - link
You make no sense.
Ever heard of benchmarks?
If all cores are used (which this article proves as a fact), and if the benchmark shows the chip scoring higher than the chip with less cores - then yes, more cores means better performance.
It is a matter of that simple logic.
And the whole massive myth that this analysis dispelled was exactly the following: more cores are a useless gimmick BECAUSE ANDROID APPS CANNOT make use of them.
Hannibal80 - Wednesday, September 2, 2015 - link
Wonderful article
BillBear - Wednesday, September 2, 2015 - link
Absolutely fascinating stuff. I was seriously not expecting to see Mediatek's ultra high core count strategy vindicated by real world measurements. That's the great thing about taking measurements instead of just speculating.
As a follow up, it would be fascinating to see how selectively disabling different numbers of cores affects timed tests.
For instance, select an extremely CPU heavy web site like Forbes and see if allowing half as many cores makes rendering the home page take twice as long.
serendip - Wednesday, September 2, 2015 - link
Excellent article as usual by Andrei. As an owner of a phone with the MT6592 Mediatek 8-core A7 chip, I was also skeptical about the point of having so many small cores. I only got the phone because it was cheap :) I've seen all 8 cores spike to max frequency when loading complex web pages or playing games. For common tasks, only 2 or 4 cores are used. I've also found that downclocking it doesn't slow things down much and yields longer battery life; modifying the single core upfreq and additional core activation thresholds could be key to optimizing these chips to one's usual workload.
Notmyusualid - Friday, August 26, 2016 - link
Good comment - I've been pondering this all morning, hence why I'm back on this article. Looking at an A9 Pro right here, 4+4 configuration.
It seems that the low cores have a min freq of 400MHz, which I find interesting, as they seem to sit at 691MHz most of the time. Two of the big four sit in sleep, with the other two at 833MHz.
I wonder how adjustment to the larger cores may improve battery life.
krumme - Thursday, September 3, 2015 - link
Absolutely brilliant article that moved my understanding forward.
zodiacfml - Thursday, September 3, 2015 - link
Anandtech does it again. You are my entertainment and knowledge at the same time.
My thoughts: not quite surprising after seeing benchmarks of various SoCs over the last year or two. I think the question here is performance versus more cores. More but smaller cores are best for efficiency and probably better marketing. The only problem with these smaller cores is performance, which is why we often see them on cheaper devices that don't feel as fast. We still need more frequency for some big things, and I believe a fast dual core will answer that. So, I can't wait to see the X20.
Gigaplex - Thursday, September 3, 2015 - link
An interesting and thorough analysis, although I'm concerned at some of the assumptions made in some of the conclusions. Just because a queue of 4 threads makes all 8 big.LITTLE cores active doesn't mean that the architecture is effective. For all we know, the threads are thrashing back and forth, draining precious performance per watt.
darkich - Friday, September 4, 2015 - link
Andrei, your articles are in a league of their own. Thanks for the great work.
melgross - Thursday, September 10, 2015 - link
I'm still not convinced. The fact that it's doing what it does on these chips doesn't mean that their performance is as good as it could be, or that power efficiency is as good. We really need to see two to four core designs, with cores that are really more powerful, to make a proper comparison. We don't have that with the chips tested.
blackcrayon - Thursday, October 8, 2015 - link
Exactly. It should at least show a design with a small number of powerful cores. Obviously with Apple's A-series chips you have the issue of dealing with a different operating system underneath, but can't they use a Tegra K1 or something?
Hydrargyrum - Friday, September 25, 2015 - link
The stacked frequency distribution graphs would be a *lot* easier to read if you used a consistent range of saturations/intensities of a single colour (e.g. bright=fast to dark=slow), or a single pass from red to blue through the ROYGBIV colour spectrum (e.g. red=fast, blue=slow), to represent the range of frequencies.
By going around the colour wheel multiple times in the colour coding, it's *really* hard to tell whether a given area of the graph is high or low frequency. The differences in colour between 1400/800, 1296/700, and 1200/600 are very subtle, to say the least.
Ethos Evoss - Thursday, November 12, 2015 - link
Anandtech always uses weird, unpopular words on its own site, like ''heterogeneous'' - never heard it in my life, and even USA or UK people have to search the Cambridge/Oxford dictionary :DDD
Immediately you can say it is DEFO NOT a USA or UK website. They do not use such difficult words AT ALL :)
Ethos Evoss - Thursday, November 12, 2015 - link
And mainly they use them when it comes to China products, like Mediatek or Kirin or the big.LITTLE topic etc.
This site is DEVOURED, or we could say powered, by Apple Inc :)