1) I wasn't aware that Microsoft released DirectX 9.3. Perhaps you meant 9.0c or 9.1? 2) Why is nVidia still using a single LPDDR2 channel when everyone else has gone to dual channel memory?
I do look forward to seeing what the next generation of GPUs will provide. Seems like we've stayed in this console generation too long with cell phones having graphics nearly on par with their 200W cousins.
It's actually more complex than that. When it comes to programming for Direct3D11, there are a number of different GPU feature level targets. The idea is that developers will write their application in DX11, and then have customized render backends to target each feature level they want to hit.
As it stands there are 6 feature levels: 11, 10_1, 10, 9_3, 9_2, and 9_1. Unfortunately everyone has been lax in their naming standards; DirectX and Direct3D often get thrown around interchangeably, as do periods and underscores in the feature levels (since prior to D3D 11, we'd simply refer to the version of D3D). This is how you end up with DirectX 9.3 and all permutations thereof. The article has been corrected to be more technically accurate to clear this up.
In any case, 9_1 is effectively identical to Direct3D 9.0. 9_3 is somewhere between D3D 9.0b and 9.0c; it implements a bunch of extra features like multiple render targets, but the shader language is 2.x (Vertex Shader 2.0a, Pixel Shader 2.0b) rather than 3.0.
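To make the "one DX11 application, several render backends" idea concrete, here is a minimal sketch in Python rather than real Direct3D code. The `pick_backend` helper and the sets passed to it are hypothetical; the level names follow the `D3D_FEATURE_LEVEL_*` naming (what the thread calls 11 and 10 are written 11_0 and 10_0 here).

```python
# Feature levels discussed above, ordered from most to least capable.
FEATURE_LEVELS = ["11_0", "10_1", "10_0", "9_3", "9_2", "9_1"]


def pick_backend(gpu_supported, app_backends):
    """Return the highest feature level that the GPU supports and
    that the application has a render backend for, or None."""
    for level in FEATURE_LEVELS:
        if level in gpu_supported and level in app_backends:
            return level
    return None


# Hypothetical mobile GPU that tops out at 9_3, and an application
# that ships backends for 11_0, 9_3 and 9_1 only.
gpu = {"9_1", "9_2", "9_3"}
backends = {"11_0", "9_3", "9_1"}
print(pick_backend(gpu, backends))
```

The point of the ordering is that the application always lands on the richest backend the hardware can run, and falls back to nothing at all if there is no overlap.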
Microsoft made such a mess out of its DirectX nomenclature in the DX9 timeframe that the rest of the industry started to ignore it and invent their own. Hardly anybody even bothers to distinguish between Direct3D and DirectX anymore...they're used interchangeably, even though the former is a subset of the latter.
Windows 8 requires Shader Model 3.0 to be supported by the hardware. Whether you call that 10Level9_3 or 9_3, or DX9.3, or D3D9.3, who cares...from a graphics perspective, it is all just Shader Model 3.0 in the end, whatever you want to call it. All of the Windows 8 launch chipsets from nVidia, TI and Qualcomm, including this MSM8960 will all support Shader Model 3.0 as far as I can tell.
Feature level 9_3 isn't the same as Shader Model 3 support. The Qualcomm docs say DX9.3 though, which is quite confusing since it doesn't exist. That said, I agree with your assessment that it means Shader Model 3, and not feature level 9_3.
Although Scorpion featured a dual-channel LPDDR2 memory controller, in a PoP configuration only one channel was available to any stacked DRAM. In order to get access to both 32-bit memory channels the OEM had to implement a DRAM on-package as well as an external DRAM on the PCB. Memory requests could be interleaved between the two DRAM, however Qualcomm seemed to prefer load balancing between the two with CPU/GPU accesses being directed to the lower latency PoP DRAM. Very few OEMs seemed to populate both channels and thus Scorpion based designs were effectively single-channel offerings.
I can tell you with a modicum of confidence that this is true, at least partially.
Aren't you the same person who went on (ranting, obviously) about Krait using HKMG and hitting 2.5GHz next year, in another article?
I suggest you stop embarrassing yourself. metafor knows what he's talking about, and you clearly don't. I read that previous thread - he was pretty much spot on for everything, as you would expect. I honestly don't know why he even bothers here given the reception he's getting...
Anyway unlike what the article says, the MSM8x60 indeed only has single-channel 32-bit LPDDR2. However there's a twist: Qualcomm offers it in a PoP (Package-on-Package) configuration at up to 266MHz or an 'ISM' (i.e. SiP or System-in-Package) at up to 333MHz. I wouldn't be surprised if many OEMs used the PoP for cost reasons.
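For a rough sense of what the PoP vs ISM clock difference means, here is a small Python sketch of theoretical peak bandwidth. It assumes the quoted 266MHz and 333MHz figures are the memory clock and that LPDDR2 transfers data on both clock edges; the function name is made up for illustration, and real sustained bandwidth will be lower.

```python
def peak_bandwidth_gbs(bus_bits, clock_mhz, channels=1):
    """Theoretical peak bandwidth in GB/s for a double-data-rate bus.

    bus_bits  : width of one channel in bits (32 for the parts discussed)
    clock_mhz : memory clock in MHz (assumed, two transfers per clock)
    channels  : number of independent channels
    """
    bytes_per_transfer = bus_bits / 8
    transfers_per_second = clock_mhz * 1e6 * 2  # DDR: both clock edges
    return channels * bytes_per_transfer * transfers_per_second / 1e9


pop = peak_bandwidth_gbs(32, 266)  # PoP configuration
ism = peak_bandwidth_gbs(32, 333)  # ISM (SiP) configuration
print(f"PoP: {pop:.2f} GB/s, ISM: {ism:.2f} GB/s")
```

Under those assumptions the ISM option buys roughly 25% more peak bandwidth over PoP on the same single 32-bit channel, which is the trade-off an OEM would weigh against cost.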
I think the confusion might come from another (older) Qualcomm SoC working like the article described iirc, but this does not apply to the MSM8x60 AFAIK.
This information does come from Qualcomm, although the odd PoP + external DRAM configuration (that no one seems to use) basically means that MSM8x60 is a single-channel architecture (which is why I starred it in the table above). I will ask Qualcomm once more for confirmation that this applies to MSM8x60 as well as the older single core variants.
Scorpion does support dual-channel; however, the 8x60 series does not have two controllers. The 8x55/7x30 does, however, and in most cases it is used in the configuration you described in the article.
I knew MSM7x30/8x55 was dual-channel but I thought it was also available as a 64-bit LPDDR2 PoP solution? While it makes sense for most people to use it as single-channel LPDDR2 as opposed to dual-channel LPDDR1 these days, why would anyone ever have used both PoP and non-PoP DRAM at the same time? Maybe that old leaked presentation on Baidu listing all the MSM7x30/8x55 packages is wrong though.
Hmm, that would certainly be news to me, it's possible but you'd still need a second memory controller and PHY so it makes very little sense. I can see a few possibilities: - The LPDDR2 and DDR2 subsystems aren't shared so in theory for tablets you could do 32-bit SiP LPDDR2+32-bit off-chip DDR2. Seems weird but not impossible. - You can do 32-bit ISM+32-bit PoP. Once again, why do this? Were they limited by package pins with a 0.4mm pitch? Seems unlikely with a 14x14 package but who knows. - You can genuinely do 32-bit PoP+32-bit on the PCB. Still seems really weird to me.
The MSM7200(A) had a separate small LPDDR1 chip (16-bit bus with SiP) reserved mostly for the baseband while the primary OS-accessible DRAM was off-chip. This was obviously rather expensive (fwiw Qualcomm only 'won' that generation on software and weak competition IMO), so they removed it on the MSM7227 to reduce cost (making the chip's memory arbitration more complex). I'm not sure about the QSD8650, maybe it still optionally had that extra memory bus (SiP-only) but it was more flexible and never used, it's hard to find that kind of info.
I suggest you look into the facts before passing such statements.
I don't know where you or the OP are getting your information from (3GHz A15s, quad 2.5GHz Kraits hitting next year, Kraits using HKMG etc.), but that's been pretty inaccurate. All you're doing is speculating based on bits and pieces floating around in PDFs and slides. I still remember one of his claims from the previous thread '2x A15's > 4xA9's'. While no one in their right mind would dispute that the wider, deeper A15 is better than a single A9, to make such an uninformed, blanket statement (and to back it up with useless DMIPS numbers!) just doesn't bode very well.
ST-Ericsson has publicly indicated the A9600's A15s can run at up to 2.5GHz, and GlobalFoundries has publicly said that the A9600 uses their 28nm SLP process which uses High-K but not SiGe strain. Is it really hard to believe a 28HPM or 28HP A15 could easily reach 3GHz? I'm not sure anyone will do that in the phone/tablet market, but remember ARM also wants A15 to target slightly larger Windows 8 notebooks and (I'm not as optimistic about this) servers.
As for Krait, Qualcomm's initial PR mentioned 2.5GHz (not just random slides) and APQ8064 is on TSMC 28HPM which uses High-K. If you don't trust either me or metafor on that, Qualcomm has also publicly stated that most of their chips will run on SiON but that they were considering High-K for chips running at 2GHz or above: http://semimd.com/blog/2011/02/07/qualcomm-shies-a...
As for 2xA15 vs 4xA9, metafor's point is that most applications are still not sufficiently multithreaded. It has very little to do with DMIPS which is a worthless outdated benchmark (not that Coremark is perfect mind you - where oh where is my SPECInt for handhelds? Development platforms could support enough RAM to run it by now). Unlike him I think 4xA9 should be relatively competitive even if clearly inferior in some important cases, and as you imply it's a difficult and even fairly subjective topic, but I don't think metafor's opinion is unreasonable.
That is the point I'm trying to make! Semiconductor companies, by virtue of the fact that they have to sign OEM/ODM deals before they really even have working products almost always posture about how much their designs can go 'up to' or 'indicate' ratings and numbers. My beef with the earlier thread was that statements were being passed on as facts based purely on stuff posted in press releases. I can tell you, for a fact, that no 2.5GHz Krait (dual or quad) based product will be shipping in '12. I can also tell you for a fact that you will not see anything more than 1.8-2.2GHz (optimistic) in shipping A15's for mobile devices. I understand the A15 architecture is capable of much more, but to try and draw comparisons between a near-shipping mobile-spec quad-core A9 and an on-paper 3GHz A15 powering servers is not correct!
If you did follow the previous thread closely, you will see that this was the only point I was trying to get across, in vain. No matter how you slice and dice it, the 2xA15 > 4xA9 argument is wrong. This is very similar to what we're seeing in the x86 market with Intel and AMD where the older, tri and quad core AMD's are still able to keep-up with or beat dual-core Intel's in threaded situations. Now it is an entirely different argument as to whether or not Google/MS/whoever else makes effective use of multi-core CPU's in their current mobile platforms and their relatively crude/simple kernels (as compared to desktop operating systems), but come Windows 8, I am willing to bet that quad core (or multi-core in general) SoC's will prove their worth.
ST-E could underdeliver on the A9600, sure, but they've got a better process than OMAP5 and enough clever power saving tricks up their sleeve (some of which still aren't public) that I feel it's quite likely they won't. Remember 2.5GHz is only their peak frequency when a single core is on - they have not disclosed their throttling algorithms (which will certainly be more aggressive for everyone in the 28nm generation, especially on smartphone SKUs as opposed to tablets where higher TDPs are acceptable).
Also multiple companies will be making A15s on 28HPM eventually. TSMC has indicated they have a lot of interest in HPM, and that should certainly clock at least 25% higher than GF's Gate-First Non-SiGe 28SLP. However the problem is that the A15 is quite power hungry, so I expect people will use that frequency headroom to undervolt and reduce power although a few might expose it with a TurboBoost-like mechanism. On the other hand, exposing the full 3GHz for Windows 8 on ARM mini-notebooks should be a breeze, and I don't see why you'd expect that to be a problem.
As for 2.5GHz Quad-Core Krait in 2012 - I think they're still on schedule for tablets in late 2012, but then again NVIDIA was still on schedule for tablets in August 2011 back in February, so it's impossible to predict these things. Delays happen, and it'd be foolish not to take metafor seriously simply because he is unable to predict the unpredictable.
Finally, 2xA15 vs 4xA9... metafor's point is that given the lower maturity of multithreading on handheld devices, it's more like high-end quad-core Intel CPUs beating eight-core AMD CPUs in the real world. As I said I'm not sure I agree, but it's fairly reasonable.
I believe the comparison was simple: dual-Krait compared to 4xA9. I claimed Krait would be much closer to A15 level than A9 -- I was right.
I claimed that 2xA15 (and 2xKrait) will be far better than 4xA9. I hold to that but some may disagree. I can understand that point.
I claimed that both Krait and A15 were set to target similar frequencies (~2.5GHz) according to release -- I was right.
I claimed that Krait will initially be ~1.4-1.7GHz on 28LP and is planned to reach 2.5GHz on HKM -- I was right.
On every point, you disagreed with me -- and stated "I know for a fact that such and such". Did Krait turn out to be "a modified A9" as you claimed? No.
Is its projected performance and clockspeeds far closer to A15-class than A9? Yes.
Also, how often do you think that quad-core on your desktop actually gets utilized? Are you under the impression that multithreading is some kind of magical pixie dust that you sprinkle on to an OS kernel and all of a sudden, your applications will run faster?
Hint: Android is fully multithread capable -- 3.0 even includes a great pthread library implementation. That doesn't mean individual applications are actually threaded, or that they even can be. This should be common knowledge by now: only certain workloads are highly parallelizable.
On top of that -- as we've discussed previously -- there is a very small subset of computationally intensive, highly thread-scalable applications out there. Specifically: compression, video transcoding and image processing (which will likely be the biggest performance-demanding app for the CPU on tablets what with the Photoshop Touch series).
So yes, on 4xA9, that could potentially scale to all 4 cores. But here's the thing: those are all very NEON/FPU intensive applications.
And guess what subsystem was substantially improved in A15 compared to A9?
Double the data path width, unified load-store, fully out-of-order VFP + NEON and lower integer execution latency on top of that (which, IIRC, is what most image processing algorithms use).
Even assuming A15 runs at the same clockspeed as an A9, it would still be 2-3x faster in typical arithmetic-intensive workloads.
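The 2xA15 vs 4xA9 argument can be put into numbers with an Amdahl's-law style sketch. Everything below is illustrative: the per-core speed of 2.0 is simply the low end of the "2-3x" claim above, the parallel fractions are hypothetical, and `relative_throughput` is a made-up helper, not a benchmark.

```python
def relative_throughput(cores, per_core_speed, parallel_fraction):
    """Amdahl-style throughput relative to one core of speed 1.0.

    The serial part runs on a single core; the parallel part is
    split evenly across all cores.
    """
    serial = 1.0 - parallel_fraction
    time = (serial / per_core_speed
            + parallel_fraction / (per_core_speed * cores))
    return 1.0 / time


# Assumed values: an A9 core is the 1.0 baseline, an A15 core is 2.0.
for p in (0.25, 0.5, 0.9, 1.0):
    quad_a9 = relative_throughput(4, 1.0, p)
    dual_a15 = relative_throughput(2, 2.0, p)
    print(f"parallel={p:.2f}  4xA9={quad_a9:.2f}  2xA15={dual_a15:.2f}")
```

With these assumptions the quad-core only draws level when the workload is perfectly parallel; anything with a serial portion favours the two faster cores. If the real per-core advantage is smaller than 2x, the crossover moves, which is exactly what the two sides of this thread disagree about.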
Anybody who thinks that application performance can be predicted simply by CPU clock speed alone is a fool who has no business posting on sites like this. Let it go.
In the Power vs. Temperature plot on page two, have the axis labels been reversed accidentally?
The way I read the graph as it is, 40nm transistors can handle more power without getting hot, while 28nm transistors get hot very quickly with only a small increase in power.
It seems pretty clear. As temperature increases (right on the X axis), 40G transistors consume more power (up in the Y axis). The power increase vs temperature increase curve of 28LP doesn't grow as fast.
This, of course, has more to do with it being an LP process. 40LP transistors would have a similar curve.
Metafor is right about the curve having to do with the process. His explanation kinda makes it seem like a temp increase causes the power increase though. It's the power increase that causes the temp increase, and "G" transistors are designed to handle more power without wasted heat (temperature increase) compared to "LP" transistors. There's also a second reason why 28nm is hotter than 40nm.
If you have a certain amount of heat energy being produced at a certain power level, the 40nm transistors will be a certain temperature.
Now take that same amount of heat energy being produced, and shrink the transistors to half their size. This increases their temperature within the same power envelope.
Of course they labeled a thermal limit on the power side, because the holder of whatever phone this chip goes into is going to feel the heat coming from the chip due to how much power it's using (how much heat energy is put out), not just due to the temperature of the transistors.
This is a problem in a lot of circuit design. Power dissipation (both due to scattering and increase in resistance of the charge channel) increases with temperature. But temperature also increases as more power is dissipated. It's a positive feedback loop that just gets hotter and hotter.
When simulating a circuit, this problem has to be taken into account but simulating the heat dissipation is difficult so one can never be sure that a circuit wouldn't overheat under its own operation.
It's an on-going research area in academics of how to simulate such a situation beforehand and avoid it.
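The feedback loop can be illustrated with a toy fixed-point iteration in Python. This is only a sketch under loud assumptions: the exponential temperature dependence of the leakage term, the single thermal resistance, and every constant below are made up for illustration and do not describe any real process or chip.

```python
import math


def settle_temperature(p_dynamic_w, leak_ref_w, k_per_c, r_th_c_per_w,
                       t_ambient_c=25.0, t_limit_c=150.0, max_iter=1000):
    """Iterate power -> temperature -> power until it settles.

    Returns the steady-state temperature in degrees C, or None if the
    loop runs away past t_limit_c (or fails to converge).
    """
    temp = t_ambient_c
    for _ in range(max_iter):
        # Assumed model: leakage grows exponentially with temperature.
        leak = leak_ref_w * math.exp(k_per_c * (temp - t_ambient_c))
        # Assumed model: temperature rise is proportional to total power.
        new_temp = t_ambient_c + r_th_c_per_w * (p_dynamic_w + leak)
        if new_temp > t_limit_c:
            return None  # thermal runaway
        if abs(new_temp - temp) < 1e-6:
            return new_temp
        temp = new_temp
    return None


print(settle_temperature(1.0, 0.1, 0.02, 20.0))  # settles
print(settle_temperature(1.0, 0.5, 0.05, 40.0))  # runs away -> None
```

With a small leakage term and good heat removal the loop settles a little above the temperature that dynamic power alone would give; with more leakage and a worse thermal path the same loop never settles, which is the runaway case described above.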
Basically, it's increasing the power of the chip, which increases heat energy output, that increases the temperature. And with that increase in temperature, comes an increase in power.
Heat dissipation is the only way for the chip to keep itself from burning up. It's just impossible to really tell how much can be dissipated under even certain conditions due to heat exchange kinetically between atoms, and most likely the radiation amount differs between atoms.
It's basically impossible to simulate an exact scenario for this exchange.
The minute a company gives you a bit of attention, you forget about objectivity.
"The key is this: other than TI's OMAP 5 in the second half of 2012 and Qualcomm's Krait, no one else has announced plans to release a new microarchitecture in the near term" "Qualcomm remains the only active player in the smartphone/tablet space that uses its architecture license to put out custom designs."
Both statements are false, and you know that very well.
Yes, I have heard of Marvell and Armada, isn't that what's left of XScale? Honestly I thought they had given up on what was XScale and licensed the RTL like everyone else instead, but it looks like I was wrong.
Which is probably why Anand specified tablet/smartphones. Marvell is, for all practical purposes, not a major or even relevant player in tablet/smartphones.
It is worthy to note that both nVidia and (thus believed) Apple are utilizing their architectural licenses and are cooking up their own cores currently. But none will likely launch in 2012.
The qualification there was "in the smartphone/tablet space". Marvell hasn't had any significant design wins in the high end Android, iOS, Windows Phone or QNX OS space that we cover.
Neither of those options are custom designs using the ARM ISA, they are full IP licenses.
You are correct on ST-E's announcement though, I simply haven't been factoring them into discussions lately as they have been pretty much not present in the high-end smartphone space as of late.
I'm hearing that this one has slipped, and now the ST-Ericsson chipset with Rogue won't sample to OEMs until the first half of next year and therefore won't be in commercial products until the very end of 2012 if at all, whereas the MSM8960 is already sampling to OEMs according to Qualcomm. In other words, schedule-wise, you're probably comparing apples to oranges. I do agree with you though that we'll likely see A6 from Apple, in some form, by this time next year, but I think it'll be higher spec'd and will blow the doors off anything from STE. The more interesting question is whether the quad core 8064 that Qualcomm has mentioned for next year, can keep up with A6, both from a CPU and a GPU standpoint.
Marvell indeed hasn't had much luck in the high-end so far, but the same latest PJ4 core has been fairly successful both as the HSPA Pantheon 920 at BlackBerry (including some new BlackBerry 7 devices) and the TD-SCDMA Pantheon 910 for various China Mobile-specific phones. So I'm not sure it's a good idea to exclude them completely although they're certainly not in the same league as Qualcomm so I really don't blame you.
Are you really sure that the upcoming Nexus Prime will use OMAP 4? Seems unlikely to me... An SGX 540 to power a 720p display when Samsung has their own better SoC with Mali 400?? And the rival iPhone 4S uses a GPU that is 7/8 times faster than the SGX540. Sounds really stupid... I can't believe that OMAP 4 is the reference SoC for Android ICS.
The SGX540 in the OMAP4460 is clocked significantly higher than (something TI is great at). So, while it isn't a powerhouse compared to Exynos or A5, it's more than sufficient. Google has never traditionally been top-of-the-line in terms of processors with their Nexus series. So it isn't out of the question they wouldn't be this time around either.
Technically, the Hummingbird SoC used in the Nexus S was top of the line at the time it was released (although it was eclipsed by dual-core phones after 1-2 months). Also the Nexus One was the first major phone I can remember that had 512MB of RAM.
However, it should be said that Google doesn't think about getting the best SoCs available for their product, but instead seeks to get the best deal on its components from bidding against vendors. It's very possible that Samsung knows they had the most powerful SoC outside of the A5 and wanted Google to pay it for the value it would provide. Google instead apparently went with TI, likely because TI is selling its chips cheaper in order to be the reference platform for Ice Cream Sandwich.
Looking at the chart on "The Adreno 225 GPU" and comparing to the frame rates in the iPad 2 review (http://www.anandtech.com/show/4216/apple-ipad-2-gp...), it looks like the PowerVR SGX543MP2 in the iPhone 4S will be about 33% faster. This is a very approximate estimate.
"Qualcomm claims that MSM8960 will be able to outperform Apple's A5 in GLBenchmark 2.x at qHD resolutions. We'll have to wait until we have shipping devices in hand to really put that claim to the test, but if true it's good news for Krait as the A5 continues to be the high end benchmark for mobile GPU performance."
Please correct your CPU features comparison table (and in the previous articles too). ARM11 has a *pipelined* VFP, which actually makes it a lot faster than Cortex-A8 for double precision floating point workloads. You can have a look at the instruction cycle timings to get a better idea: ARM11 VFP - http://infocenter.arm.com/help/topic/com.arm.doc.d... Cortex-A8 VFP - http://infocenter.arm.com/help/topic/com.arm.doc.d...
Static ram (the kind used in CPU caches) has always been much faster than the dynamic ram used for main system memory.
SRAM uses a block of a half dozen transistors to store a bit as a stable logic state; as a result it can operate as fast as any other transistor based device in an IC. The number of clock cycles a cache bank needs to complete an access operation is primarily a factor of its size, both because it takes more work to select a specific part and because signalling delays due to the speed of electric signals through the chip become significant at gigahertz speeds. Size isn't the only factor in how fast a CPU cache operates; higher associativity levels can improve worst case behavior (by reducing misses due to pathological memory access patterns) significantly at the cost of slowing all operations slightly.
DRAM has a significantly different design: it only uses a single transistor per bit and stores the data in a paired capacitor. This allows for much higher memory capacities in a single chip and much lower costs/GB as a result. The catch is that reading the capacitor's charge level and then recharging it after the check takes significantly longer. The actual memory cells in a DDR3-1600 chip are only operating at 200MHz (up from 100-133MHz a decade ago); other parts of the chips operate much faster as they access large numbers of memory cells in parallel to keep the much faster memory bus fed.
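The 200MHz figure falls straight out of the prefetch arithmetic. A quick Python check, using the standard prefetch depths (8n for DDR3, 4n for DDR2, 2n for DDR); the helper name is just for illustration:

```python
def cell_array_clock_mhz(data_rate_mts, prefetch_n):
    """Clock of the DRAM cell array in MHz.

    Each array access fetches prefetch_n bits per data pin, so the
    array only has to cycle at data_rate / prefetch_n.
    """
    return data_rate_mts / prefetch_n


print(cell_array_clock_mhz(1600, 8))  # DDR3-1600, 8n prefetch
print(cell_array_clock_mhz(800, 4))   # DDR2-800, 4n prefetch
print(cell_array_clock_mhz(400, 2))   # DDR-400, 2n prefetch
```

All three come out at 200MHz: successive DDR generations raised the bus rate mostly by fetching more bits per array access, not by making the cells themselves much faster.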
Isn't it amazing how these low-power architectures are surpassing Atom in both power and performance? Atom isn't even an OoO architecture. Windows 8 and OS X Lion will allow these architectures in netbooks and ultrabooks before we know it, and Intel's value-stripping at the low-end will finally die a terrible death.
Intel will be in very big trouble unless FinFet can get Atom's power down in the same sub 4W range as this next round of quad core chipsets from nVidia, Qualcomm and TI.
Even with FinFet it's impossible Atom will run at 4GHz which it needs to get comparable performance as an A15 or Krait at 2.5GHz. And in less than 2W. Atom has been dead in the water for a while now - it cannot keep up with ARM out-of-order cores on performance, power consumption or integration despite Intel's process advantage.
And in that case you don't really need a tablet. Tablets would be very useful if they could deliver some kind of alternative user input and display. For instance: holographic displays (like in sci-fi movies) and voice control and input. That would actually make these mobile devices truly useful. The current generation and the coming generations would be nothing more than mobile browser and mp3/mp4 player.
And, if it could work at all, how well would voice control work outside of an anechoic chamber (with a lone user talking to it)? But, I agree, tablets (used for decades in warehouses, by the way) with soft keyboards and the like are just a cheap and dirty way around the I/O issue.
What I mean is we'll get a quad core phone with 4GB of RAM and all we'll need is a tablet to dock it to. The tablet will have the other connectivity options-- HDMI, USB, etc which will let the non-gamers replace their desktops and the people not needing Windows for business, replace their laptops.
The dedicated gaming desktop will largely be obsolete in a few years. Games are much more GPU dependent than CPU dependent. What we will see is the era of external graphics cards which you can hook up to your laptop to play on your monitor @ 1080p with all the special effects turned up. Then you "undock" and rely on the integrated GPU to get longer battery life.
Thanks for the write up, pretty interesting stuff to come next year. Hopefully developers will embrace the new power, so that I can make fuller use of my Galaxy S2. I don't think I'll upgrade that soon though, pretty content with Galaxy S2 and it was expensive enough for now. I'll see what Christmas 2012 brings. :D
Does it mean all Krait cores will be manufactured in LP, like the Companion CPU of Kal-El, for energy efficiency, and that's why it won't go beyond 2GHz even at 28nm?
I know this article's on the Krait architecture, but in your comparison of the 2011/2012 SoC roadmap, I think you left out one very promising competitor: the ST Ericsson Nova A9600.
Could this be included? And will there be any updates on this upcoming SoC?
On the whole, the 2012 roadmap looks really good. But I personally am looking forward to PowerVR's Series 6, the Cortex A15 MP and NVIDIA's Wayne the most.
I'm pretty sure they said it would sample by the end of this year and ship late 2012. If they were to delay it any more, they would be in serious trouble. ST Ericsson has quite a lot against them recently, and if they can't keep to their promises, TI is going to beat them quite badly. I'd estimate the Rogue to show up in an OMAP 5 in H2 2012 or H1 2013.
All in all I'm just really excited by the PowerVR Rogue. Seeing the specifications of the Nova A9600 and what the Rogue can do is quite amazing. It's almost on par with the PS3.
Could an article on that be done once information is available?
The graph is conceptually correct. While it's true that consuming more power produces more heat, the inverse is also true. The temperature of a transistor affects its leakage characteristics because resistance increases with heat. So at higher temperatures a CPU is going to consume more power to maintain its performance, compared to the same CPU at a lower temperature.
You're basically looking at the principles of a superconductor applied in reverse.
The number of MADs per 4 way SIMD is 4 not 8 as stated (plus 1 for scalar channel), so total flops per clock is (4+1) * 2 * 8 = 80 flops/clock or 16GFLOPs/s @ 200MHz and 24GFlops/s @ 300MHz.
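The corrected arithmetic is easy to check in Python. The function names are illustrative; the figures (4-wide SIMD plus one scalar channel, 2 flops per MAD, 8 units) are the ones given in the comment above.

```python
def flops_per_clock(simd_width, scalar_channels, units, flops_per_mad=2):
    """Flops per clock: each lane does one MAD (multiply + add) per clock."""
    return (simd_width + scalar_channels) * flops_per_mad * units


def gflops(per_clock, clock_mhz):
    """Convert flops/clock at a given clock in MHz to GFLOPS."""
    return per_clock * clock_mhz / 1000


per_clock = flops_per_clock(simd_width=4, scalar_channels=1, units=8)
print(per_clock)               # 80
print(gflops(per_clock, 200))  # 16.0
print(gflops(per_clock, 300))  # 24.0
```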
According to this article we'll have to wait for at least another 3 years or maybe more until we get tablets with enough power and good battery life that would be actually useful. Yeah, maybe at 14nm and with tri-gate transistors sometime in 2016 we'll be able to enjoy true mobile computing all day long (at least 16 hours without a recharge).
Yeah, progress is good but way too slow sometimes. Too bad, I was hoping for an ultracool and powerful e-book reader that delivers a more tablet-like experience than what is currently available.
Define "useful". I'd argue that a lot of CPU cycles are wasted doing meaningless background tasks in apps that you can't see when it would be better to just pause and resume them later when the user brings them back into focus (aka Windows 8).
While single instruction, multiple data (SIMD), whether short or long vector, is a great idea, it is sadly underutilized except in graphics processing, compressible signals and cryptography. Is the NEON technology just an additional graphics/compression engine? Does it require special NEON programming/compiling or does it enhance normal MIMD programming?
Excellent analysis. I guess x86 will have an uphill battle against ARM for years in bridging the gap in low power.
Just wondering what are your thoughts on the integrated WiFi/GPS/BT/FM in S4? Does it have potential to integrate away the currently separate combo-chips?
I'm a fan of efficient coding and design and hope that Qualcomm is following that path. I think there's been too much "just get it done no matter what" in the programming business for far too long. That's not to say there aren't very good free open source apps out there by astounding programmers but the mainstream seems to have forgotten or doesn't care.
This is good news because mobile is still in the early phases and if efficiency is priority things can only get better if not easier to debug, code, change, etc.
The bad is of course increasing speeds, any speeds, like in the PC industry of yesteryear. Sure we want more powerful hardware, but let's not make it because of shoddy code and design architectures. Again, in the mobile industry I believe these two points should be highly considered by anyone.
Nothing about Apple A6? (I guess reliable info is hard to come by, but surely it would be out by December 2010 and should be included in the comparison table?)
You could add the ZiiLabs ZMS-20 (and quad core ZMS-40) to the table. That'll probably see action in a Creative Android tablet/PMP and other OEMs might pick up the Jaguar reference tablet. (Could make a good review piece?)
Or how about Marvell's 628 ARMADA tri-core SoC? Marvell are getting pushed out by Qualcomm but it could get some design wins next year (It has two 1.5GHz cores and a low power 624 MHz core, similar to Tegra 3)
Wouldn't the use of L0 impact performance? If L1 is shut down, there will be a penalty on an L0 miss. Powering up L1 on an L0 miss would cost thousands of cycles at 1.5GHz. If L1 is on, then what is the point?
Hi, thanks for your great article. You may want to add to the list the Samsung Exynos 5250, that should be the A15 version of the Exynos SOC, running ARM Mali-T604 GPU, don't know if single or multicore. Let's hope it will be used in Samsung Galaxy S3 :)
Any thoughts on when we'll see Krait/Adreno3xx SoCs in the hands of consumers? Holiday 2012 or will it be later than that? Holiday 2012 would make it ideal for the next Nexus SoC.
Hey guys, I'm a dev from the xda forum. New Adreno drivers have recently been tested and improve 3D performance by 50%, so things are looking very good for the new GPU; it'll perform much better than current devices.
re shaky153: ''Hey guys, I'm a dev from the xda forum. New Adreno drivers have recently been tested and improve 3D performance by 50%, so things are looking very good for the new GPU; it'll perform much better than current devices.''
Brilliant, so does that mean that the Adreno GPUs do have a lot of underutilised power, as Anand attested in this article? If so, will the Adreno 220 get much closer to the A5 in benchmarks? (Taking into consideration that we don't know the A5's GPU clocks and its 2x TMUs.)
108 Comments
dagamer34 - Friday, October 7, 2011 - link
Great stuff to look forward to. Some comments:
1) I wasn't aware that Microsoft released DirectX 9.3. Perhaps you meant 9.0c or 9.1?
2) Why is nVidia still using a single LPDDR2 channel when everyone else has gone to dual channel memory?
I do look forward to seeing what the next generation of GPUs will provide. Seems like we've stayed in this console generation too long with cell phones having graphics nearly on par with their 200W cousins.
A5 - Friday, October 7, 2011 - link
Re: DX 9.3, you beat me to it.
Ilomilo is pretty, but it's not exactly Gears or Battlefield, you know?
Ryan Smith - Friday, October 7, 2011 - link
It's actually more complex than that. When it comes to programming for Direct3D11, there are a number of different GPU feature level targets. The idea is that developers will write their application in DX11, and then have customized render backends to target each feature level they want to hit.
As it stands there are 6 feature levels: 11, 10_1, 10, 9_3, 9_2, and 9_1. Unfortunately everyone has been lax in their naming standards; DirectX and Direct3D often get thrown around interchangeably, as do periods and underscores in the feature levels (since prior to D3D 11, we'd simply refer to the version of D3D). This is how you end up with DirectX 9.3 and all permutations thereof. The article has been corrected to be more technically accurate to clear this up.
In any case, 9_1 is effectively identical to Direct3D 9.0. 9_3 is somewhere between D3D 9.0b and 9.0c; it implements a bunch of extra features like multiple render targets, but the shader language is 2.x (Vertex Shader 2.0a, Pixel Shader 2.0b) rather than 3.0
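To make the feature-level idea concrete, here is a minimal Python sketch of how an application might choose a render backend from the highest feature level the hardware reports. The level names come from the comment above; the backend names and the `pick_backend` helper are hypothetical, and a real application would make this decision through the Direct3D 11 device-creation API rather than in Python.

```python
# Feature levels as listed above, ordered from most to least capable.
FEATURE_LEVELS = ["11", "10_1", "10", "9_3", "9_2", "9_1"]

# Hypothetical mapping from feature level to a render backend the
# application ships. A level without its own backend falls back to
# the next lower level that has one.
BACKENDS = {
    "11": "tessellation_renderer",
    "10": "sm4_renderer",
    "9_3": "sm2x_mrt_renderer",
    "9_1": "baseline_renderer",
}


def pick_backend(hardware_level: str) -> str:
    """Return the best shipped backend for the reported feature level."""
    if hardware_level not in FEATURE_LEVELS:
        raise ValueError(f"unknown feature level: {hardware_level}")
    start = FEATURE_LEVELS.index(hardware_level)
    # Walk downwards from the hardware's level until a backend exists.
    for level in FEATURE_LEVELS[start:]:
        if level in BACKENDS:
            return BACKENDS[level]
    raise RuntimeError("no backend available")


if __name__ == "__main__":
    for level in FEATURE_LEVELS:
        print(level, "->", pick_backend(level))
```

In this sketch a 10_1 part runs the backend written for 10, and a 9_2 part drops to the 9_1 baseline, which is the kind of per-level targeting the comment describes.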
partylikeits1999 - Saturday, October 8, 2011 - link
Microsoft made such a mess out of its DirectX nomenclature in the DX9 timeframe that the rest of the industry started to ignore it and invent their own. Hardly anybody even bothers to distinguish between Direct3D and DirectX anymore...they're used interchangeably, even though the former is a subset of the latter.
Windows 8 requires Shader Model 3.0 to be supported by the hardware. Whether you call that 10Level9_3 or 9_3, or DX9.3, or D3D9.3, who cares...from a graphics perspective, it is all just Shader Model 3.0 in the end, whatever you want to call it. All of the Windows 8 launch chipsets from nVidia, TI and Qualcomm, including this MSM8960 will all support Shader Model 3.0 as far as I can tell.
ET - Sunday, October 9, 2011 - link
Feature level 9_3 isn't the same as Shader Model 3 support. The Qualcomm docs say DX9.3 though, which is quite confusing since it doesn't exist. That said, I agree with your assessment that it means Shader Model 3, and not feature level 9_3.
felixyang - Saturday, October 8, 2011 - link
2) I believe dual channels don't give any advantage due to Tegra's system bus.
metafor - Friday, October 7, 2011 - link
MSM8260 and MSM8660 only have single-channel 32-bit LP-DDR2 memory, not dual.
z0mb13n3d - Friday, October 7, 2011 - link
Please read:
Although Scorpion featured a dual-channel LPDDR2 memory controller, in a PoP configuration only one channel was available to any stacked DRAM. In order to get access to both 32-bit memory channels the OEM had to implement a DRAM on-package as well as an external DRAM on the PCB. Memory requests could be interleaved between the two DRAM, however Qualcomm seemed to prefer load balancing between the two with CPU/GPU accesses being directed to the lower latency PoP DRAM. Very few OEMs seemed to populate both channels and thus Scorpion based designs were effectively single-channel offerings.
I can tell you with a modicum of confidence that this is true, at least partially.
Aren't you the same person who went on (ranting, obviously) about Krait using HKMG and hitting 2.5GHz next year, in another article?
ArunDemeure - Friday, October 7, 2011 - link
I suggest you stop embarrassing yourself. metafor knows what he's talking about, and you clearly don't. I read that previous thread - he was pretty much spot on for everything, as you would expect. I honestly don't know why he even bothers here given the reception he's getting...
Anyway, unlike what the article says, the MSM8x60 indeed only has single-channel 32-bit LPDDR2. However there's a twist: Qualcomm offers it in a PoP (Package-on-Package) configuration at up to 266MHz or an 'ISM' (i.e. SiP or System-in-Package) at up to 333MHz. I wouldn't be surprised if many OEMs used the PoP for cost reasons.
I think the confusion might come from another (older) Qualcomm SoC working like the article described iirc, but this does not apply to the MSM8x60 AFAIK.
Anand Lal Shimpi - Friday, October 7, 2011 - link
Arun,
This information does come from Qualcomm, although the odd PoP + external DRAM configuration (that no one seems to use) basically means that MSM8x60 is a single-channel architecture (which is why I starred it in the table above). I will ask Qualcomm once more for confirmation that this applies to MSM8x60 as well as the older single core variants.
Take care,
Anand
metafor - Friday, October 7, 2011 - link
Scorpion does support dual-channel, however, the 8x60 series does not have two controllers. The 8x55/7x30 does, however and in most cases, are used in the configuration you described in the article.
ArunDemeure - Friday, October 7, 2011 - link
I knew MSM7x30/8x55 was dual-channel but I thought it was also available as a 64-bit LPDDR2 PoP solution? While it makes sense for most people to use it as single-channel LPDDR2 as opposed to dual-channel LPDDR1 these days, why would anyone ever have used both PoP and non-PoP DRAM at the same time? Maybe that old leaked presentation on Baidu listing all the MSM7x30/8x55 packages is wrong though.
metafor - Friday, October 7, 2011 - link
A single Scorpion and Adreno 205 just didn't need both channels. It makes more sense for a lot of OEM's to use single 32-bit LPDDR2.
ArunDemeure - Friday, October 7, 2011 - link
Hmm, that would certainly be news to me, it's possible but you'd still need a second memory controller and PHY so it makes very little sense. I can see a few possibilities:
- The LPDDR2 and DDR2 subsystems aren't shared so in theory for tablets you could do 32-bit SiP LPDDR2+32-bit off-chip DDR2. Seems weird but not impossible.
- You can do 32-bit ISM+32-bit PoP. Once again, why do this? Were they limited by package pins with a 0.4mm pitch? Seems unlikely with a 14x14 package but who knows.
- You can genuinely do 32-bit PoP+32-bit on the PCB. Still seems really weird to me.
The MSM7200(A) had a separate small LPDDR1 chip (16-bit bus with SiP) reserved mostly for the baseband while the primary OS-accessible DRAM was off-chip. This was obviously rather expensive (fwiw Qualcomm only 'won' that generation on software and weak competition IMO), and they removed it on the MSM7227 to reduce cost (making the chip's memory arbitration more complex). I'm not sure about the QSD8650; maybe it still optionally had that extra memory bus (SiP-only) but it was more flexible and never used. It's hard to find that kind of info.
Cheers,
Arun
mythun.chandra - Friday, October 7, 2011 - link
Anand,
Isn't this what I had pinged you about earlier?
z0mb13n3d - Friday, October 7, 2011 - link
I suggest you look into the facts before passing such statements.
I don't know where you or the OP are getting your information from (3GHz A15's, quad 2.5GHz Kraits hitting next year, Kraits using HKMG etc.), but that's been pretty inaccurate. All you're doing is speculating based on bits and pieces floating around in PDF's and slides. I still remember one of his claims from the previous thread: '2x A15's > 4xA9's'. While no one in their right mind would dispute that the wider, deeper A15 is better than a single A9, to make such an uninformed, blanket statement (and to back it up with useless DMIPS numbers!) just doesn't bode very well.
ArunDemeure - Friday, October 7, 2011 - link
ST-Ericsson has publicly indicated the A9600's A15s can run at up to 2.5GHz, and GlobalFoundries has publicly said that the A9600 uses their 28nm SLP process which uses High-K but not SiGe strain. Is it really hard to believe a 28HPM or 28HP A15 could easily reach 3GHz? I'm not sure anyone will do that in the phone/tablet market, but remember ARM also wants A15 to target slightly larger Windows 8 notebooks and (I'm not as optimistic about this) servers.
As for Krait, Qualcomm's initial PR mentioned 2.5GHz (not just random slides) and APQ8064 is on TSMC 28HPM which uses High-K. If you don't trust either me or metafor on that, Qualcomm has also publicly stated that most of their chips will run on SiON but that they were considering High-K for chips running at 2GHz or above: http://semimd.com/blog/2011/02/07/qualcomm-shies-a...
As for 2xA15 vs 4xA9, metafor's point is that most applications are still not sufficiently multithreaded. It has very little to do with DMIPS which is a worthless outdated benchmark (not that Coremark is perfect mind you - where oh where is my SPECInt for handhelds? Development platforms could support enough RAM to run it by now). Unlike him I think 4xA9 should be relatively competitive even if clearly inferior in some important cases, and as you imply it's a difficult and even fairly subjective topic, but I don't think metafor's opinion is unreasonable.
z0mb13n3d - Friday, October 7, 2011 - link
That is the point I'm trying to make! Semiconductor companies, by virtue of the fact that they have to sign OEM/ODM deals before they really even have working products almost always posture about how much their designs can go 'up to' or 'indicate' ratings and numbers. My beef with the earlier thread was that statements were being passed on as facts based purely on stuff posted in press releases. I can tell you, for a fact, that no 2.5GHz Krait (dual or quad) based product will be shipping in '12. I can also tell you for a fact that you will not see anything more than 1.8-2.2GHz (optimistic) in shipping A15's for mobile devices. I understand the A15 architecture is capable of much more, but to try and draw comparisons between a near-shipping mobile-spec quad-core A9 and an on-paper 3GHz A15 powering servers is not correct!
If you did follow the previous thread closely, you will see that this was the only point I was trying to get across, in vain. No matter how you slice and dice it, the 2xA15 > 4xA9 argument is wrong. This is very similar to what we're seeing in the x86 market with Intel and AMD where the older, tri and quad core AMD's are still able to keep-up with or beat dual-core Intel's in threaded situations. Now it is an entirely different argument as to whether or not Google/MS/whoever else makes effective use of multi-core CPU's in their current mobile platforms and their relatively crude/simple kernels (as compared to desktop operating systems), but come Windows 8, I am willing to bet that quad core (or multi-core in general) SoC's will prove their worth.
ArunDemeure - Friday, October 7, 2011 - link
ST-E could underdeliver on the A9600, sure, but they've got a better process than OMAP5 and enough clever power saving tricks up their sleeve (some of which still aren't public) that I feel it's quite likely they won't. Remember 2.5GHz is only their peak frequency when a single core is on - they have not disclosed their throttling algorithms (which will certainly be more aggressive for everyone in the 28nm generation, especially on smartphone SKUs as opposed to tablets where higher TDPs are acceptable).
Also multiple companies will be making A15s on 28HPM eventually. TSMC has indicated they have a lot of interest in HPM, and that should certainly clock at least 25% higher than GF's Gate-First Non-SiGe 28SLP. However the problem is that the A15 is quite power hungry, so I expect people will use that frequency headroom to undervolt and reduce power although a few might expose it with a TurboBoost-like mechanism. On the other hand, exposing the full 3GHz for Windows 8 on ARM mini-notebooks should be a breeze, and I don't see why you'd expect that to be a problem.
As for 2.5GHz Quad-Core Krait in 2012 - I think they're still on schedule for tablets in late 2012, but then again NVIDIA was still on schedule for tablets in August 2011 back in February, so it's impossible to predict these things. Delays happen, and it'd be foolish not to take metafor seriously simply because he is unable to predict the unpredictable.
Finally, 2xA15 vs 4xA9... metafor's point is that given the lower maturity of multithreading on handheld devices, it's more like high-end quad-core Intel CPUs beating eight-core AMD CPUs in the real world. As I said I'm not sure I agree, but it's fairly reasonable.
dagamer34 - Saturday, October 8, 2011 - link
I doubt it was a delay as much as nVidia being boastful. They're quite known for that.
metafor - Friday, October 7, 2011 - link
I believe the comparison was simple: dual-Krait compared to 4xA9. I claimed Krait would be much closer to A15 level than A9 -- I was right.
I claimed that 2xA15 (and 2xKrait) will be far better than 4xA9. I hold to that but some may disagree. I can understand that point.
I claimed that both Krait and A15 were set to target similar frequencies (~2.5GHz) according to release -- I was right.
I claimed that Krait will initially be ~1.4-1.7GHz on 28LP and is planned to reach 2.5GHz on HKM -- I was right.
On every point, you disagreed with me -- and stated "I know for a fact that such and such". Did Krait turn out to be "a modified A9" as you claimed? No.
Is its projected performance and clockspeeds far closer to A15-class than A9? Yes.
Also, how often do you think that quad-core on your desktop actually gets utilized? Are you under the impression that multithreading is some kind of magical pixie dust that you sprinkle on to an OS kernel and all of a sudden, your applications will run faster?
Hint: Android is fully multithread capable -- 3.0 even includes a great pthread library implementation. That doesn't mean individual applications can actually be threaded or that they even can be. This should be common knowledge by now: only certain workloads are highly parallelizable.
FunBunny2 - Saturday, October 8, 2011 - link
-- This should be common knowledge by now: only certain workloads are highly parallelizable.
Too many folks have never heard of Amdahl or his law.
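For anyone who wants to see Amdahl's law against the 2-fast-cores vs 4-slow-cores argument, here is a small Python sketch. The formula is the standard one, speedup = 1 / ((1 - p) + p / n) for a parallel fraction p on n cores. The 1.5x per-core advantage for the faster core is purely an assumed number for illustration, not a measurement of any A15, Krait, or A9 part.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction: float, cores: int,
                        per_core_speed: float) -> float:
    """Throughput relative to a single core with per_core_speed == 1.0."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Assumption: the faster core does 1.5x the work per second of the
    # slower one. This figure is hypothetical.
    for p in (0.3, 0.6, 0.9):
        four_slow = relative_throughput(p, cores=4, per_core_speed=1.0)
        two_fast = relative_throughput(p, cores=2, per_core_speed=1.5)
        print(f"parallel={p:.0%}  4 slow cores: {four_slow:.2f}  "
              f"2 fast cores: {two_fast:.2f}")
```

With these assumed numbers, the two faster cores come out ahead when 30% or 60% of the work is parallel, and the four slower cores only pull ahead at 90%. Where the crossover sits depends entirely on the per-core ratio you plug in.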
metafor - Friday, October 7, 2011 - link
On top of that -- as we've discussed previously -- there is a very small subset of computationally intensive, highly thread-scalable applications out there. Specifically: compression, video transcoding and image processing (which will likely be the biggest performance-demanding app for the CPU on tablets what with the Photoshop Touch series).
So yes, on 4xA9, that could potentially scale to all 4 cores. But here's the thing: those are all very NEON/FPU intensive applications.
And guess what subsystem was substantially improved in A15 compared to A9?
Double the data path width, unified load-store, fully out-of-order VFP + NEON and lower integer execution latency on top of that (which, IIRC, is what most image processing algorithms use).
Even assuming A15 runs at the same clockspeed as an A9, it would still be 2-3x faster in typical arithmetic-intensive workloads.
partylikeits1999 - Saturday, October 8, 2011 - link
Anybody who thinks that application performance can be predicted simply by CPU clock speeds alone is a fool who has no business posting on sites like this. Let it go.
baritz - Friday, October 7, 2011 - link
In the Power vs. Temperature plot on page two, have the axis labels been reversed accidentally?
The way I read the graph as it is, 40nm transistors can handle more power without getting hot, while 28nm transistors get hot very quickly with only a small increase in power.
metafor - Friday, October 7, 2011 - link
It seems pretty clear. As temperature increases (right on the X axis), 40G transistors consume more power (up in the Y axis). The power increase vs temperature increase curve of 28LP doesn't grow as fast.
This, of course, has more to do with it being an LP process. 40LP transistors would have a similar curve.
Haserath - Saturday, October 8, 2011 - link
Metafor is right about the curve having to do with the process. His explanation kinda makes it seem like a temp increase causes the power increase though. It's the power increase that causes the temp increase, and "G" transistors are designed to handle more power without wasted heat (temperature increase) compared to "LP" transistors. There's also a second reason why 28nm is hotter than 40nm.
If you have a certain amount of heat energy being produced at a certain power level, the 40nm transistors will be a certain temperature.
Now take that same amount of heat energy being produced, and shrink the transistors to half their size. This increases their temperature within the same power envelope.
Of course they labeled a thermal limit on the power side, because the holder of whatever phone this chip goes into is going to feel the heat coming from the chip due to how much power it's using(how much heat energy is put out), not just due to the temperature of the transistors
metafor - Saturday, October 8, 2011 - link
It's actually both :)
This is a problem in a lot of circuit design. Power dissipation (both due to scattering and increase in resistance of the charge channel) increases with temperature. But temperature also increases as more power is dissipated. It's a positive feedback loop that just gets hotter and hotter.
When simulating a circuit, this problem has to be taken into account but simulating the heat dissipation is difficult so one can never be sure that a circuit wouldn't overheat under its own operation.
It's an on-going research area in academics of how to simulate such a situation beforehand and avoid it.
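The feedback loop described above can be shown with a deliberately crude Python model: temperature is ambient plus thermal resistance times power, and leakage power rises linearly with temperature. Every parameter value and the `settle` helper are made up for illustration; real leakage is closer to exponential in temperature, and real thermal simulation is far harder, as the comment says.

```python
def settle(p_dynamic, leak_ref, leak_per_deg, r_thermal,
           t_ambient=25.0, t_ref=25.0, t_limit=125.0,
           max_steps=1000, tol=1e-6):
    """Iterate the power/temperature loop until it settles or runs away.

    Returns (power_watts, temp_celsius) at the operating point, or None
    if the temperature exceeds t_limit or never converges.
    All parameters are hypothetical illustration values.
    """
    temp = t_ambient
    for _ in range(max_steps):
        # Leakage grows with temperature (linear approximation).
        leakage = leak_ref * (1.0 + leak_per_deg * (temp - t_ref))
        power = p_dynamic + leakage
        # More power raises the die temperature.
        new_temp = t_ambient + r_thermal * power
        if new_temp > t_limit:
            return None  # thermal runaway
        if abs(new_temp - temp) < tol:
            return power, new_temp
        temp = new_temp
    return None


if __name__ == "__main__":
    # Weak temperature dependence: the loop settles.
    print(settle(p_dynamic=1.0, leak_ref=0.2, leak_per_deg=0.02, r_thermal=20.0))
    # Strong temperature dependence: loop gain above 1, so it runs away.
    print(settle(p_dynamic=1.0, leak_ref=0.2, leak_per_deg=0.3, r_thermal=20.0))
```

In this toy model the loop gain is `leak_ref * leak_per_deg * r_thermal`. Below 1 the chip settles at a stable operating point, and at 1 or above each step heats it further, which is the "hotter and hotter" case.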
Haserath - Sunday, October 9, 2011 - link
Well, that is true.
Basically, it's increasing the power of the chip, which increases heat energy output, that increases the temperature. And with that increase in temperature, comes an increase in power.
Heat dissipation is the only way for the chip to keep itself from burning up. It's just impossible to really tell how much can be dissipated under even certain conditions due to heat exchange kinetically between atoms, and most likely the radiation amount differs between atoms.
It's basically impossible to simulate an exact scenario for this exchange.
jjj - Friday, October 7, 2011 - link
The minute a company gives you a bit of attention, you forget about objectivity.
"The key is this: other than TI's OMAP 5 in the second half of 2012 and Qualcomm's Krait, no one else has announced plans to release a new microarchitecture in the near term"
"Qualcomm remains the only active player in the smartphone/tablet space that uses its architecture license to put out custom designs."
Both statements are false, and you know that very well.
introiboad - Friday, October 7, 2011 - link
Really? I wasn't aware of anyone else in the industry not using ARM's RTL and designing their cores from scratch.
z0mb13n3d - Friday, October 7, 2011 - link
Well then, perhaps you haven't heard of Marvell and their Armada line of SoC's?
introiboad - Friday, October 7, 2011 - link
Yes, I have heard of Marvell and Armada, isn't that what's left of XScale? Honestly I thought they had given up on what was XScale and licensed the RTL like everyone else instead, but it looks like I was wrong.
metafor - Friday, October 7, 2011 - link
Which is probably why Anand specified tablet/smartphones. Marvell is, for all practical purposes, not a major or even relevant player in tablet/smartphones.
It is worth noting that both nVidia and (thus believed) Apple are utilizing their architectural licenses and are cooking up their own cores currently. But none will likely launch in 2012.
Anand Lal Shimpi - Friday, October 7, 2011 - link
The qualification there was "in the smartphone/tablet space". Marvell hasn't had any significant design wins in the high end Android, iOS, Windows Phone or QNX OS space that we cover.
Is there another company you are referring to?
Take care,
Anand
Mike1111 - Friday, October 7, 2011 - link
What about the ST-Ericsson Nova A9600?
http://www.stericsson.com/press_releases/NovaThor....
It's a 28nm dual-core Cortex-A15 (up to 2.5 GHz) with an Imagination Rogue GPU (Series 6, 210 GFLOPS). Taped out and set to ship in 2012:
http://www.eetimes.com/electronics-news/4226942/ST...
And I'm sure we will see an Apple A6 in the next 12 months (IMHO could be quite similar to the Nova A9600 in terms of CPU and GPU).
Anand Lal Shimpi - Friday, October 7, 2011 - link
Neither of those options are custom designs using the ARM ISA, they are full IP licenses.
You are correct on ST-E's announcement though, I simply haven't been factoring them into discussions lately as they have been pretty much not present in the high-end smartphone space as of late.
Take care,
Anand
partylikeits1999 - Saturday, October 8, 2011 - link
I'm hearing that this one has slipped, and now the ST-Ericsson chipset with Rogue won't sample to OEMs until the first half of next year and therefore won't be in commercial products until the very end of 2012 if at all, whereas the MSM8960 is already sampling to OEMs according to Qualcomm. In other words, schedule-wise, you're probably comparing apples to oranges. I do agree with you though that we'll likely see A6 from Apple, in some form, by this time next year, but I think it'll be higher spec'd and will blow the doors off anything from STE. The more interesting question is whether the quad core 8064 that Qualcomm has mentioned for next year, can keep up with A6, both from a CPU and a GPU standpoint.
ArunDemeure - Friday, October 7, 2011 - link
Marvell indeed hasn't had much luck in the high-end so far, but the same latest PJ4 core has been fairly successful both as the HSPA Pantheon 920 at BlackBerry (including some new BlackBerry 7 devices) and the TD-SCDMA Pantheon 910 for various China Mobile-specific phones. So I'm not sure it's a good idea to exclude them completely although they're certainly not in the same league as Qualcomm so I really don't blame you.
macs - Friday, October 7, 2011 - link
Are you really sure that the upcoming Nexus Prime will use OMAP 4?
Seems unlikely to me... An SGX 540 to power a 720p display when Samsung has their own better SoC with Mali 400?? And the rival iPhone 4S uses a GPU that is 7/8 times faster than the SGX540.
Sounds really stupid... I can't believe that OMAP 4 is the reference SOC for Android ICS
metafor - Friday, October 7, 2011 - link
The SGX540 in the OMAP4460 is clocked significantly higher than (something TI is great at). So, while it isn't a powerhouse compared to Exynos or A5, it's more than sufficient. Google has never traditionally been top-of-the-line in terms of processors with their Nexus series. So it isn't out of the question they wouldn't be this time around either.
dagamer34 - Saturday, October 8, 2011 - link
Technically, the Hummingbird SoC used in the Nexus S was top of the line at the time it was released (although it was eclipsed by dual-core phones after 1-2 months). Also the Nexus One was the first major phone I can remember that had 512MB of RAM.
However, it should be said that Google doesn't think about getting the best SoCs available for their product, but instead seeks to get the best deal on its components by having vendors bid against each other. It's very possible that Samsung knows they had the most powerful SoC outside of the A5 and wanted Google to pay for the value it would provide. Google instead apparently went with TI, likely because TI is selling its chips cheaper in order to be the reference platform for Ice Cream Sandwich.
DanD85 - Friday, October 7, 2011 - link
Funny, I saw Adreno 225 having 8 SIMDs and 5 MADs per SIMD, which should be equal to 40 total MADs, right? Why is it 80? Am I missing sth?
cptcolo - Friday, October 7, 2011 - link
Looking at the chart on "The Adreno 225 GPU" and comparing to the frame rates in the iPad 2 review (http://www.anandtech.com/show/4216/apple-ipad-2-gp... It looks like the PowerVR SGX543MP2 in the iPhone 4S will be about 33% faster. This is a very approximate estimate.
cptcolo - Friday, October 7, 2011 - link
"Qualcomm claims that MSM8960 will be able to outperform Apple's A5 in GLBenchmark 2.x at qHD resolutions. We'll have to wait until we have shipping devices in hand to really put that claim to the test, but if true it's good news for Krait as the A5 continues to be the high end benchmark for mobile GPU performance."
ssvb - Friday, October 7, 2011 - link
Please correct your CPU features comparison table (and in the previous articles too). ARM11 has a *pipelined* VFP, which actually makes it a lot faster than Cortex-A8 for double precision floating point workloads. You can have a look at the instruction cycle timings to get a better idea:
ARM11 VFP - http://infocenter.arm.com/help/topic/com.arm.doc.d...
Cortex-A8 VFP - http://infocenter.arm.com/help/topic/com.arm.doc.d...
Thanks.
Anand Lal Shimpi - Friday, October 7, 2011 - link
Thank you! Fixed :)
Take care,
Anand
icrf - Friday, October 7, 2011 - link
An ARM Cortex-A9 has an 8 stage pipeline, not 9: http://www.arm.com/files/pdf/ARMCortexA-9Processor...
Anand Lal Shimpi - Friday, October 7, 2011 - link
Thanks :)
Take care,
Anand
Blaster1618 - Friday, October 7, 2011 - link
Maybe I'm a noob, but I did not know that L0 memory could operate at a GHz clock rate (5-10 times the SDRAM clock rate). Good stuff, keep it coming. B-)
DanNeely - Friday, October 7, 2011 - link
Static RAM (the kind used in CPU caches) has always been much faster than the dynamic RAM used for main system memory.
SRAM uses a block of a half dozen transistors to store a bit as a stable logic state; as a result it can operate as fast as any other transistor based device in an IC. The number of clock cycles a cache bank needs to complete an access operation is primarily a factor of its size, both because it takes more work to select a specific part and because signalling delays due to the speed of electric signals through the chip become significant at gigahertz speeds. Size isn't the only factor in how fast a CPU cache operates; higher associativity levels can improve worst case behavior significantly (by reducing misses due to pathological memory access patterns) at the cost of slowing all operations slightly.
DRAM has a significantly different design: it only uses a single transistor per bit and stores the data in a paired capacitor. This allows for much higher memory capacities in a single chip and much lower costs/GB as a result. The catch is that reading the capacitor's charge level and then recharging it after the check takes significantly longer. The actual memory cells in a DDR3-1600 chip are only operating at 200MHz (up from 100-133MHz a decade ago); other parts of the chips operate much faster as they access large numbers of memory cells in parallel to keep the much faster memory bus fed.
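The 200MHz figure above falls out of simple arithmetic, sketched here in Python. DDR3 uses an 8n prefetch, so each access to the cell array supplies eight transfers on the bus. The `dram_clocks` helper is just an illustrative name, and the 64-bit bus width is an assumption matching a typical desktop memory channel, not something stated in the comment.

```python
def dram_clocks(transfer_rate_mts: float, prefetch: int,
                bus_width_bits: int = 64):
    """Derive DDR clock rates and peak bandwidth from the transfer rate.

    transfer_rate_mts: mega-transfers per second (the "1600" in DDR3-1600)
    prefetch: bus transfers supplied per cell-array access (8 for DDR3)
    bus_width_bits: assumed channel width (64 bits is typical for a DIMM)
    """
    io_clock_mhz = transfer_rate_mts / 2           # two transfers per bus clock
    core_clock_mhz = transfer_rate_mts / prefetch  # cell array runs this slowly
    peak_bandwidth_mbs = transfer_rate_mts * bus_width_bits / 8
    return io_clock_mhz, core_clock_mhz, peak_bandwidth_mbs


if __name__ == "__main__":
    io, core, bandwidth = dram_clocks(1600, prefetch=8)
    print(f"I/O clock: {io:.0f} MHz, cell array: {core:.0f} MHz, "
          f"peak: {bandwidth:.0f} MB/s")
```

For DDR3-1600 this gives an 800MHz bus clock and a 200MHz cell array, so the wide parallel fetch is what lets slow capacitor cells keep a fast bus fed.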
Blaster1618 - Saturday, October 8, 2011 - link
Thank you for such a clear and thorough response.
MonkeyPaw - Friday, October 7, 2011 - link
Isn't it amazing how these low-power architectures are surpassing Atom in both power and performance? Atom isn't even an OoO architecture. Windows 8 and OS X Lion will allow these architectures in netbooks and ultrabooks before we know it, and Intel's value-stripping at the low-end will finally die a terrible death.
partylikeits1999 - Saturday, October 8, 2011 - link
Intel will be in very big trouble unless FinFet can get Atom's power down in the same sub 4W range as this next round of quad core chipsets from nVidia, Qualcomm and TI.
Wilco1 - Saturday, October 8, 2011 - link
Even with FinFet it's impossible that Atom will run at 4GHz, which it needs to reach to get performance comparable to an A15 or Krait at 2.5GHz. And in less than 2W. Atom has been dead in the water for a while now - it cannot keep up with ARM out-of-order cores on performance, power consumption or integration despite Intel's process advantage.
Tomasthanes - Friday, October 7, 2011 - link
Yes, I could go to Google. It's just better journalism to define acronyms (even common ones) as you use them.
Baron Fel - Friday, October 7, 2011 - link
system on a chip. At this point for Anand it would be like writing CPU as central processing unit.
bjacobson - Saturday, October 8, 2011 - link
silly me been saying silicon on chip in my head all these years never stopped to think it through >.<
bjacobson - Friday, October 7, 2011 - link
very exciting. Soon I won't have any need of a dedicated desktop except for gaming or a laptop except for business.
Zingam - Saturday, October 8, 2011 - link
So you basically need a desktop and a laptop?
Zingam - Saturday, October 8, 2011 - link
And in that case you don't really need a tablet. Tablets would be very useful if they could deliver some kind of alternative user input and display. For instance: holographic displays (like in sci-fi movies) and voice control and input. That would actually make these mobile devices truly useful. The current generation and the coming generations would be nothing more than mobile browser and mp3/mp4 player.
FunBunny2 - Saturday, October 8, 2011 - link
And, if it could work at all, how well would voice control work outside of an anechoic chamber (with a lone user talking to it)? But, I agree, tablets (used for decades in warehouses, by the way) with soft keyboards and the like are just a cheap and dirty way around the I/O issue.
bjacobson - Saturday, October 8, 2011 - link
wow not sure what happened there.
What I mean is we'll get a quad core phone with 4GB of RAM and all we'll need is a tablet to dock it to. The tablet will have the other connectivity options-- HDMI, USB, etc which will let the non-gamers replace their desktops and the people not needing Windows for business, replace their laptops.
dagamer34 - Saturday, October 8, 2011 - link
The dedicated gaming desktop will largely be obsolete in a few years. Games are much more GPU dependent than CPU dependent. What we will see is the era of external graphics cards which you can hook up to your laptop to play on your monitor @ 1080p with all the special effects turned up. Then you "undock" and rely on the integrated GPU to get longer battery life.
Best of both worlds!
Death666Angel - Friday, October 7, 2011 - link
Thanks for the write-up, pretty interesting stuff to come next year. Hopefully developers will embrace the new power, so that I can make fuller use of my Galaxy S2. I don't think I'll upgrade that soon though; I'm pretty content with the Galaxy S2 and it was expensive enough for now. I'll see what Christmas 2012 brings. :D
cnxsoft - Friday, October 7, 2011 - link
Back in February, Qualcomm also announced the quad-core Snapdragon APQ8064 would be ready in 2012. Any news on that?
HighTech4US - Friday, October 7, 2011 - link
Nvidia's roadmap clearly shows Kal-El+ as mid 2012 and 28nm Wayne at the end of 2012:
http://www.androidcentral.com/nvidias-tegra-roadma...
BoyBawang - Saturday, October 8, 2011 - link
Does this mean all Krait cores will be manufactured on an LP process, like the companion CPU of Kal-El, for energy efficiency, and that's why it won't go beyond 2GHz even at 28nm?
skydrome1 - Saturday, October 8, 2011 - link
I know this article's on the Krait architecture, but in your comparison of the 2011/2012 SoC roadmap, I think you left out one very promising competitor: the ST-Ericsson Nova A9600.
Could this be included? And will there be any updates on this upcoming SoC?
On the whole, the 2012 roadmap looks really good. But I personally am looking forward to PowerVR's Series 6, the Cortex A15 MP and NVIDIA's Wayne the most.
Zingam - Saturday, October 8, 2011 - link
Holiday season 2013
skydrome1 - Monday, October 10, 2011 - link
I'm pretty sure they said it would sample by the end of this year and ship late 2012. If they were to delay it any more, they would be in serious trouble. ST-Ericsson has had quite a lot going against them recently, and if they can't keep their promises, TI is going to beat them quite badly. I'd estimate the Rogue to show up in an OMAP 5 in H2 2012 or H1 2013.
All in all I'm just really excited by the PowerVR Rogue. Seeing the specifications of the Nova A9600 and what the Rogue can do is quite amazing. It's almost on par with the PS3.
Could an article on that be done once information is available?
I would love to have a portable gaming console :)
Haserath - Saturday, October 8, 2011 - link
Metafor is right about the curve having to do with the process. His explanation kinda makes it seem like a temp increase causes the power increase though. It's the power increase that causes the temp increase, and "G" transistors are designed to handle more power without wasted heat (temperature increase) compared to "LP" transistors. There's also a second reason why 28nm is hotter than 40nm.
If you have a certain amount of heat energy being produced at a certain power level, the 40nm transistors will be a certain temperature.
Now take that same amount of heat energy being produced, and shrink the transistors to half their size. This increases their temperature within the same power envelope.
Of course they labeled a thermal limit on the power side, because the holder of whatever phone this chip goes into is going to feel the heat coming from the chip due to how much power it's using (how much heat energy is put out), not just due to the temperature of the transistors.
ViRGE - Saturday, October 8, 2011 - link
The graph is conceptually correct. While it's true that consuming more power produces more heat, the inverse is also true. The temperature of a transistor affects its leakage characteristics because resistance increases with heat. So at higher temperatures a CPU is going to consume more power to maintain its performance, compared to the same CPU at a lower temperature.
You're basically looking at the principles of a superconductor applied in reverse.
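The two-way relationship described above (more power raises temperature, and higher temperature raises leakage power) can be sketched as a simple fixed-point iteration. This is a toy model only, not data from the article or from Qualcomm: the exponential leakage curve, the 20°C doubling interval, the 0.2W reference leakage and the 15°C/W thermal resistance are all assumed values chosen purely for illustration.

```python
def leakage_watts(temp_c, leak_at_ref_w=0.2, ref_temp_c=25.0, doubling_c=20.0):
    """Assumed leakage model: leakage doubles every `doubling_c` degrees C."""
    return leak_at_ref_w * 2.0 ** ((temp_c - ref_temp_c) / doubling_c)


def settle(dynamic_w, ambient_c=25.0, thermal_res_c_per_w=15.0, steps=200):
    """Iterate power -> temperature -> leakage until the loop settles.

    Returns (steady-state temperature in C, total power in W).
    All default parameters are hypothetical.
    """
    temp_c = ambient_c
    total_w = dynamic_w
    for _ in range(steps):
        # Total power is switching power plus temperature-dependent leakage.
        total_w = dynamic_w + leakage_watts(temp_c)
        # Temperature rises above ambient in proportion to total power.
        temp_c = ambient_c + thermal_res_c_per_w * total_w
    return temp_c, total_w


if __name__ == "__main__":
    temp_c, total_w = settle(dynamic_w=1.0)
    print(f"steady state: {temp_c:.1f} C at {total_w:.2f} W total")
```

With these made-up numbers the loop converges, and the chip ends up drawing noticeably more than its 1W of switching power once it has warmed up. With a steeper leakage curve or poorer cooling, the same loop can fail to settle at all.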
JohnWH - Saturday, October 8, 2011 - link
The number of MADs per 4-way SIMD is 4, not 8 as stated (plus 1 for the scalar channel), so total flops per clock is (4+1) * 2 * 8 = 80 flops/clock, or 16 GFLOPS @ 200MHz and 24 GFLOPS @ 300MHz.
Zingam - Saturday, October 8, 2011 - link
According to this article we'll have to wait for at least another 3 years or maybe more until we get tablets with enough power and good battery life to be actually useful.
Yeah, maybe at 14nm and with tri-gate transistors sometime in 2016 we'll be able to enjoy true mobile computing all day long (at least 16 hours without a recharge).
Yeah, progress is good but way too slow sometimes. Too bad; I was hoping for an ultra-cool and powerful e-book reader that delivers a more tablet-like experience than what is currently available.
dagamer34 - Saturday, October 8, 2011 - link
Define "useful". I'd argue that a lot of CPU cycles are wasted doing meaningless background tasks in apps that you can't see when it would be better to just pause and resume them later when the user brings them back into focus (aka Windows 8).
bengildenstein - Saturday, October 8, 2011 - link
I'm trying to post a relevant comment but it's being flagged as spam. Can anyone offer any insight into why this may be the case?
bengildenstein - Saturday, October 8, 2011 - link
The post centers around a SIGGRAPH 2011 talk that touches on Adreno 205's fragment shader performance.
The gist is that the Adreno 205 (Xperia Play) showed faster performance with complex shaders than the SGX543MP2 (iPad 2).
It seems I cannot post a link to the paper, but you can find it titled "Fast Mobile Shaders" at: aras-p [dot] info
Ryan Smith - Saturday, October 8, 2011 - link
The spam filter is pretty aggressive against links.
http://www.aras-p.info/texts/files/FastMobileShade...
s44 - Saturday, October 8, 2011 - link
You sure it's the same Mali? I couldn't find it specified in any of Samsung's press releases.
Blaster1618 - Saturday, October 8, 2011 - link
While single instruction, multiple data (SIMD), whether short or long vector, is a great idea, it is sadly underutilized except in graphics processing, compressible signals and cryptography. Is the NEON technology just an additional graphics/compression engine? Does it require special NEON programming/compiling, or does it enhance normal MIMD programming?
happy medium - Sunday, October 9, 2011 - link
I thought the new Tegra 3 had 5 cores?
ET - Monday, October 10, 2011 - link
Thanks to Anandtech for covering mobile chips. I find it pretty exciting to read about these low power combinations of CPU and GPU.
tech360 - Monday, October 10, 2011 - link
Excellent analysis. I guess x86 will have an uphill battle against ARM for years in bridging the gap in low power.
Just wondering what are your thoughts on the integrated WiFi/GPS/BT/FM in S4? Does it have potential to integrate away the currently separate combo-chips?
Thanks.
The0ne - Monday, October 10, 2011 - link
I'm a fan of efficient coding and design and hope that Qualcomm is following that path. I think there's been too much "just get it done no matter what" in the programming business for far too long. That's not to say there aren't very good free open source apps out there by astounding programmers, but the mainstream seems to have forgotten or doesn't care.
This is good news because mobile is still in the early phases, and if efficiency is a priority things can only get better, if not easier to debug, code, change, etc.
The bad is, of course, increasing speeds, any speeds, like in the PC industry of yesteryear. Sure, we want more powerful hardware, but let's not need it because of shoddy code and design architectures. Again, in the mobile industry I believe these two points should be strongly considered by everyone.
broccauley - Monday, October 10, 2011 - link
You need to add the ST-Ericsson Thor / NovaThor series of SoCs to your table.
Also, of the SoCs in your table only the Qualcomm ones are true telecom SoCs - the others are mere application engines without telecom features.
ssiu - Tuesday, October 11, 2011 - link
Nothing about Apple A6? (I guess reliable info is hard to come by, but surely it would be out by December 2010 and should be included in the comparison table?)
ssiu - Tuesday, October 11, 2011 - link
... December 2012 (where is the Edit button ...)
sarge78 - Wednesday, October 12, 2011 - link
You could add the ZiiLabs ZMS-20 (and quad core ZMS-40) to the table. That'll probably see action in a Creative Android tablet/PMP and other OEMs might pick up the Jaguar reference tablet. (Could make a good review piece?)
http://www.ziilabs.com/products/processors/zms20.a...
Or how about Marvell's ARMADA 628 tri-core SoC? Marvell are getting pushed out by Qualcomm but it could get some design wins next year. (It has two 1.5GHz cores and a low power 624MHz core, similar to Tegra 3.)
http://www.marvell.com/company/news/pressDetail.do...
lancedal - Thursday, October 13, 2011 - link
Wouldn't the use of L0 impact performance?
If L1 is shut down, there will be a penalty on an L0 miss. Powering up L1 on an L0 miss would cost thousands of cycles at 1.5GHz.
If L1 is on, then what is the point?
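To put rough numbers on the concern above: the stall in cycles is simply the wake-up time multiplied by the clock frequency. In the sketch below only the 1.5GHz clock comes from the comment; the wake-up latencies and the 90% L0 hit rate are hypothetical values picked for illustration, not figures from Qualcomm or the article.

```python
def wake_penalty_cycles(wake_latency_ns, clock_hz=1.5e9):
    """Cycles the core would stall while a powered-down L1 wakes up."""
    return round(wake_latency_ns * clock_hz / 1e9)


def average_access_cycles(l0_hit_rate, l0_hit_cycles, l0_miss_cycles):
    """Average cost per access when every L0 miss pays the full miss penalty."""
    return l0_hit_rate * l0_hit_cycles + (1.0 - l0_hit_rate) * l0_miss_cycles


if __name__ == "__main__":
    # Hypothetical wake-up latencies; real values depend on the power-gating design.
    for wake_ns in (100, 1_000, 10_000):
        stall = wake_penalty_cycles(wake_ns)
        avg = average_access_cycles(0.90, 1, stall)
        print(f"{wake_ns:>6} ns wake-up -> {stall:>6} cycles, "
              f"average {avg:.1f} cycles per access at 90% L0 hits")
```

Under these assumptions, a wake-up time of about a microsecond is already enough to reach the "thousands of cycles" range, which is why the answer hinges on whether L1 is fully powered down or only held in a quicker-to-resume low-power state.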
Hunt3rj2 - Saturday, October 22, 2011 - link
Hopefully Qualcomm amps up their SoC to 2 GHz levels to stay competitive with Cortex A15.
Qualcomm has a track record of starting strong, but then they just sputter out and can't keep up with the reference ARM Cortex SoCs.
SydneyBlue120d - Sunday, October 23, 2011 - link
Hi, thanks for your great article.
You may want to add to the list the Samsung Exynos 5250, which should be the A15 version of the Exynos SoC, with an ARM Mali-T604 GPU; I don't know if it's single or multicore. Let's hope it will be used in the Samsung Galaxy S3 :)
ilkhan - Sunday, November 6, 2011 - link
Any thoughts on when we'll see Krait/Adreno3xx SoCs in the hands of consumers? Holiday 2012 or will it be later than that?
Holiday 2012 would make it ideal for the next Nexus SoC.
shaky153 - Monday, December 12, 2011 - link
Hey guys, I'm a dev from the XDA forum.
Recently, new Adreno drivers have been tested and improve 3D performance by 50%, so things are looking very good for the new GPU; it'll perform much better than current devices.
french toast - Monday, December 12, 2011 - link
Re: shaky153.
"Hey guys, I'm a dev from the XDA forum.
Recently, new Adreno drivers have been tested and improve 3D performance by 50%, so things are looking very good for the new GPU; it'll perform much better than current devices."
Brilliant, so does that mean that the Adreno GPUs do have a lot of underutilised power, as Anand attested in this article? If so, will the Adreno 220 get much closer to the A5 in benchmarks? (Taking into consideration that we don't know the A5's GPU clocks, and its 2x TMUs.)
A reply would be appreciated.
french toast - Monday, December 12, 2011 - link
and half the bandwidth....
shriganesh - Thursday, February 23, 2012 - link
http://www.qualcomm.com/media/documents/snapdragon...
Henri - Saturday, March 31, 2012 - link
http://www.vir.com.vn/news/tech/option-announces-n...
Leandro1123 - Wednesday, January 4, 2017 - link
Hi. Could I ask about the source of Krait's specific cache features, like L1 size or associativity?
I looked on Qualcomm's website, but I found nothing.
Thank you in advance.