Radeon 5970 Overclocking: The VRM Temperature Bottleneck
by Ryan Smith on November 25, 2009 12:00 AM EST
Posted in: Ryan's Ramblings
In our Radeon HD 5970 review, we ran into some issues when trying to overclock the card to the 5870’s speeds of 850MHz/1200MHz. At the time we attributed this to the VRMs, while AMD suggested that it was cooling related and that we should manually increase the fan speed.
As it turns out, we were both right; we just didn’t have the tools at the time to properly identify and isolate the issue. Late last week we got our hands on a beta version of Everest Ultimate, which added preliminary support for the 5970. With that, we could read and log the voltages and temperatures of the various components of the 5970 and pin down the cause.
From that, we’ve discovered a few interesting things about the 5970. Let’s start things off with the cooler removed from the 5970.
We’ve gone ahead and circled the VRMs in red. There are 9 altogether; 6 on the right side, and 3 near the left side of the card. We aren’t able to track down what each specific VRM is connected to, but we believe that each GPU is attached to 3, each GPU’s RAM is attached to 1, and finally the PLX PCIe bridge is attached to 1. Regardless, pay attention to the location of these VRMs for later discussion.
As we previously noted in our 5970 review, when overclocked the card was throttling down in two cases. One was when running OCCT/FurMark, members of AMD’s “power virus” list by virtue of the fact that they put a card under a greater load than AMD believes to be realistically possible. Our 5800 series cards never throttled under these applications, so to see the 5970 throttle here was a bit surprising but not wholly unexpected.
The second case was using Distributed.net’s pre-release GPU client for use with AMD’s GPUs. Since this is a real program, this was absolutely unexpected, and is what instigated our look into the matter.
In both cases, the key was the overall load on the GPU cores, and consequently the amount of power required to drive the GPUs. When a bank of VRMs reached roughly 120C (this being averaged among all the VRMs in that bank), overcurrent protection kicked in and throttling began. In the case of FurMark this was very quick and even at 100% fan speed the cooler could still not keep the VRMs cool enough to allow full-time 850MHz operation. The Dnet client on the other hand was much slower to ramp up, and we ultimately found that 70% fan speed was enough to keep our hottest bank of VRMs below the threshold, stabilizing at 116C.
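The throttling behavior described above can be modeled in a few lines. This is only a sketch of what we observed, not the card's actual protection logic, which we have no visibility into; the bank names and the sample readings are hypothetical, and only the roughly 120C bank-average threshold comes from our logs.

```python
from statistics import mean

# Approximate threshold observed in our logs: throttling began when the
# average temperature of a VRM bank reached roughly 120C.
VRM_THROTTLE_C = 120.0


def bank_average(temps_c):
    """Average temperature across the VRMs in a single bank."""
    return mean(temps_c)


def should_throttle(banks):
    """Return True if any bank's average reaches the threshold.

    `banks` maps a (hypothetical) bank name to a list of per-VRM readings.
    """
    return any(bank_average(temps) >= VRM_THROTTLE_C for temps in banks.values())


# Illustrative readings only, not measured data: three VRMs per GPU bank.
example = {
    "gpu1_bank": [118.0, 121.0, 122.0],
    "gpu2_bank": [93.0, 94.0, 95.0],
}
print(should_throttle(example))
```

Because the check is on the bank average rather than on any single VRM, one hot regulator alone would not trigger throttling in this model; it takes the whole bank drifting up to the limit.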
Notably, during this whole period the GPU cores themselves stayed at or under 94C, which is still a few degrees below their own throttle point. AMD’s fan quickly ramped up, and in our testing it only needed to go to 59%. So if the cores did get hotter there was still plenty of room to go with the fan.
This brings us to our first point of concern for the 5970, which is the fan speed. Clearly it’s adequate for the GPU cores themselves, but we cannot find any proof that the fan speed is adjusted based on the temperature of the RAM or the VRMs. If the fan speed were to ramp up in the case of near-critical temperatures in the VRMs, then the Dnet client likely would have run without an issue the first time, as this would have pushed the fan to 70%.
We asked AMD about whether the fan speed is affected by VRM temperatures at all, but we didn’t receive a response. This isn’t particularly surprising since post-launch periods are a good time to take a vacation and there’s a holiday this week for their American employees, but it means we couldn’t get a confirmation of our assumption. So for the time being, we’re working on the assumption that only GPU core temperatures drive fan speed.
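To make the difference concrete, here is a rough sketch contrasting a core-only fan policy (our working assumption) with a hypothetical VRM-aware one. None of this reflects AMD's actual fan control; the 5C safety margin and the function itself are invented for illustration, while the 59% and 70% fan speeds and the 120C threshold come from our testing.

```python
VRM_THROTTLE_C = 120.0      # approximate throttle point from our logs
VRM_MARGIN_C = 5.0          # hypothetical safety margin, assumed for illustration
VRM_RESCUE_FAN_PCT = 70     # fan speed we found sufficient for the Dnet client


def fan_speed(core_driven_pct, hottest_vrm_bank_c, vrm_aware):
    """Pick a fan speed in percent.

    core_driven_pct: whatever the core-temperature fan curve asks for.
    vrm_aware: False models our assumption that only core temperatures
               drive the fan; True models a hypothetical VRM-aware policy.
    """
    if not vrm_aware:
        return core_driven_pct
    if hottest_vrm_bank_c >= VRM_THROTTLE_C - VRM_MARGIN_C:
        # Near-critical VRMs: never run slower than the rescue speed.
        return max(core_driven_pct, VRM_RESCUE_FAN_PCT)
    return core_driven_pct


print(fan_speed(59, 118.0, vrm_aware=False))
print(fan_speed(59, 118.0, vrm_aware=True))
```

Under the core-only policy the fan stays at the 59% the cores ask for while the VRMs approach their limit; the VRM-aware variant would have stepped up to 70% on its own.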
It also bears mentioning that the 5970 gets quite a bit louder when the fan goes up to 70%. We went ahead and captured the noise data for it at 70% and 100%, which is in the chart below. At the 70% fan speeds needed to run the Dnet client at 5870 speeds, you’re looking at 70dB, which is quite a bit louder than the fan noise at stock speeds. It is in fact uncomfortably loud by this point.
Our second point of concern goes beyond just the fan, and is the overall cooling of the VRMs. When we looked at our Everest logs after running the Dnet client, we noticed something interesting with respect to which VRMs were overheating. The VRM bank attached to GPU 1 was some 25C hotter under load than the bank attached to GPU 2, but GPU 1 itself wasn’t the hottest GPU. GPU 2 was consistently a couple of degrees warmer. We don’t believe this to be in error, so to understand why, we refer back to our disassembled 5970.
As the fan is on the right, the right side of the heatsink that the vapor chamber dumps its heat into is going to be cooler than the left side, because the left side is effectively being cooled by air already heated by the right side. The heatsink and vapor chamber mitigate this some, but the right side of the card, and consequently the right GPU, should be cooler than the left side. This leads us to believe that GPU 1 is the right GPU, and GPU 2 is the left GPU.
This is important since if we look at the VRMs, the VRMs feeding GPU 2 sit under the vapor chamber, while the VRMs feeding GPU 1 (along with those for the RAM and PCIe bridge) do not. We haven’t been able to fully dissect the cooler, but the VRMs on the right side sit right underneath the fan, and we don’t believe there to be a significant heatsink in the metal bar that sits above them. So while the VRMs feeding GPU 2 are being cooled by the vapor chamber, the VRMs feeding GPU 1 are only being cooled by the heat dissipation properties of a metal bar.
From this, we can conclude that the VRM banks are receiving wildly different amounts of cooling. The VRMs on the right side are not cooled nearly as well as those on the left and as a result the card is being held back by the VRMs on that right side. To that extent, we believe that if all the VRMs received the same level of cooling as the VRMs on the left side, then the card would have no problem maintaining 5870 speeds while running the Dnet client, and likely even FurMark. It’s also worth noting that all the 5800 series cards share the design of placing the VRMs under a metal bar under the fan, but the 5970 seems to suffer more for it compared to the 5800 series.
Finally, there’s the matter of whether this is even going to matter for most users. After catching the VRMs hitting 120C under the Dnet client, we went looking at other applications and games to see where else the card was throttling. The result of that inquiry was that we couldn’t find anything else that could match the Dnet client in total load. The Dnet client is a bit of a special case here, since crunching encryption keys makes exceedingly good use of the 5-wide SIMD design in the 2000-5000 series cards. When we took a look at something similar to the Dnet client, in this case the Folding@Home GPU client, we couldn’t break 100C on the VRMs. The significance of that result remains to be seen though, since the Folding@Home GPU client hasn’t been optimized for the 5800/5900 series yet like the Dnet client has. Our ultimate concern is that this card is going to repeatedly fall flat on its face at 5870 speeds with more GPGPU applications as OpenCL and DirectCompute take off and the number of such applications blooms.
Radeon HD 5970 Temperatures
Test | GPU 1 Temp | GPU 1 VRM Temp | GPU 2 Temp | GPU 2 VRM Temp
FurMark | 89C | 110C | 91C | 83C
Dnet Client | 87C | 101C | 88C | 77C
FurMark OC | 91C | 120C | 94C | 100C
Dnet Client OC | 93C | 120C | 94C | 94C
Crysis Warhead OC | 87C | 96C | 89C | 74C
STALKER OC | 85C | 96C | 88C | 72C
Meanwhile in games it was a similar story. Crysis and the STALKER benchmark are two of the most demanding games we’ve tested on the 5970, and in both cases the VRMs again peaked at near 100C. As games aren’t going to hammer the SIMDs like GPGPU applications do, the power load from games should be lower than for GPGPU applications.
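The readings in the table can be boiled down to two numbers per test: how much hotter the GPU 1 VRM bank runs than the GPU 2 bank, and how much headroom is left before the roughly 120C throttle point. The short script below does that arithmetic; the temperatures are copied from the table, while the function and key names are just ours for illustration.

```python
# Peak temperatures in degrees C, copied from the table.
# Column order: GPU 1, GPU 1 VRM, GPU 2, GPU 2 VRM.
readings = {
    "FurMark": (89, 110, 91, 83),
    "Dnet Client": (87, 101, 88, 77),
    "FurMark OC": (91, 120, 94, 100),
    "Dnet Client OC": (93, 120, 94, 94),
    "Crysis Warhead OC": (87, 96, 89, 74),
    "STALKER OC": (85, 96, 88, 72),
}
VRM_THROTTLE_C = 120  # approximate throttle point from our logs


def summarize(row):
    """Return the VRM bank gap and the headroom of the hottest bank."""
    _gpu1, vrm1, _gpu2, vrm2 = row
    return {
        "vrm_gap": vrm1 - vrm2,
        "headroom": VRM_THROTTLE_C - max(vrm1, vrm2),
    }


for test, row in readings.items():
    s = summarize(row)
    print(f"{test:18s} VRM gap {s['vrm_gap']:3d}C, headroom {s['headroom']:3d}C")
```

In every row the GPU 1 bank is the hotter of the two, and only the overclocked FurMark and Dnet runs use up all of the headroom.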
As far as our opinion on the 5970 is concerned though, this doesn’t change anything. While we’ll buy AMD’s “power virus” rationale for FurMark and OCCT, the Dnet client is not a power virus. It’s a real application, one that AMD even used in their 5800 presentation back in September. Thus as far as we’re concerned, our 5970 is only good for 775MHz, the lowest clock speed where the VRMs stayed under 120C. Granted, AMD will never officially promise that the 5970 can reach 5870 speeds, but based on how the 5970 was promoted and presented the fact of the matter is that the card can’t meet its advertised capabilities – this card is clearly meant for 5870 clockspeeds.
With that in mind, we’ll end on two thoughts. The first of which is that in spite of our experience, for pure gaming scenarios we don’t have any data to call into doubt the idea that the card can run at 5870 speeds without throttling. So long as you only intend to play games, those speeds should be fine.
Our second thought is that cards from vendors with custom overclocking utilities will be better able to maintain 5870 speeds at all times. These are cherry-picked chips, so there’s no reason why they absolutely need a 1.1625V core voltage to run at 850MHz; we suspect that they could do with less. Since voltage is our main enemy here, even a small drop in voltage should have a noticeable impact on VRM temperatures. But you’re going to need a utility with a full suite of voltage options to take advantage of that.
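As a back-of-the-envelope illustration of why voltage matters so much, the sketch below uses the common first-order approximation that dynamic power in CMOS logic scales with voltage squared times frequency. That model is a simplification we are assuming here (it ignores leakage, among other things), and the 1.10V figure is purely hypothetical; we have not tested the card at that voltage.

```python
REFERENCE_VOLTAGE = 1.1625  # core voltage used for 850MHz, per the article


def relative_dynamic_power(voltage, reference_voltage=REFERENCE_VOLTAGE):
    """Dynamic power relative to the reference voltage at the same clock.

    Assumes the first-order approximation P ~ C * V^2 * f, so with the
    capacitance and clock speed held constant the ratio is (V / Vref)^2.
    """
    return (voltage / reference_voltage) ** 2


hypothetical_voltage = 1.10  # assumed for illustration only, not tested
ratio = relative_dynamic_power(hypothetical_voltage)
print(f"Relative dynamic power at {hypothetical_voltage}V: {ratio:.3f}")
```

Under that assumption, a drop of a little over 0.06V would cut dynamic power by roughly a tenth at the same clock speed, which is load the VRMs would no longer have to supply.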
45 Comments
Hauk - Friday, November 27, 2009 - link
The copper bar for those VRMs needs to be connected to something. The low grade metal of the cooler body isn't wicking/dissipating heat away from the copper bar fast enough. Slap the best TIM available under the copper bar, you still have a traffic jam ahead..
overclocking101 - Friday, November 27, 2009 - link
have you tried a preassure mod at all? i mean just some small plastic washers? this brought my 4870 vrms down by almost 10c!! and they were under a bar just like that im thinking amd is not putting enough preassure for the cooler to make the best contact to be safe not to "cruch or ruin" any of the component but just some small plastic washers help a lot at least in my case they did. just a thought
jigglywiggly - Saturday, November 28, 2009 - link
I am going to get one(probably), I am wondering, how big is the gap between the vrm and the heatsink? Is it enough to put a goop of thermal paste? I was thinking of using mx-2, it would probably help, I don't want to use their thermal tape.
Ryan Smith - Saturday, November 28, 2009 - link
I didn't bother measuring it, but it's big enough that you can't use paste. Something sizable needs to be there to connect the metal bar with the VRMs. The VRMs and the memory chips sit much lower than the GPUs themselves do.
wh3resmycar - Friday, November 27, 2009 - link
i have a feeling ati-fanboys all over will perfectly find this article "offending".
xpclient - Friday, November 27, 2009 - link
Please I want video encoding, decoding/playback benchmarks. I read in their forums that ATI doesn't use DXVA HD (introduced in Windows 7) on their GPUs but their own API. Intel and Nvidia use DXVA HD.
PorscheRacer - Thursday, November 26, 2009 - link
I had this problem on my R600 reference design, though instead of the throttling I would get lockups, hangs or bluescreens when severely overclocked. After removing the backplate and front cover/heatsink, I removed the old thermal pads and applied liquid metal and AS5 for the VRMs and AS5 for the die. Everest showed a significant drop in temperatures and less ramping up of the fan. VRMs are unknown as Everest doesn't report this... I think ATITool did, but I don't recall that anymore.
Very interesting though, and thanks for the investigative journalism. It's one thing to say, oh well in this benchamrk for some reason the 5970 did poorly, and another to explain why it did.
Sahrin - Wednesday, November 25, 2009 - link
So far as I have read, AT is the only website that has had any issues getting the 5970 to 850/1200 on both GPUs. Has anyone considered the possibility that AT just got a bad sample? I know AMD 'built these to overclock' - but stock is stock is stock. I'd be interested to see what your testing environment is, and if there's any impact from that on temperatures. From the articles I have read, the clock increases have been pretty painless.
Is AT's part a review sample or a retail card?
Ryan Smith - Wednesday, November 25, 2009 - link
It's a review sample, but it's identical to a real card.
And yes, it's always possible we got a bad sample. But bear in mind that throttling probably isn't going to show up in a game. So unless the other guys ran FurMark/OCCT/Dnet and were specifically looking for it, they would have never noticed. I'd be surprised if their cards' VRMs didn't get similarly hot.
Zok - Wednesday, November 25, 2009 - link
Maybe my eyes are deceiving me, but it looks like there is a 10th VRM chip at the top of the board, where it makes contact with similar white thermal goop used on the other 9. Could you comment on this Ryan?