Searching for the Memory Holy Grail - Part 2
by Wesley Fink on August 26, 2003 11:11 PM EST - Posted in Memory
Test Design
One of the difficulties in testing memory is that most of the memory benchmarks available are synthetic. While synthetic benchmarks can be useful in comparing performance, they can also paint a distorted picture of real-world performance. This is the reason why AnandTech has always preferred benchmarking with real applications. Benchmarks using games are dependent on many system components for their results, including the CPU speed and Video Card, which have a major impact on the final scores. While memory does impact the game benchmark score, it is only one small part of the total score. Finding a game benchmark that is sensitive to memory is not always easy. We discovered Gun Metal 2, for instance, tends to be video-card bound, making it very useful for testing video cards, but not so useful for measuring subtle differences in system performance. After looking at available game benchmarks, we found Quake3 and Unreal Tournament 2003 to be the most useful for our memory testing.
The following Benchmarks were used in our Memory Testing:
1) SiSoft Sandra Max3 UNBuffered Memory Test
Part 1 of “Searching for the Memory Holy Grail” demonstrated the usefulness of the SiSoft Sandra UNBuffered Memory Test as a sensitive benchmarking tool for memory bandwidth. The Sandra UNBuffered Memory Test turns off Memory Buffering schemes in an attempt to improve the measure of raw memory bandwidth. As a result, it also correlates well with bandwidths reported with Memtest86, an industry-standard memory testing tool.
The idea of the UNBuffered Memory Benchmark is very simple: you merely turn off all memory buffering techniques. Sandra makes this very easy to do. Select “Memory Benchmark”, right-click “Module Options”, and uncheck the nine boxes that are related to buffering.
2) SiSoft Sandra Max3 Standard Memory Test
The UNBuffered Memory Benchmarks are quite different from what you may be accustomed to seeing in memory testing with SiSoft Sandra. For reference, we are again including the Sandra Max3 standard Memory Test, sometimes called the Buffered Memory Test.
3) Super PI
Pure number-crunching benchmarks are very useful for measuring system bandwidth. Some of the more popular number-crunchers are the MPEG/DIVX encoding tests, such as the ones that we used in our standard motherboard testing, and Super PI. MPEG/DIVX tests are valuable for a single motherboard benchmark and in cross-platform testing (Athlon vs. Pentium4, for example). However, they are often very sensitive to the test environment or system configuration, and can be difficult to use reliably in an environment that tests a large number of conditions with the same test, such as we will be doing here in our memory testing. Super PI, on the other hand, is very simple to use and has been shown to be less sensitive to the operating system environment. In other words, we don’t have to reinstall the operating system on a clean hard drive each time we run a benchmark just to get reliable numbers.
Super PI for Windows 1.1 is a freeware program developed by the Super Computer Consortium at the University of Tokyo. The concept of Super PI is very simple: it calculates the value of pi to “x” number of places, and reports the time this calculation requires. We chose to use 2 million places in our tests. Super PI measures total system bandwidth, and memory is only part of that bandwidth, since the CPU has a significant impact on results. We therefore would expect to see smaller changes in Super PI relative to larger changes in memory-only benchmark tests like Sandra.
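To make the "calculate pi to x places and report the time" idea concrete, here is a minimal Python sketch. It makes no attempt to reproduce Super PI's own algorithm or its timings: Machin's formula is swapped in because it is short, and the function names (arctan_inv, pi_digits, time_pi) and the 10 guard digits are illustrative assumptions. It is only practical for small digit counts, not the 2 million places used in our tests.

```python
import time


def arctan_inv(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale, using the alternating Taylor series in integers."""
    term = scale // x          # first term: 1/x
    total = term
    x_squared = x * x
    n = 3
    sign = -1
    while term:
        term //= x_squared     # next odd power of 1/x
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_digits(places: int) -> str:
    """Return pi truncated to `places` decimal places (Machin's formula)."""
    guard = 10                 # assumed guard digits to absorb truncation error
    scale = 10 ** (places + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    digits = str(pi_scaled // 10 ** guard)
    return digits[0] + "." + digits[1:]


def time_pi(places: int):
    """Compute pi to `places` places and return (digits, elapsed seconds)."""
    start = time.perf_counter()
    digits = pi_digits(places)
    elapsed = time.perf_counter() - start
    return digits, elapsed


if __name__ == "__main__":
    digits, elapsed = time_pi(1000)
    print(f"{len(digits) - 2} places in {elapsed:.4f}s")
```

As with the real benchmark, the reported time depends on the whole system (CPU and memory together), so it should move less than a memory-only test when only memory speed changes.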
4) Quake3 Demo FOUR.dm_66
Quake 3 Demo FOUR is one of our standard game benchmarks. As Evan Lieb showed in his PC3200 memory tests, Quake3 can also reveal variations in memory performance. You will likely be surprised how sensitive Quake3 can actually be in testing wide variations in Memory Speed. We run the benchmark three times, check for score consistency, repeat if we see any wide variation in individual scores, and then average the three scores for the reported Frames per Second (FPS) value.
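The three-run procedure can be sketched in a few lines of Python. This is an illustration only: the article does not define "wide variation", so the 2% spread threshold and the retry limit below are assumptions, and run_benchmark stands in for whatever launches the timedemo and returns its FPS score.

```python
from statistics import mean


def reported_fps(run_benchmark, runs=3, max_spread=0.02, max_attempts=5):
    """Run the benchmark `runs` times and return the average score.

    If the scores in a set differ by more than `max_spread` (as a fraction
    of their mean), the whole set is discarded and rerun.
    Both `max_spread` and `max_attempts` are assumed values, not ours.
    """
    for _ in range(max_attempts):
        scores = [run_benchmark() for _ in range(runs)]
        spread = (max(scores) - min(scores)) / mean(scores)
        if spread <= max_spread:
            return mean(scores)
    raise RuntimeError("scores never became consistent")


if __name__ == "__main__":
    # Canned scores standing in for real timedemo runs: the first set of
    # three is inconsistent and is rerun, the second set is averaged.
    canned = iter([400.0, 350.0, 401.0, 400.0, 401.0, 399.0])
    print(reported_fps(lambda: next(canned)))
```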
5) Unreal Tournament 2003 Demo
The Benchmark program built into the UT2003 demo is a contemporary game test that does respond to variations in memory bandwidth. We used it mainly to show the impact of memory speed on UT2003 scores, and to confirm the validity of Quake3 as a real-world test of memory performance. With our new standard ATI 9800 PRO video card, UT2003 shows variation in both Flyby and Botmatch in tests with memory of different speeds. All benchmarks are run at our standard 1024x768 resolution.
Motherboard, CPU, and Peripherals
In Part 1 of “Searching for the Memory Holy Grail”, we used the Abit IC7 for our 875 tests and the Asus P4P800 Deluxe for 865 tests. Because of the impact of PAT on/off at different FSB, we decided to use only the Intel 875 for testing in Part 2. This presented our first problem, since the Abit IC7 has a strange quirk in the way it handles 1:1 memory. No matter what we did, we could not operate the Abit IC7 at greater than a 255 setting (1020FSB). We talked with Abit about this issue, and they are hard at work on a BIOS revision to correct this problem. To be fair, most users will not need to run memory at faster than a setting of 255, and 5:4 and 3:2 memory ratios do not have this issue. However, for our tests here, we expected that we might reach a speed of 275 (1100FSB), and the 255 limitation was not acceptable.
Next, we considered the DFI 875PRO LanParty as the motherboard for our testbed. The DFI had no problem handling settings above 255, which corrected that problem. However, as Evan Lieb pointed out in his review of the 875PRO, the vDIMM range, which tops out at only 2.7V, was too limiting for our high-speed memory tests. DFI has told us that they are releasing an updated version of the 875PRO in the near future with expanded vDIMM options. If that were available today, the DFI would have worked well for our testbed.
The latest revision of the ASUS flagship 875 motherboard is the ASUS P4C800-E. This board adds Intel GigaLAN (using the dedicated Intel CSA bus), and incorporates the ICH5R Southbridge with Intel SATA RAID. We will be doing a review update on this new revision of the ASUS flagship Canterwood shortly. The P4C800-E met our requirements of high speed 1:1 operation and a vDIMM adjustment range that was useful. vDIMM is available to 2.85V on the P4C800-E. As a bonus, we were able to use Intel SATA RAID with SATA drives for all testing.
Our 3.0C Pentium 4 800FSB chip was not very useful for testing DDR500 memory. With a maximum overclock of around 245 (980FSB), we could not even reach the rated speed of the memory. We settled on a 2.4C 800FSB Pentium 4 that has been proven to perform very well at high speeds. On the ASUS P4C800-E, this 2.4C was able to reach a stable 288 setting (1152FSB) at default 1.525V, and a setting of 298 (1192FSB) with a modest vCore setting of 1.6V. We were confident that this test setup would allow us to reach the maximum speeds possible with memory rated as high as DDR500, since we did not anticipate that synchronous operation would exceed DDR596 in our testing.
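The numbers in this section all follow from two multipliers: the Pentium 4 bus is quad-pumped, so the quoted FSB is four times the BIOS clock setting, and DDR memory transfers twice per clock, so the DDR rating at 1:1 is twice the setting. A small Python sketch of that arithmetic, with hypothetical function names and the memory ratio written as CPU:memory, is below.

```python
from fractions import Fraction


def quad_pumped_fsb(setting_mhz: int) -> int:
    """Quoted FSB for a BIOS clock setting (quad-pumped bus: 4 transfers per clock)."""
    return setting_mhz * 4


def ddr_rating(setting_mhz: int, ratio: Fraction = Fraction(1, 1)) -> Fraction:
    """DDR rating of the memory for a BIOS clock setting.

    `ratio` is the CPU:memory ratio, e.g. Fraction(5, 4) for 5:4.
    At 1:1 the memory clock equals the setting; DDR doubles it.
    """
    return setting_mhz * 2 / ratio


if __name__ == "__main__":
    for setting in (245, 255, 275, 288, 298):
        print(setting, quad_pumped_fsb(setting), ddr_rating(setting))
    # A 250 setting with a 5:4 ratio runs the memory at DDR400.
    print(ddr_rating(250, Fraction(5, 4)))
```

This is why a 298 setting corresponds to 1192FSB and, run synchronously, to DDR596, while the 3.0C's ceiling of about 245 (DDR490) falls short of the DDR500 rating.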
Since all testing would be done on a single testbed configuration and only compared to test results on that testbed, we chose the best-performing components that we had available. For CPU cooling, we used the Thermalright SLK-900U heatsink with a 120mm adjustable-speed Vantec Tornado cooling fan. The idea here was to remove any concerns about CPU cooling or overclocking ability from the memory test as much as possible. For the video card, we used our new standard ATI Radeon 9800 PRO with 128 MB memory. Hard drives were a pair of Western Digital Raptor 10,000RPM Serial ATA drives running in a SATA RAID 0 (Striping) configuration on the stock Intel ICH5R.
77 Comments
Anonymous User - Friday, August 29, 2003 - link
This is quite confusing for a noob like myself, but I want to make the right purchasing decision, as I've never dabbled in overclocking, but hope to begin with this new setup.
I'm waiting for the new Abit IC7-G Max III mobo to be released shortly. I'm targeting a P4 3.0C processor, and had been looking at Geil PC4000 platinum, though I suppose I should also now consider OCZ.
What processor and RAM combination on that motherboard will provide the best total results after overclocking? What part does the timing play in it? Will a 3.0C P4 not achieve as fast a bus speed as say, a 2.8C, meaning that a 2.8 would render ultimately the highest performance?
Any help is appreciated.
Anonymous User - Friday, August 29, 2003 - link
One thing I found odd was that there was no mention of cost. I picked up 1GB of Geil PC4000 Plat for $305 shipped which is considerably less than the RAM from all the other manufacturers. Given the results, that's a pretty sweet deal.
Wesley Fink - Friday, August 29, 2003 - link
#63 - Please read the review. Not everyone had DS modules available at the time. We asked manufacturers for, at the minimum, 2 double-sided modules or 4 single-sided modules. This is because it would be unfair to compare performance of 2 SS modules to 2 DS modules.
Kingston was the only manufacturer who chose to supply 4 SS modules. We compare 4SS modules to 2DS in our review which IS fair. Results with 2SS modules were used to illustrate why you should use FOUR modules for best performance if they are SS.
oldfart - Thursday, August 28, 2003 - link
Wesley, don't get me wrong. I'm not trying to say 5:4 is "better" than 1:1. Why would it be?
Not too long ago there were people who were adamant that unless you ran 1:1 ratio, you had a "crippled system". Another myth that was spread around was memory timings didn't matter on a DC DDR system (where the heck did that come from?).
People sold their PC3200 and got PC3700, ran 1:1 and got no performance increase or even a performance decrease and wondered why.
Websites were doing reviews that consisted of nothing but synthetic mem benches that showed 10x - 30x the performance gain that real world benches showed. These same sites are sponsored by memory manufacturers selling that ram.
I guess I just got tired of all the misinformation being put out on the net.
Truth is right now, 5:4 low latency vs. 1:1 high latency produces ~ the same results. The actual difference is nothing you would ever notice in real usage. 1% one way or the other means nothing.
Once you can have high speed and low latency, things will change.
Anonymous User - Thursday, August 28, 2003 - link
Why are they benching 256MB Kingston modules against 512MB double-sided modules from all the other vendors? The tests clearly show 4 DS configuration is fastest. Why didn't they test 512MB Kingston DS modules? They are comparing apples to oranges at Kingston's expense.
Wesley Fink - Thursday, August 28, 2003 - link
It seems that those proposing 5:4 is just as good or better always want to compare the WORST DDR500 timings to the BEST DDR400 timings. 2-2-2-5 is no more a typical DDR400 timing setup than 3-4-4-8 is at DDR500. Look at the timings that actually WORKED with DDR500. In fact, IF you can find DDR400 that can do 2-2-2-5 you will pay quite a premium for it - just like you do for DDR500.
Also the DDR550 we achieved with the best DDR500 would need to be compared to 5:4 at DDR440 running at 2-2-2-6 or so, and the 300FSB some achieve with the 2.4C would need DDR480 - just to run 5:4. With a CPU that achieves high FSB, the DDR500 may be the best choice EVEN at 5:4.
I do think it is a mistake to overlook how very good 5:4 can perform with FAST timings memory, but I also think it is a mistake to pretend 1:1 doesn't matter in performance - because it does. It is ONE of the things that matters, but by no means the only thing.
I am looking right now at some DDR533 Engineering Samples that run 2.5-2-3-6 at DDR533. When these and other faster timing DDR500+ are released, this argument will disappear. BOTH speed and timings matter - and neither is the complete picture.
This review goes into great detail to point out that DDR500 is NOT needed by everyone, and in fact requires a setup that can actually RUN at 250 (1000 FSB) to get ANY benefit. We also pointed out that for most with a 2.8 to 3.2 CPU that a slower memory with faster timings would be a much better choice for performance.
oldfart - Thursday, August 28, 2003 - link
I didn't compare all the results, but looked at the Q3 numbers compared to other reviews that have done the same testing. I'll use the Corsair 4000 numbers:
Your review
XMS4000 DDR500 1:1 400.2 FPS
PC 3500 DDR400 5:4 393.7 FPS
This test:
http://www.hardtecs4u.com/reviews/2003/ddr400_roun...
XMS4000 DDR500 1:1 3-4-4-8 340.8 FPS
XMS3200 DDR400 5:4 2-3-2-6 338.9 FPS
Numbers are very close. 2-2-2-5 would have been faster if run that way.
***********************
This test:
http://www.ocprices.com/index.php?action=reviews&a...
XMS4000 DDR500 1:1 3-8-4-4 320 FPS
XMS4000 DDR500 1:1 2.5-7-4-4 338 FPS
XMS3200 DDR400 5:4 2-5-2-2 340.5 FPS
In this test, the PC3200 low latency is a bit faster than the PC4000 with medium timings, quite a bit faster than the slowest timings.
In all of these tests, the difference is very small when it comes down to it. A tie is more accurate.
My points:
1) The people who think they are "crippling" their P4 rig by running a mem ratio are mistaken. You can get the same performance if you set it up right.
2) SiSoft mem benches do not represent real world performance. They show an inaccurate view of system performance gains.
3) Certain sites push PC3700/4000 too hard and neglect to show that equal performance can be had with less expensive ram.
4) I hate posting this here!! Bring back the AT articles forum!
Wesley Fink - Thursday, August 28, 2003 - link
#35, #38, #44, #49, #50, #52, #55, #57 - To answer your question, we ran 1000FSB (500) at 5:4 with Mushkin PC3500 Level II at CAS 2-2-2-5. This Mushkin is about the only memory left that can REALLY do 2-2-2-5 at DDR400, and a review will be up soon. The testbed and ALL hardware and settings were the same as this review. Results are:
Sandra UNBuffered - 2964/2959 or avg. 2962
Sandra Buffered (Standard) – 5470/5468 or avg. 5469
Quake 3 – 393.7fps
UT2003 – Flyby: 241.84
Botmatch: 87.66
SuperPI (2M places) – 105s
Write these numbers down and compare them to Page 14 (500FSB/DDR500) charts. You will see that 5:4 2-2-2-5 is very close to the performance of the poorer DDR500 in our tests, but it does NOT beat the DDR500.
We are comparing the fastest memory I have tested at DDR400, at its fastest 5:4 timings, to DDR500 at much poorer timings. Of course the DDR400 goes even higher than DDR500 and performs even better.
BOTH timings and FSB speed matter, and the answers are not as simple as some have stated.
vailr - Thursday, August 28, 2003 - link
Please consider adding TwinMos 3700 to your updated review.
http://www.showtimecomputer.com/cpumem/ddr.asp
quote: "
512 MB PC3700 400 (DDR/CL2.5 Twinmos Chip $119.00
512 MB PC3700 400 (DDR/CL2.5 Winbond Chip $125.00
TwinMOS stays one step ahead of the technology curve by launching one of the first PC3700 Unbuffered DIMM Modules. Featuring speeds up to 466Mhz, PC3700 DDR 466 delivers enhanced bandwidth up to 3728MB per second.
Check it Out: WWW.TwinMOS .COM "
Slappy00 - Thursday, August 28, 2003 - link
I'll say one thing about OCZ: whether or not the review is biased in any way, OCZ has come a long way to prove that they have a good product and stand by it. I have read countless posts where OCZ would gladly RMA some user's memory and give them pretested memory as a replacement. I for one bought GEiL PC4200 (really PC4000 with looser timings) and wish I had the kind of support offered by OCZ.
In the end I would only use results based on reviews as a guide, not a reference.
For example:
I have an Abit IS7 (BIOS 16) and my board will not do anything faster than 260 1:1 (520DDR) without memory errors (via memtest86), but I can run the timings more aggressively (2.5-8-4-4) at 260 for some reason. I can't use any dividers (5:4, 3:2) and I can't use GAT or I get the dreaded long beep at boot-up.
Just goes to show you that just because it's printed doesn't mean it's right for you.