
[Ret Sticky]Overclocking sndbx for A64 939 systems with Winchester, Opteron dual core

hitechjb1 said:
CPU at 2799 MHz = 311.0 MHz x 9
TCCD 2.5-3-3-7 1T 2.9 V, 311 MHz, 1:1 ratio
http://hitechjb1.dynu.com:8081/imag...bhd_311x9_mem_311_2.5-3-3-7_everest_write.JPG
I find it quite interesting that I got better results :clap:


Do you think it has something to do with the /10 divider I used (180, look at my screenshot)?
This doesn't make sense, but it might be possible... or it is Tref :D
 
Stability testing using memtest, SuperPI (32M) and Prime95

Memtest86 tests memory but does not stress the CPU and system bus, which is why it is easier to pass unless the memory is clocked too high or the timings are set too tight. It is a necessary memory test, but it is not sufficient for CPU + memory stability.

SuperPI 1M uses a small amount of memory (~ 8.4 MB), which is why it is much easier to complete than 8M, 16M or 32M. SuperPI 1M mainly runs off the CPU and cache. It is necessary but not sufficient for CPU + memory stability.

SuperPI 32M uses a lot of memory (~ 268 MB), so both CPU and memory are tested and stressed, as are the system bus and chipset when paging takes place. That is why SuperPI 32M is a good, quick way to test the speed and stability of the system. It is a minimal test that we should use as a starting point.

Prime95 with small FFT (8K - 32K) tests/stresses only the CPU and cache. It stresses the CPU constantly with priority control, but since memory and system bus are not involved, it is more predictable and less susceptible to random failure unless the CPU is clocked too high. It is not sufficient to ensure memory and system stability.

Prime95 with in-place large FFT (8K - 1024K) tests/stresses the CPU, cache and some memory heavily with custom priority (only a few MB of memory for in-place unless the custom option is used), so it is needed in addition to the small FFT test to uncover stability issues in the CPU, cache and some memory. Since some memory is involved (more than with small FFT, but still much less than the few hundred MB of blend) and the test can be affected by other background processes, test failures are more random in nature.

Prime95 with blend/custom (8K - 4096K) tests/stresses the CPU, cache, memory, chipset and system bus as a whole with custom priority. Memory is stressed to the greatest extent, and the amount of memory used can be adjusted (custom option). It should be used to completely test the CPU, cache, memory, chipset and system. Because a larger amount of memory is involved, along with hard drive paging, video and system bus activity that can be affected by other background processes (even with a minimum number of processes running), test failures are the most random in nature and the test is the hardest to pass.

The custom option can be used to select the FFT size and amount of memory, either to fully diagnose the stability of the system as a whole or to focus on a selected part of the system.

Memory size can be adjusted from a small amount to a large amount, up to the virtual memory size. When the memory used by Prime95 plus background and resident programs is larger than the physical memory size, frequent paging may occur at the beginning and during the run, and the CPU will not be fully utilized. If CPU utilization is observed to be very low for a long time during a Prime95 run, reducing the memory size by some amount should reduce paging to a minimum so that Prime95 runs with the CPU fully utilized. On the other hand, a very large memory size can be set to test the system as a whole with heavy paging. E.g. with 1024 MB of system memory and the Prime95 default memory size of 927 MB, heavy paging may occur (in some system setups); reducing the memory size to 920, 910, 900, ... would reduce paging to a minimum.
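The stepping-down in the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name, the 10 MB step and the number of candidates are assumptions for the sketch, not Prime95 settings.

```python
def candidate_memory_sizes(default_mb, step_mb=10, count=3):
    """Return `count` memory sizes (MB) below default_mb, on step_mb boundaries.

    Hypothetical helper: it only lists values to try by hand in the
    Prime95 custom torture test dialog, it does not talk to Prime95.
    """
    # Largest multiple of step_mb that is strictly below the default size.
    first = (default_mb - 1) // step_mb * step_mb
    return [first - i * step_mb for i in range(count)]


# Default of 927 MB on a 1024 MB system, as in the example above.
print(candidate_memory_sizes(927))
```

Try each value in turn and stop at the first one where CPU utilization stays high for the whole run.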


Some observations on the order of ease of completion (from easy to difficult):

POST - just needs CPU and memory to run
memtest86 - tests entire memory but CPU not stressed
OS boot - tests hardware/software/drivers but CPU, memory, chipset not stressed
SuperPI 1M - stresses CPU, cache to some extent
3Dmark 01/03/05/games - stresses CPU, cache, memory, system bus, video subsystem to some extent
SuperPI 8M - stresses CPU, cache, memory to bigger extent
SuperPI 32M - stresses CPU, cache, memory, chipset, system bus to bigger extent
Prime95 small FFT - stresses CPU, cache to biggest extent with priority
Prime95 (in-place) large FFT - stresses CPU, cache, some memory to biggest extent with priority (harder to pass)
Prime95 large FFT blend/custom - stresses CPU, cache, memory, chipset, system bus to biggest extent with priority (hardest to pass)

Moving down the list from easy to difficult, the frequencies of the system bus, memory and CPU have to be lowered, typically by as much as 20 - 30 MHz on memory (at the 250 - 300 MHz level) and 200 - 300 MHz on CPU (at the 2.5 - 3.0 GHz level).


Since SuperPI and Prime95 are computations, errors at a certain high clock and memory timing setting mean the computer is computing wrong results, and the system can potentially crash under certain load conditions (maybe in the next second, minute, hour or day). It means that in some computation, no matter how small, even a single bit, the CPU, memory or system is delivering wrong results under certain unpredictable conditions, due to high clock frequency and fast timing in the CPU and/or memory, a sudden surge of temperature in a certain part of the CPU (hot spot), or a sudden change in the system's physical behavior such as voltage, system bus timing, ....

To get the system to pass the above stress tests, slow down the clock/timing of the CPU and memory, and improve cooling.
 
Wow, this looks like a hot thread! Too bad I didn't notice it til now. I can't read all 9 pages, though. =(

Good work! Seems very useful. I read a bunch of the first page, but will do more when I have the time.
 
Updated to official bios 03/10/05 which supports revision E3 Venice and revision E4 San Diego.

After about 1 day of use, there is no noticeable difference in overclock frequencies and stability compared to the previous bioses 01/25/05 (official) and 02/17/05 (beta).
 
There is no noticeable difference in overclocking, stability and performance between the 03/10/05 bios (official) and the previous 01/25/05 (official) and 02/17/05 bios (beta). Both bios work fine with TCCD and UTT memory modules.

The official 03/10/05 bios comes with NVMM 4.85 and NV RAID 4.81, and memory compatibility enhancement.

Current Nforce4 drivers, nView and the audio mixer are from a single .exe file nForce_6.39_WinXP2K_WHQL_english.exe downloaded on 03/12/05.

DRAM setting @ 315 MHz

lp_ultra-d_winnie3000_cbbhd_315x9_mem_315_A64_Tweaker.JPG




Sometimes it is beneficial to lower the memory frequency slightly to get better latency, for example:

TCCD at 317 MHz timing 2.5-4-4-8 1T
TCCD at 315 MHz timing 2.5-4-3-7 1T
TCCD at 310 MHz timing 2.5-3-3-7 1T

I will show the tradeoff between them using SuperPI 32M.

(It has been shown previously that 300-310 MHz 2.5-3-3-7 1T ~ 250-260 MHz 2-2-2-5 1T.)
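To see why a slightly lower frequency with tighter timings can come out ahead, it helps to convert the timings from clock cycles to nanoseconds. The sketch below is a rough illustration only: it assumes the four numbers are CL-tRCD-tRP-tRAS counted in memory clock cycles, it uses 250 MHz as one point from the 250-260 MHz range, and it ignores bandwidth and the extended timings. The measured comparison is still the SuperPI 32M run.

```python
def timing_ns(cycles, mem_mhz):
    """Convert a timing given in memory clock cycles to nanoseconds."""
    return cycles * 1000.0 / mem_mhz


# (memory clock in MHz, timings assumed to be CL-tRCD-tRP-tRAS)
settings = [
    (317, (2.5, 4, 4, 8)),
    (315, (2.5, 4, 3, 7)),
    (310, (2.5, 3, 3, 7)),
    (250, (2, 2, 2, 5)),
]

for mhz, timings in settings:
    in_ns = [round(timing_ns(cycles, mhz), 2) for cycles in timings]
    print(f"{mhz} MHz {timings} -> {in_ns} ns")
```

On the CAS figure alone, 2.5 cycles at 310 MHz and 2 cycles at 250 MHz both work out to roughly 8 ns, which is in line with the rough equivalence quoted above.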
 
Mate,

Could you try this in your stability tests? It's supposed to run 5 C hotter than Prime95.
http://home.comcast.net/~wxdude1/emsite//download/stresscpu.zip

What is StressCPU?
This is a small windows program to torture-test your CPU in order to make sure that you don't have overheating problems. It will only run on SSE-equipped x86 CPUs, and it is executing a special version of the Gromacs innerloops that mixes SSE and normal assembly instructions to heat your CPU as much as possible.

The program was written by Erik Lindahl and I simply compiled it and am making it available to use.
This program actually makes my CPUs run 4 C - 6 C hotter than simply running Gromacs.
It's a good heat test and it should make any system draw maximum power, which will also test the stability of your power supply.

Let me know if you have any issues.
You can get the program in the EM-DC download area.

Larry
http://www.em-dc.com

Reference thread:
http://www.ocforums.com/showthread.php?t=362053
 
The "stresscpu.exe" program was run for about 30+ min., 680 K iterations, about 5 minutes per 100 K iterations on the Winchester at 2.8 GHz.

It mainly stresses the CPU as the memory usage is only 936 KB. It does not stress the memory and the system as much as SuperPI 32M and Prime95 large FFT and Blend.

During the test, it did raise the CPU temperature by about 2 - 4 C compared to SuperPI 32M (40 - 42 C vs 38 C).

DFI LP UT NF4 Ultra-D
Winchester 3000+ 2.80 GHz 1.52 V (311 x 9)
G. Skill TCCD 4400 LE 311 MHz 2.5-4-3-8 1T 2.8 V

lp_ultra-d_winnie3000_cbbhd_311x9_mem_311_A64_streecpu.JPG
 
How to compare processors of different architectures and frequencies

Preliminary
Will update the post in more details.

Here is a brief update on how to compare the performance of different processors.

Instead of going into the internals of processor architecture and college level EE & CS concepts such as pipeline, cache, instruction set, micro-instruction and micro-architecture, parallel execution, branch prediction, out of order execution, instruction fetch and decode, scheduling, cache trace, backtrack and error recovery, multi-threading, ..., different processors can be compared at the macroscopic performance level, in a way that can be understood and measured readily in layman's terms.


CPU performance can be compared based on the number of instructions executed per clock cycle, using a predefined set of programs (benchmarks such as SPEC or Dhrystone), for processors of different architectures and cache sizes running at different frequencies.

IPC = Performance / frequency

Performance can be measured in millions of instructions per second (MIPS) for integer computation, or in FLOPS for floating point computation.

IPC stands for instructions per cycle, i.e. the number of instructions executed per clock cycle. Since IPC is normalized by frequency, it is an invariant for a given architectural implementation and is independent of the processor frequency.

First, based on the Sandra Dhrystone integer benchmark:

- Winchester 3000+ 512KB L2 at 2.9 GHz, 13315 Dhrystone MIPS <--- my DFI NF4 + CPU setup
IPC_Dhrystone = 4.59

- Opteron 152 1MB L2 at 2.6 GHz, 11573 Dhrystone MIPS
IPC_Dhrystone = 4.45

- P4 E 1MB L2 w/ 2 SMT at 4 GHz, 11707 Dhrystone MIPS
IPC_Dhrystone = 2.93

- P4 C 512KB L2 w/ 2 SMT at 3.2 GHz, 8395 Dhrystone MIPS
IPC_Dhrystone = 2.62

- Pentium M 2MB L2 at 2 GHz, 8608 Dhrystone MIPS
IPC_Dhrystone = 4.30

- Athlon 3200+ 512KB L2 at 2.2 GHz, 9142 Dhrystone MIPS
IPC_Dhrystone = 4.16 <--- this number seems to be higher than previously obtained, to be verified

- Athlon 2600+ 256KB L2 at 2.08 GHz, 8643 Dhrystone MIPS
IPC_Dhrystone = 4.16 <--- this number seems to be higher than previously obtained, to be verified
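The IPC figures above are just the Dhrystone MIPS score divided by the clock in MHz. A minimal Python sketch to reproduce them from the numbers in this post (the dictionary labels are shorthand made up for the sketch):

```python
def ipc(dhrystone_mips, clock_mhz):
    """IPC = performance / frequency, with MIPS and MHz as the units."""
    return dhrystone_mips / clock_mhz


# (Dhrystone MIPS, clock in MHz), copied from the list above.
results = {
    "Winchester 3000+ @ 2.9 GHz": (13315, 2900),
    "Opteron 152 @ 2.6 GHz": (11573, 2600),
    "P4 E @ 4 GHz": (11707, 4000),
    "P4 C @ 3.2 GHz": (8395, 3200),
    "Pentium M @ 2 GHz": (8608, 2000),
    "Athlon 3200+ @ 2.2 GHz": (9142, 2200),
    "Athlon 2600+ @ 2.08 GHz": (8643, 2080),
}

for name, (mips, mhz) in results.items():
    print(f"{name}: IPC_Dhrystone = {ipc(mips, mhz):.2f}")
```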


So at the same frequency, or clock for clock, the CPU IPC performance compares as follows:

- An A64 (Winchester) performs 4.59 / 2.93 = 1.57 times as well as a P4 E (w/ 2 SMT) in the Dhrystone benchmark,
or an A64 at 2.5 GHz is roughly tied with a P4 E (w/ 2 SMT) at 3.9 GHz in the Dhrystone benchmark.

- An AXP (Barton or Tbred B) performs 4.16 / 2.93 = 1.42 times as well as a P4 E (w/ 2 SMT) in the Dhrystone benchmark.

- An A64 (Winchester) performs 4.59 / 4.16 = 1.10 times as well as an AXP in the Dhrystone benchmark.
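The clock-for-clock ratios and the "equivalent frequency" can be worked out the same way. A small sketch using the rounded Dhrystone IPC values from this post (the function names are made up for the sketch, and the result only holds for this one benchmark):

```python
def clock_for_clock(ipc_a, ipc_b):
    """How many times faster A is than B at the same clock frequency."""
    return ipc_a / ipc_b


def equivalent_ghz(ghz_a, ipc_a, ipc_b):
    """Clock that B would need to match A running at ghz_a."""
    return ghz_a * ipc_a / ipc_b


IPC_A64 = 4.59   # Winchester
IPC_AXP = 4.16   # Barton / Tbred B
IPC_P4E = 2.93   # P4 E with 2 SMT

print(f"A64 vs P4 E: {clock_for_clock(IPC_A64, IPC_P4E):.2f}x")
print(f"AXP vs P4 E: {clock_for_clock(IPC_AXP, IPC_P4E):.2f}x")
print(f"A64 vs AXP:  {clock_for_clock(IPC_A64, IPC_AXP):.2f}x")
print(f"A64 at 2.5 GHz ~ P4 E at {equivalent_ghz(2.5, IPC_A64, IPC_P4E):.1f} GHz")
```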


Floating point performance w/ SSE2, SSE3 can be measured similarly.

Other benchmarks measuring CPU performance and IPC such as SPEC can also be used. The above example illustrates how to compare CPU performance, based on the amount of instructions executed per cycle.


Appendix:

Pipeline length (integer) - number of clocked logic stages to execute an instruction

Pentium M (Dothan) 10 - 12 (to be confirmed)
Pentium 4 (Prescott) 31
Pentium 4 (Northwood, EE) 20
Pentium 3 10
Athlon K8 (A64, FX) 12
Athlon K7 (Tbred, Barton) 9


These were written a while back during the days of the legendary Tbred B DLT3C, attempting to answer the same kind of questions about how to compare processors with different architectures and frequencies, ....

What is IPC and how to compare cycle or Hz for different CPU architectures (page 19)

Analogy for comparing CPU cycles (page 19)

Cache and CPU performance (page 19)
 
Now that I think about it... I think that was a good OCF101, lol. I never heard about Dhrystone MIPS.
 
Hitech,

I am scratching my head here. You have air cooling and achieved 3006 MHz @ 334x9, but I can't seem to get that with my Prometeia Mach 2. Can you help me? Also, how do you keep your voltage so low when you OC that high? I am using OCZ Platinum rev2 3200+ RAM. Actually I can't seem to go beyond 290 MHz. I start getting errors in memtest after 290 MHz. One more thing, I can't seem to pass Prime95. Any info from you would be greatly appreciated. :shrug: :bang head
290 MHz
LDT= x4
fsb multiplier= x10

This is my dram config:
200 dram frequency 1:1
enable CPC
2.5
4
7
4
7
14
5
3
2
5
3120
5
bank interleave= enable
DQS skew control = increase
DQS skew value= 0
Dram drive Strength= level 6
Dram Data Drive Strength= level 4
max async latency= 7x
read preamble= 5x
idle cycle= 256
dynamic counter= disable
r/w queue bypass= 16x
bypass max= 7x
32 byte granularity= disable 8 bursts.


cpu startup vid=1.425volts
cpu vid =1.425volts
cpu vid special control= 123%
LDT voltage=1.4v
chipset voltage=1.8v
dram voltage= 2.9v








NVENTIV MACH 2 | DFI NF4 SLI_DR | OCZ Platinium rev2 2.5-4-4-7 2.9V | WINCHESTER 3200+ @2900MHz - 290x10 - 1.75v | 2x BFG 6800GT370/1000 | VAntec Stealth 520w psu|
 
OC NOOBIE,

First, welcome to the forums with your very first post here.

I think the main factors for getting good overclocking results are:
1. A good, carefully selected set of hardware.
2. Luck in getting some highly overclocking components, mainly CPU, memory, motherboard and (video card).
3. Careful observation, detailed analysis of results, and (patience).

I think your hardware set is very similar to what I use, and it is a good set to begin with. Your memory is TCCD based (I think), which is good for tweaking on NF4 boards over a wide range of frequency and timing, at low voltage compared to BH-5/UTT. The DFI NF4 board is also very flexible for memory tweaking and selection of voltage range (way more than needed).

The 3006 MHz is just for system POST, boot and getting into memtest. The system can boot into Windows at 2.95 GHz, run SuperPI 1M at 2.9 GHz and SuperPI 32M at 2.85 GHz, and is Prime95 stable at 2.73 GHz. So there is the common frequency span of 200 - 300 MHz between highest CPU boot and stable Prime95.

I think I may have been lucky to get a CPU, memory and motherboard in one shot that all perform so well. Let me suggest what I would try based on what you described.

1. My G. Skill 4400 LE can run ~310 MHz 2.5-3-3-7 1T 2.8 V and up to 350 MHz 3-5-5-10 1T 2.8 V. If you have set your timings correctly and are still getting only 290 MHz, apparently your OCZ rev2 Platinum is not as overclockable.

But first, try setting all extended timings to AUTO (or use the latest official bios 03/10/05, which defaults them to AUTO, including max async latency), and confirm with memtest that 290 MHz 2.5-4-4-7 1T is indeed the max of the memory. Don't worry about tuning the extended timings; leave them at AUTO until you have found the max for CPU and memory. Tighter extended timings may help to gain a point or two in benchmarking (later). If some of the extended timings are set too tight, as in your list (e.g. max async latency = 7), it may hinder your max memory frequency, which also determines your max HTT and CPU frequency (unless you use a memory_HTT_ratio other than 1).

DRAM Bios Setting (for TCCD) bios 02/17/05
DRAM Bios Setting (for TCCD) bios 03/10/05 @ 315 MHz

This is the very first thing you need to determine, the max memory frequency and the associated timing. If you have access to another set of memory modules, try them too.

2. As for the CPU, the 90 nm Winchester is a very tricky piece of silicon: it wants just enough voltage for a given stable frequency. With just the right amount of voltage, it can run cool and fast. But do not over-apply voltage, as that would generate unnecessary heat and hot spots due to leakage current, and in turn create unnecessary instability and slowdown. I found that for my setup, at 40 C max on air, 1.5 - 1.55 V is the right balance of temperature and voltage. Anything higher than that adds instability.

Try to lower your CPU voltage as much as you can for a given frequency.

What is the load and idle CPU temperature?

What are the highest CPU boot, post, OS boot frequencies and voltage?

Try to find out what is the max CPU frequency at 1.4 V.

Bare minimum voltage at maximal overclocking

3. In my setup, chipset voltage = 1.5 V, LDT voltage = 1.2 V, raising them does not help.

4. Try to set LDT multiplier to x3 explicitly to avoid uncertainty (you set it to 4). Setting it to AUTO should be OK too, but not as safe as x3 during testing.

5. Run SuperPI 32M, 3dmark 01/03 for quick CPU, memory and system stability test.

6. If Prime95 fails, use the different options to diagnose:

Prime95 small FFT is mainly for testing CPU.
Prime95 large FFT is mainly for testing CPU and memory.
Prime95 blend is the most difficult to pass, as it involves the chipset and system bus in addition to CPU and memory, and its run is most susceptible to background processes, hard drive activity, paging, voltage fluctuation, ....
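The three Prime95 modes can be used as a simple elimination ladder. The sketch below only encodes the reasoning from this post; the function name and the wording of the suggestions are made up for illustration, and real failures are not always this clean cut.

```python
def likely_culprit(small_fft_ok, large_fft_ok, blend_ok):
    """Map Prime95 pass/fail results to the part of the system to look at first.

    Rough rule of thumb only: each mode adds more of the system on top
    of the previous one, so the first failing mode points at what it added.
    """
    if not small_fft_ok:
        return "CPU/cache: lower the CPU clock, or revisit CPU voltage and cooling"
    if not large_fft_ok:
        return "CPU + memory: lower the memory clock or loosen the memory timings"
    if not blend_ok:
        return "chipset/system bus/paging: lower HTT, check background load"
    return "all three pass: no failure to diagnose at this setting"


# Example: small FFT passes, large FFT fails.
print(likely_culprit(True, False, False))
```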

SuperPI 32M is a relatively "quick" way to test CPU and system speed and stability

Stability testing using memtest, SuperPI (32M) and Prime95

7. I am not familiar with your PSU, hope it is fine. Do not overclock your 6800 GT while searching for optimal CPU, memory and motherboard operating points.

That's it for now, may add more later. Hope these help.
 
Two days ago, I found that the thread had been given a “terrible” rating of 1. Before that, it had received 10 “excellent” ratings of 5 since the thread started slightly less than 2 months ago. I also got PMs about its usefulness.

I looked at the ratings of other threads in the forums: the majority (90+%) get an “excellent” rating of 5, and only very few get a rating as low as “average” (3).

I have been spending quite a bit of time maintaining these threads (there are 4 threads in this section) and trying to share my experiences and thoughts (right or wrong). If there are data, techniques or presentations that are incomplete, erroneous, too lengthy, ..., please post suggestions instead of, or in addition to, just giving a “terrible” rating.


I am aware that some posts are lengthy, as I try to keep the content as complete and analytical as possible (rather than a simple answer without background and explanation). Some posts may require some understanding of various disciplines in electrical engineering, computer science and semiconductor physics, maybe beyond middle/high school level, as some readers would like to have in-depth looks at and discussions of the issues and techniques.

There are lots of pictures in this thread, and I am aware that it can take a long time (and cause frustration) to open, refresh and navigate this thread, even with a high speed link.

To improve navigation, I have significantly reduced the size of most pictures by lowering their resolution (so to see picture details, you may have to zoom in by opening the picture in another window or downloading it). I prefer this approach to picture links, so that pictures can be seen readily in each post. It should now take much less time to refresh (except on dial-up modems).

Please feel free to post any suggestions and comments.
 
hitechjb1 said:
Some posts may require some understandings of the various disciplines in electrical engineering, computer science and semiconductor physics, and maybe beyond middle/high school level, as some readers would like to have some in depth looks and discussions of the issues and techniques.
LOL!

No problems here, I actually enjoy reading while I hardly care about my computer any more... Helps to stay up-to-date and I simply like it :)
Keep on the good work- that is what counts, not the rating of a thread.
 
Hitech,

I don't know what is wrong with my system. It seems to work OK at stock settings. But when I overclock and try to run 3DMark05, the pixels go all crazy and get choppy. I don't know if that has to do with the power supply. Is 520 watts enough juice??
 
OC NOOBIE, try to lower the timings. I got artifacts as well when my timings were too tough- but I got them immediately after booting...
 
OC NOOBIE said:
Hitech,

I don't know what is wrong with my system. It seems to work OK at stock settings. But when I overclock and try to run 3DMark05, the pixels go all crazy and get choppy. I don't know if that has to do with the power supply. Is 520 watts enough juice??

I don't know how many things you have tried on your setup. I can only suggest the basic steps to begin with.

To achieve good results, first you have to find out what the CPU (including the memory controller), the memory, the system (chipset, system bus), the video card can handle in a systematic way (especially when facing problems), i.e.
- avoiding unnecessary uncertainties
- eliminating unknowns step by step
- isolating problems

You have sub-zero cooling and a good list of hardware (I am not familiar with your PSU, assume it is OK for now).

What is the five letter CPU code, such as CBBFD, CBBHD, CBBID?
What is the week code such as 0448, 0502?

Can you answer the questions that I asked you in the last post:
- What is the highest CPU frequency at 1.4 V for booting the OS, SuperPI 1M, SuperPI 32M, 3dmark 01/03?
- What is the CPU idle, load temperature under Mach2?

You have a 3200+ Winchester; to start with, it should be set to the x10 multiplier so that HTT and the memory bus can run lower, to avoid uncertainty about the memory.

Almost all Winchesters reported here can do 2.4 GHz.
939 FX, Winchester, NewCastle

Try this
CPU VDD = 1.4 V
CPU_multiplier = x10
HTT = 240, 250, ... MHz
memory_HTT_ratio = 1:1
memory_timing = 2.5-3-3-8 1T (other extended timing set to AUTO in bios)
(or 2.5-4-4-8 1T in case of problem at higher frequency testing
some TCCD can run 2.5-3-3-7 1T to ~310 MHz ~2.8V)
LDT multiplier = x3
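For reference, the clocks that follow from these settings can be worked out with a short Python sketch. Assumptions: memory runs 1:1 with HTT as set above, the HT link clock is HTT x LDT multiplier, and 1000 MHz is taken as the nominal HT link rating on this platform (treat that figure as an assumption of the sketch). The function and key names are made up.

```python
HT_LINK_RATED_MHZ = 1000  # assumed nominal HT link rating, not a measured limit


def clocks(htt_mhz, cpu_multiplier, ldt_multiplier):
    """Derive CPU, memory (1:1 ratio) and HT link clocks in MHz from HTT."""
    return {
        "cpu_mhz": htt_mhz * cpu_multiplier,
        "mem_mhz": htt_mhz,  # memory_HTT_ratio = 1:1
        "ht_link_mhz": htt_mhz * ldt_multiplier,
    }


for htt in (240, 250, 260, 270, 280, 290):
    c = clocks(htt, cpu_multiplier=10, ldt_multiplier=3)
    note = "" if c["ht_link_mhz"] <= HT_LINK_RATED_MHZ else "  <-- HT link above rating"
    print(f"HTT {htt}: CPU {c['cpu_mhz']} MHz, memory {c['mem_mhz']} MHz, "
          f"HT link {c['ht_link_mhz']} MHz{note}")
```

With LDT x3 the HT link stays under 1000 MHz across this whole HTT range, whereas 290 MHz with LDT x4 would put it at 1160 MHz, which is one more unknown to avoid during testing.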

Run Memtest to make sure the memory can handle the memory frequency and timing (it should, but be sure first).

Run SuperPI 1M, 8M, 32M, 3dmark 01/03 each time you raise the HTT and see how high the CPU, memory and system can go.

The eventual stability should be tested with Prime95, but use SuperPI and 3dmark first.
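The step-by-step search can be written down as a small loop. This is only a bookkeeping sketch: the tests are still run by hand, and `passes` is a hypothetical callback that returns what you observed for a given test at a given HTT. The pass/fail pattern in the example is invented purely to show the flow.

```python
def highest_passing_htt(htt_steps, tests, passes):
    """Return the highest HTT (MHz) at which every test passed.

    htt_steps: HTT values to try, in increasing order.
    tests:     test names, quickest first.
    passes:    hypothetical callback (test, htt) -> bool with your observed result.
    Stops at the first HTT where any test fails; returns None if none passed.
    """
    best = None
    for htt in htt_steps:
        if all(passes(test, htt) for test in tests):
            best = htt
        else:
            break
    return best


# Invented results for illustration: memtest always passes, the rest fail above 260.
def observed(test, htt):
    return test == "memtest" or htt <= 260


ladder = ["memtest", "SuperPI 1M", "SuperPI 8M", "SuperPI 32M", "3dmark 01/03"]
print(highest_passing_htt([240, 250, 260, 270, 280], ladder, observed))
```

The value returned is the baseline to confirm with Prime95 before raising any voltage.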

Don't use unnecessarily high voltage; I set Vchipset = 1.5 V, VLDT = 1.2 V to run HTT up to 320 - 360 MHz on my DFI NF4 Ultra-D.

Do not overclock the two 6800 GTs. I would even suggest using only one 6800 GT at first while finding the CPU and memory limits. I have not tried SLI with two 6800 GTs myself, so make sure there are no bios issues.

What version of bios do you use?

Only after knowing the above should you begin to increase the CPU VDD to get higher CPU frequency, and Vdimm to get higher memory frequency and tighter timing (later).

Please answer the above questions. I need to know systematically what you get in order to help and make further suggestions (if any).
 
hitech,

bios 2.18
my cpu is CBBh, 0446
I ran 240 x10 setting (default setting for everything else=auto)
2.5-3-3-8 1T
LDT x3
cpu idle at -70C
load at -66C
I ran superPI 8M - 6m 33s
32M- fail- not exact in round.
memtest- pass 3 pass
Should I continue with high clock speed/??
 
Bios 02/17/05 and bios 03/10/05 are good. The 02/18/05 should be OK.

If SuperPI 32M is still failing after a few runs, raise the voltage to 1.45 - 1.50 V to see whether it helps.

Can you also raise the temperature to something less extreme, say around 0 C, to see what happens? At 0 C, it should be able to pass SuperPI 32M or Prime95 small FFT with 1.4 - 1.5 V at 2.4 GHz.

You have to get it to pass something such as SuperPI 32M, 3dmark 01/03 so that you know what the baseline is. If not 2.4 GHz, then try 2.3 GHz (230 MHz x 10).

Raise the voltage to about 1.45 - 1.50 V to see whether the CPU temperature increases, so you know the heat sink is making proper contact.

You said it was able to run at 2.9 GHz, what temperature was that? And was it stable? At what voltage?
 