
[Ret Sticky]Overclocking sndbx for A64 939 systems with Winchester, Opteron dual core

Cool and Quiet with new higher overclocking (continued)

The AMD processor driver (amdk8.sys 1.1.0.18) was installed.
http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_871_9706,00.html
Before this the system was running the native Windows XP processor driver (07/2002).
The version of the processor driver installed in a system can be found in the Windows device manager.


04/30/2005
Updated DFI LP UT NF4 Ultra-D to Bios 414-3 (for TCCD).

Major improvement:
- Dual channel memory modules can post without error and operate in DIMM 1 and 3 (yellow). BIOS 310 and earlier posted with an error when modules were in DIMM 2 and 4 (orange); hitting F1 bypassed the error and booted. I think BIOS 316 or later fixed that.
- Cool and Quiet (CnQ) can now operate stably up to the same high frequencies of CPU, HTT and memory attained when CnQ was not enabled. :)
- CnQ HTT and memory were improved from 293 to 317 MHz (at least), CnQ CPU from 2.64 GHz to 2.85 GHz (at least) :)

Tested:
HTT to 317 MHz
CPU to 317 MHz x 9 = 2.85 GHz
memory to 317 MHz 2.5-4-4-7 1T
Have not tested between 317 and 328 MHz (CPU at 2.85 to 2.95 GHz), a range which is semi-stable without CnQ.

DFI LP UT NF4 Ultra-D
Winchester 3000+
G. Skill TCCD 4400 LE
CPU fan: 90 mm Enermax UC-9FAB adjustable fan (1200 - 2500 rpm)
chipset fan (from motherboard stock chipset fan)
system fan: low rpm fan (~2500 rpm)

Highest overclocking (with Cool and Quiet disabled)
2.95 GHz (328 MHz x 9) 1.60 V 3.0-5-5-10 1T Windows XP boot, Sandra CPU
2.90 GHz (322 MHz x 9) 1.60 V 3.0-5-5-10 1T SuperPI 1M stable
2.85 GHz (317 MHz x 9) 1.55 V 2.5-4-3-7 1T SuperPI 32M, 3dmark 01/03/05 stable
2.73 GHz (303 MHz x 9) 1.50 V 2.5-3-3-7 1T Prime95 stable


Using Cool and Quiet (CnQ), highest overclocking tested = 2.85 GHz (317 MHz x 9)

........................ Idle/Light load ... Load
CPU frequency .......... 1.59 GHz .......... 2.85 GHz
........................ 317 MHz x 5 ....... 317 MHz x 9
CPU VDD ................ 1.20 V ............ 1.54 V
Memory ................. 2.5-4-4-7 1T ...... 2.5-4-4-7 1T

Ambient room temperature = 17 C
CPU temperature ........ 27 - 28 C ......... 38 - 40 C
System temperature ..... 29 - 30 C ......... 30 - 31 C
Chipset temperature .... 33 - 34 C ......... 33 - 34 C
CPU fan ................ 0 RPM ............. ~ 2300 RPM (faster fan setting)
System fan ............. 2900 - 3000 RPM ... 2900 - 3000 RPM
Chipset fan ............ 1900 - 2000 RPM ... 1900 - 2000 RPM


Between idle/light load and full load,
voltage swing (VID from 1.1 V to 1.40 V, both x 1.10) = 1.40 / 1.1 = 1.27
frequency swing (HTT x 5 to HTT x 9) = 9 / 5 = 1.8
current swing between idle and full load = (1.40 / 1.1) * (9 / 5) = 2.29

CPU VDD switches between 1.2 V and 1.54 V, and CPU frequency between 317x5 and 317x9 MHz, a difference of 80%.
Idle/light load CPU and system temperatures drop about 5 C as VDD is decreased by about 0.34 V using CnQ.
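
The swing arithmetic above can be reproduced with a short Python sketch. It assumes, as the post does, that active current scales roughly with voltage times frequency; the function name is only illustrative.

```python
# Sketch of the idle-to-load swing arithmetic for Cool and Quiet.
# Assumption: active current scales roughly with V * f (active power ~ C * V^2 * f).

def swing_ratios(v_idle, v_load, mult_idle, mult_load):
    """Return (voltage, frequency, current) ratios between load and idle."""
    v_ratio = v_load / v_idle
    f_ratio = mult_load / mult_idle      # HTT is the same at idle and load
    i_ratio = v_ratio * f_ratio          # I = P / V ~ C * V * f
    return v_ratio, f_ratio, i_ratio

# VID 1.1 V (idle) and 1.4 V (load); the 110% setting scales both,
# so the ratio is unchanged.
v_ratio, f_ratio, i_ratio = swing_ratios(1.1, 1.4, 5, 9)
print(f"voltage x{v_ratio:.2f}, frequency x{f_ratio:.2f}, current x{i_ratio:.2f}")
```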


Cool and Quiet bios setting

- LDT multiplier ............ "AUTO" (auto now works using Bios 414-3, previously needed to be x3)
- K8 Cool&Quiet Support ..... "AUTO"
- Cool&Quiet MAX FID ........ "AUTO" or 9 (9 for 3000+, 10 for 3200+, etc)
- CPU VID Startup Value ..... "StartUp"
- CPU VID Control ........... "AUTO"
- CPU VID Special Control ... "Above VID * 110%" (or "Above VID * 104%", etc)
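
As a rough illustration of what the "Above VID * 110%" setting does to the CnQ voltages, here is a minimal Python sketch. It assumes the board simply multiplies the VID requested by CnQ by the chosen factor; the function name is hypothetical.

```python
# Sketch: effective CPU VDD when "CPU VID Special Control" is "Above VID * 110%".
# Assumption: applied voltage = VID requested by CnQ * special-control factor.

def effective_vdd(vid, special_factor=1.10):
    """Voltage applied for a given VID and special-control factor."""
    return vid * special_factor

for label, vid in (("idle", 1.1), ("load", 1.4)):
    print(f"{label}: VID {vid:.2f} V -> VDD {effective_vdd(vid):.2f} V")
```

With the 1.1 V and 1.4 V VIDs used in this post, this gives about 1.2 V at idle and 1.54 V under load, matching the readings reported here.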


System was idle, CPU at 317 x 5 = 1.59 GHz, 1.2 V, CPU fan NOT spinning

lp_ultra-d_winnie3000_cbbhd_317x9_mem_317_2.5-4-4-7_2.8V_CnQ_tvfchart_superpi_8.JPG



System was running SuperPI, CPU at 317 x 9 = 2.85 GHz, 1.54 V, CPU fan at ~ 2300 RPM

lp_ultra-d_winnie3000_cbbhd_317x9_mem_317_2.5-4-4-7_2.8V_CnQ_tvfchart_superpi_9.JPG



System finished SuperPI, CPU back to 317 x 5 = 1.59 GHz, 1.2 V, CPU fan NOT spinning

lp_ultra-d_winnie3000_cbbhd_317x9_mem_317_2.5-4-4-7_2.8V_CnQ_tvfchart_superpi_10.JPG



Cool and Quiet (CnQ) Testing
Running the computer Cool and Quiet
 
I love you, hitechjb1. You always have informative stuff to show me. :)
 
DRAM Setting for BIOS 414-3 (TCCD) with or without Cool and Quiet

DFI LP UT NF4 Ultra-D
Bios 414-3 (TCCD)
Winchester 3000+

HTT = 315 MHz
Memory = 315 MHz 2.5-4-3-7 1T 2.8 V
CPU = 315 MHz x 5 - 315 MHz x 9 (1.58 GHz - 2.84 GHz)
VDD = (1.1 - 1.4 V) * 110%

With CnQ enabled, the screen shot shows the CPU frequency and voltage dropped back to 315 MHz x 5 and 1.2 V, CPU fan to 0 RPM, after SuperPI was completed. When SuperPI was running, CPU was at 315 MHz x 9 (2.84 GHz) and 1.54 V.

lp_ultra-d_winnie3000_cbbhd_315x5_mem_315_2.5-4-3-7_bios_414-3_dram_default_CnQ.JPG



With Cool and Quiet disabled, the same DRAM setting is used.
HTT = 315 MHz
Memory = 315 MHz 2.5-4-3-7 1T 2.8 V
CPU = 315 MHz x 9
VDD = 1.4 V * 110%

lp_ultra-d_winnie3000_cbbhd_315x5_mem_315_2.5-4-3-7_bios_414-3_dram_default.JPG



For A64, including 754/939, ClawHammer, NewCastle, Winchester, Venice, San Diego, ...,
the best way to find the DRAM setting for a particular system is to set the memory_HTT_ratio, command rate, tCL, tRCD, tRP and tRAS manually, and leave the other DRAM settings at AUTO or the default optimal setting from the Bios. Then use A64 Tweaker to fine tune the other DRAM settings if needed.

Use other systems' settings for reference only. Directly copying settings may not work, or may not be optimal, even for systems using the same parts, due to statistical variation of parts at high overclock.

Usually, for tCL-tRCD-tRP-tRAS-CmdRate
- BH5/UTT, 2-2-2-5/6-1T to 250 - 260 MHz, 3.3 - 3.5 V (if chosen to use such a high voltage)
- TCCD, 2.5-3/4-3/4-6/7/8-1T to 280 - 310 MHz, 2.7 - 2.9 V


For memory_HTT_ratio and memory divider, refer to this link
Overclocking setting for various bus frequencies (with a memory divider table)
 
Got same taste

This is some good stuff. Your sys is almost identical to mine; however, being the power maniac I
am, I want more speed, especially out of my vid card and CPU.
I just put it together, so I'm formulating how much I want to push it.
_______________________________________________
A64 3200 Winchester core sock 939
DFI LP nF4 SLI-DR mobo
GeForce 6600GT PCI-E x16
520W OCZ ModStream PSU
1 GB OCZ platinum dual channel non-OC'd timings 2-2-2-5
250GB SATA WD HD
Thermalright XP-90 heatsink
2 90mm fans & 2 120 mm blue LED Thermaltake fans
 
g0dM@n said:
How's the 5x doing so good on SuperPI?

With CnQ enabled, the screen shot shows the CPU frequency and voltage dropped back to 315 MHz x 5 and 1.2 V, CPU fan to 0 RPM, after SuperPI was completed. When SuperPI was running, CPU was at 315 MHz x 9 (2.84 GHz) and 1.54 V.
 
OOOOOOH! CnQ controls the multi. I did not know that. I'm sorry.
 
Comparing various CPU cores based on SuperPI 32M run time

Shown here is the relative performance of various CPU cores, from Pentium to A64 to AXP, normalized to
- CPU frequency
- CPU and memory frequency (about equal weight on the frequencies)
based on SuperPI 32M which tests CPU, cache, memory.

Performance is measured by 1 / SuperPI_32M_run_time

Score1 = Performance / CPU_freq
Score2 = Performance / (CPU_freq + 10 x memory_freq) ..... CPU and memory would have about equal weight

It shows the amount of computing power per MHz of CPU and memory.
E.g. some cores can deliver higher frequency, but the performance per MHz is much lower.
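
A minimal Python sketch of the two scores follows. The function name and the MHz/seconds units are assumptions for illustration; the sample inputs are the 311 MHz x 9 run reported later in this thread (26 min 55.312 sec).

```python
# Sketch of the two normalized SuperPI 32M scores.
# Units (seconds, MHz) are an assumption; only relative values matter.

def superpi_scores(run_time_s, cpu_mhz, mem_mhz):
    performance = 1.0 / run_time_s                    # shorter run = higher performance
    score1 = performance / cpu_mhz                    # per CPU MHz
    score2 = performance / (cpu_mhz + 10 * mem_mhz)   # CPU and memory weighted about equally
    return score1, score2

run_time_s = 26 * 60 + 55.312                         # 26 min 55.312 sec
s1, s2 = superpi_scores(run_time_s, cpu_mhz=311 * 9, mem_mhz=311)
print(f"Score1 = {s1:.3e}, Score2 = {s2:.3e}")
```

The factor of 10 puts the memory term (10 x 311 MHz) on about the same scale as the CPU term (2799 MHz), which is why the two frequencies carry about equal weight in Score2.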

SuperPI_32M_vs_CPU.JPG


SuperPI_32M_perf_per_clock_vs_CPU.JPG


Note how the yellow bar trends downwards in the last graph as clock-for-clock performance decreases.


Next the same data sorted by CPU type is shown.

As can be observed, for the same CPU type, the frequency-normalized performance measured centers around a certain value regardless of the frequencies of the CPU and memory.

SuperPI 32M Performance per CPU frequency of different CPU cores

SuperPI_32M_perf_per_clock_avg.JPG


The 3rd column shows relative measure to Barton (100%) based on SuperPI 32M on CPU frequency.

It becomes apparent that clock for clock (would like to see more data points to further refine the numbers),
- San Diego performs better than ClawHammer (~4%)
- Venice performs better than Winchester (~4%)
- Venice performs better than NewCastle (~5%)
- ClawHammer performs better than NewCastle (~2%)
- various A64 perform better than Barton/Tbred B (13-24%)
- Barton performs better than Tbred B (~4+%)
etc.

Remark: Sample size may not be large enough and tests not controlled.
SuperPI 32M run time analysis has not been adjusted for variations such as OS, memory size, timing.


SuperPI_32M_vs_CPUtype.JPG



Data is based on the SuperPI 32M results from members:
SuperPI 32M for testing CPU and system speed and stability


Appendix

The performance of a processor (CPU, GPU, ...) is usually measured as instructions per second, or integer/floating point/pixel/graphic operations per unit time, i.e. the shorter the run time, the higher the performance. So

performance = instructions per unit time = K / run_time

where K is a constant.

Since we are interested in ranking the different types of processors (e.g. San Diego, NewCastle, Barton, Prescott) by how many instructions they execute per unit time regardless of how fast they are clocked, the above performance measure is further divided (normalized) by the clock frequency.

normalized_performance = K / (run_time * frequency)

So for the same processor, whether it runs at 2 GHz or 3 GHz, the frequency-normalized performance would be about the same with all other conditions equal, although in absolute terms the 3 GHz one will complete the computation in about 33% less time than the 2 GHz one (it is about 50% faster).

This frequency-normalized performance measures the relative amount of instructions (or computation) a processor performs per given number of clock cycles, regardless of clock frequency and regardless of cooling (air/water/sub-zero). It reflects what we commonly want to find out: how the different types of processors perform clock for clock.
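
The normalization can be sketched in Python as below. This is an idealized illustration with K = 1; it assumes run time scales exactly as 1 / frequency for the same core, and the amount of work is a made-up number.

```python
# Sketch: frequency-normalized performance, with K = 1.
# Assumption (idealized): run time scales exactly as 1 / frequency for one core.

def normalized_performance(run_time_s, freq_ghz, k=1.0):
    return k / (run_time_s * freq_ghz)

work = 6000.0                  # hypothetical amount of work, in "GHz-seconds"
t_2ghz = work / 2.0            # run time at 2 GHz
t_3ghz = work / 3.0            # run time at 3 GHz

print(normalized_performance(t_2ghz, 2.0), normalized_performance(t_3ghz, 3.0))
print(f"speedup at 3 GHz: {t_2ghz / t_3ghz:.2f}x")
```

Both calls print the same normalized value, even though the absolute run times differ.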
 
Computing/estimating power for 90 nm CPU

Many programs and web-sites use the following formula for estimating CPU power

Power = k V^2 f

where V is voltage, f is frequency and k is a constant. The power estimated can be off by a lot (say 20 - 30% I estimated) for a given 90 nm CPU. For 130 nm CPU, that method of power estimation is marginal (say 5 - 10% margin of error), and the margin of error is even less for 180 nm chips.

The reasons are
- Total power is not just a function of V^2 f, but rather

Power = A + B V^2 + C V^2 f

where A, B and C are some constants, to account for
(1) the standby power at low power state,
(2) the leakage power (larger %-wise for 90 nm) which does not depend on frequency, and
(3) the active power which is a function of voltage and clock frequency.

Here V and f can represent different voltage and frequency for core voltage and frequency, LDT voltage and frequency, memory controller voltage and frequency, ..., and the total power is the sum of them.

- The rated power (TDP, e.g. 67 W for Venice, 89 W for SD) from AMD spec is a "blanket or architectural" power number (IMO) at certain specified surrounding temperature (Tcase) and current (Iddmax), and does not account for individual CPU variation. In 90 nm technology, individual CPU parameters such as threshold voltage and process parameters can vary to a large extent (say 20%, an estimate used for discussion only).

- Total power of a CPU depends on the power and voltage of I/O, LDT I/O, memory controller, PLL, and the main core itself. So the total power would not just scale with CPU core voltage VDD and frequency f, but rather also V_LDT, V_memref, V_IO, ....

If the standby power, leakage power, power of I/O, LDT, memory controller, ... are taken into account, the power estimated would be less than that of the above estimate based on k V^2 f at the same voltage, frequency and load conditions.
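
To make the comparison concrete, here is a small Python sketch that scales a baseline power figure to an overclocked operating point with both models. All constants (A, B, C and the baseline point) are hypothetical, chosen only to show the trend, not measured values.

```python
# Sketch: scaling a known power figure to a new operating point with two models.
# All constants are hypothetical, chosen only to illustrate the trend.

def power_simple(k, v, f):
    return k * v**2 * f

def power_full(a, b, c, v, f):
    return a + b * v**2 + c * v**2 * f    # standby + leakage + active

# Baseline point where both models are made to agree (V in volts, f in GHz).
v0, f0 = 1.4, 1.8
a, b, c = 8.0, 6.0, 12.0                  # W, W/V^2, W/(V^2*GHz)
p0 = power_full(a, b, c, v0, f0)
k = p0 / (v0**2 * f0)                     # fit k so power_simple matches the baseline

# Overclocked point.
v1, f1 = 1.54, 2.85
print(f"baseline: {p0:.1f} W")
print(f"k V^2 f estimate: {power_simple(k, v1, f1):.1f} W")
print(f"A + B V^2 + C V^2 f estimate: {power_full(a, b, c, v1, f1):.1f} W")
```

Because the A term does not scale at all and the B term scales only with V^2, the three-term model grows more slowly than k V^2 f when both voltage and frequency are raised from the same baseline.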

From the AMD tech docs that I have seen, there may not be enough numbers (breakdown) to calculate or estimate CPU power accurately under different voltages and frequencies.

One may resort to measure CPU power under different voltages, frequency and load conditions experimentally (details outlined in the next post):

- Measuring the current feeding into the CPU regulator, though one still needs to account for regulator efficiency under different loads, voltages and frequencies.

- If there is some way to calibrate the thermal resistance (C/W) of the heat sink and cooling, power can be measured by

power = (temperature - temperature_surrounding) / thermal_resistance

- Temperature measured, no matter how accurately, is only the average die temperature. Depending on how good the CPU design is, there may be hot spot(s) on the die under load. The hot spot temperature can be much higher than the average temperature measured. If that hot spot is located at timing-critical circuits (only a few transistors out of 100 million), the CPU may become unstable under load regardless of the lower average temperature measured.


Further discussion:
http://www.ocforums.com/showthread.php?p=3734677#post3734677
 
I have not tried this, so just some outline of thoughts.

CPU power measurement

If one can get a current clamp like this (as an example)
http://www.obd2.com/accessories/data/currentclamp.htm

Use as little hardware as possible. Use an old, low power AGP/PCI video card (which draws little or nothing from 12V).

Clamp all the 12V lines that go to the motherboard with the current clamp, excluding those for HDs, optical drives, fans, video card 12V connection, ....
There may still be some residual 12V current not going to the CPU (but not much, and it can be ignored).

1. Measure the current at idle (light load) - I_idle

2. Measure the current at full load - I_load

I don't know of a way to isolate just the CPU 12V current. If there were, the CPU power could be estimated accurately. If an older PCI/AGP video card is used, its 12V draw is minimal, and the CPU power can be estimated as below. Since the current is measured on the 12V side of the regulator, the voltage to use is 12 V, not VDD.

power_idle = 12 V * I_idle * regulator_efficiency

power_load = 12 V * I_load * regulator_efficiency

CPU power between light load and full load:

dPower = 12 V * (I_load - I_idle) * regulator_efficiency

regulator_efficiency is still an unknown, typically 80% (assumed).

I think this would be very interesting to try.

If the power between light load and full load is measured, it can be used to calibrate the thermal resistance of a cooling setup.

thermal_resistance = (temp_load - temp_idle) / dPower
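
The measurement outline can be sketched in Python as follows. Every reading here is made up, since this has not been tried. It takes the input power as the 12 V rail voltage times the clamped current, since the clamp sits on the 12 V side of the regulator, and it uses the assumed 80% regulator efficiency.

```python
# Sketch of the current-clamp estimate. All readings below are hypothetical.
# Assumption: regulator input power = 12 V rail voltage * clamped 12 V current.

RAIL_V = 12.0

def cpu_power(i_12v, regulator_efficiency=0.8):
    """CPU power (W) from 12 V current (A) and an assumed regulator efficiency."""
    return RAIL_V * i_12v * regulator_efficiency

i_idle, i_load = 2.0, 7.0                 # amps on the 12 V lines (made-up readings)
d_power = cpu_power(i_load) - cpu_power(i_idle)

temp_idle, temp_load = 30.0, 41.0         # degrees C (made-up readings)
thermal_resistance = (temp_load - temp_idle) / d_power
print(f"dPower = {d_power:.1f} W, thermal resistance = {thermal_resistance:.3f} C/W")
```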


Using temperature to estimate power

The IHS thermal resistance is unknown; it varies with the thermal compound and with how well the IHS was capped on.

If the IHS is removed, thermal resistance can be calibrated. Typically between 0.15 - 0.2 C/W for water and high end air?

When power and temperature at idle and under load are measured, one can estimate the thermal resistance of the cooling, and can possibly determine whether the IHS was mounted well below the norm (by comparing against the nominal thermal resistance of a given cooling).

E.g. based on a Winchester 3000+ at 2.8 GHz, 1.5 V

power_idle = (temp_idle - temp_ambient) / thermal_resistance
power_load = (temp_load - temp_ambient) / thermal_resistance

thermal_resistance = 0.2 C/W

temp_idle = 30 C
temp_ambient = 24 C
power_idle = (30 - 24) / 0.2 = 30 W

temp_load = 41 C
temp_ambient = 24 C
power_load = (41 - 24) / 0.2 = 85 W
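
The same worked example as a minimal Python sketch, assuming a single lumped thermal resistance from die to ambient (the function name is illustrative):

```python
# Sketch: estimating CPU power from temperature rise over ambient.
# Assumption: one lumped thermal resistance (C/W) from die to ambient.

def power_from_temperature(temp_c, ambient_c, thermal_resistance):
    return (temp_c - ambient_c) / thermal_resistance

R_TH = 0.2                                # C/W, the value assumed in the example
p_idle = power_from_temperature(30, 24, R_TH)
p_load = power_from_temperature(41, 24, R_TH)
print(f"idle: {p_idle:.0f} W, load: {p_load:.0f} W")
```

Note that the result is only as good as the assumed thermal resistance: using 0.15 C/W instead of 0.2 C/W would raise both estimates by a third.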
 
hitechjb1 said:
Computing/estimating power for 90 nm CPU
Power = A + B V^2 + C V^2 f

where A, B and C are some constants, to account for
(1) the standby power at low power state,
(2) the leakage power (larger %-wise for 90 nm) which does not depend on frequency, and
(3) the active power which is a function of voltage and clock frequency.

Here V and f can represent different voltage and frequency for core voltage and frequency, LDT voltage and frequency, memory controller voltage and frequency, ..., and the total power is the sum of them.

Doesn't this say that the 90nm cpus require power when the machine is off? Obviously I missed something hitechjb1.
 
Deeppow,
Thanks for bringing this up. I expected a careful reader would have this question, let me elaborate what it means.

As there are multiple voltages applied to the CPU, namely, VDD, VLDT, VTT, VDDIO, VDDA, MEMVREF (according to the AMD tech doc),
where
VLDT is voltage for HT I/O ring,
VTT is regulator voltage for side A and side B of the die,
VDDIO is voltage for DDR SDRAM I/O ring,
VDDA is voltage for PLL (phase locked loop),
MEMVREF is DRAM Interface Voltage Reference,

so in general, the total power is the sum of these voltages applied to their respective equivalent resistances and capacitances (both are also non-linear in general).

Power =
VDD^2 / R_cpu + C_cpu VDD^2 f_cpu +
VLDT^2 / R_ldt + C_ldt VLDT^2 f_ht +
VTT^2 / R2_cpu + C2_cpu VTT^2 f_cpu +
VDDIO^2 / R_io + C_io VDDIO^2 f_mem +
VDDA^2 / R_pll + C_pll VDDA^2 f_htt +
MEMVREF^2 / R_memref + C_memref MEMVREF^2 f_mem

So to illustrate the concept that power is a function of V^2 / R + C V^2 f, i.e. the sum of leakage power and the active power, without loss of generality, it is assumed that only one voltage (say VDD) is changing. The power components of the other voltages are lumped into a single term, denoted by A, which is the sum of power for VLDT, VTT, VDDIO, VDDA, MEMVREF.

So for varying V = VDD,

power = A + B V^2 + C V^2 f

for some values of A, B and C.
======================================================

Add-on for more complication (may skip this part):

For a single voltage applying to an equivalent resistor R and equivalent capacitance C, the terms B and C representing respectively the equivalent conductance (1/R) and the equivalent capacitance C are actually not constant through out the entire voltage range, they can be approximated by piecewise functions such as

When V < V1
P = G0 V^2 + C0 V^2 f

when V >= V1
P = P1 + G1 (V - V1)^2 + C1 (V - V1)^2 f

where the power at voltage V1 is P1 (and P1 = G0 V1^2 + C0 V1^2 f)

After expanding (for V >= V1)

P = a0 + a1 V + a2 V^2 + a3 f + a4 V f + a5 V^2 f

So in sum, those constants A, B, C, a0, a1, a2, a3, a4, a5, ... are purely for mathematical completeness, to show that once multiple voltages and non-linearity at some operating voltage points are accounted for, power is in general quadratic in voltage and linear in frequency.
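
For readers who want to check the expansion, here is a Python sketch that compares the piecewise form for V >= V1 with the expanded polynomial. The values of G0, C0, G1, C1 and V1 are hypothetical; the coefficients a0 to a5 are what one gets by expanding (V - V1)^2 and collecting terms.

```python
# Sketch: numeric check that the piecewise model for V >= V1 expands into
# a0 + a1 V + a2 V^2 + a3 f + a4 V f + a5 V^2 f. All constants are hypothetical.

G0, C0, G1, C1, V1 = 4.0, 10.0, 6.0, 14.0, 1.4

def power_piecewise(v, f):
    if v < V1:
        return G0 * v**2 + C0 * v**2 * f
    p1 = G0 * V1**2 + C0 * V1**2 * f           # power at the breakpoint V1
    return p1 + G1 * (v - V1)**2 + C1 * (v - V1)**2 * f

# Coefficients from expanding (V - V1)^2 and collecting terms.
a0 = (G0 + G1) * V1**2
a1 = -2 * G1 * V1
a2 = G1
a3 = (C0 + C1) * V1**2
a4 = -2 * C1 * V1
a5 = C1

def power_expanded(v, f):
    return a0 + a1 * v + a2 * v**2 + a3 * f + a4 * v * f + a5 * v**2 * f

v, f = 1.54, 2.85
print(power_piecewise(v, f), power_expanded(v, f))
```

The two printed values agree up to floating point rounding for any V >= V1, which confirms the expanded form.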
 
here is what I have .

ok I don't know how to get all the stuff onto one screen shot :bang head so I did it like this . hope it is ok.
everest22mn.jpg

everest15lo.jpg

snadra15cx.jpg

sandra27tg.jpg

got some sweet scores I think.
 
hitechjb1 said:
Deeppow,
Thanks for bringing this up. I expected a careful reader would have this question, let me elaborate what it means.

....

So to illustrate the concept that power is a function of V^2 / R + C V^2 f, i.e. the sum of leakage power and the active power, without loss of generality, it is assumed that only one voltage (say VDD) is changing, the power components of the other voltages are lumped into a single term, denoted by A which is the sum of power for VLDT, VTT, VDDIO, VDDA, MEMVREF.

....

hitechjb1, thanks for clarifying! :)

Unfortunately you can't write it like that if you wish the math to be understood, it misleads and causes confusion. "A" needs to reflect the fact it does depend on (is a function of) other voltages. For example it might be written like

A(Vi = constants) where Vi = {VLDT, VTT, VDDIO, VDDA, MEMVREF, etc.}. You're trying to tell the reader that A is a function of other Vi. There are other ways too of course.

The variable "A" truly isn't a constant since if one of the Vi changes then "A" changes.

Keep up the excellent work! :)
 
As pointed out previously, the total power of an A64 can be approximated in terms of various voltages (V's), frequencies (f's), equivalent resistances (R's) and equivalent capacitances (C's). Resistive components and V's account for leakage power/current, capacitive components and V's and f's account for active switching power/current.

As there are multiple voltages applied to the CPU, namely, VDD, VLDT, VTT, VDDIO, VDDA, MEMVREF (according to the AMD tech doc),
where
VLDT is voltage for HT I/O ring,
VTT is regulator voltage for side A and side B of the die,
VDDIO is voltage for DDR SDRAM I/O ring,
VDDA is voltage for PLL (phase locked loop),
MEMVREF is DRAM Interface Voltage Reference,

so in general, the total power is the sum of these voltages applied to their respective equivalent resistances (R's) and capacitances (C's) (both are also non-linear in general).

Power =
VDD^2 / R_cpu + C_cpu VDD^2 f_cpu +
VLDT^2 / R_ldt + C_ldt VLDT^2 f_ht +
VTT^2 / R2_cpu + C2_cpu VTT^2 f_cpu +
VDDIO^2 / R_io + C_io VDDIO^2 f_mem +
VDDA^2 / R_pll + C_pll VDDA^2 f_htt +
MEMVREF^2 / R_memref + C_memref MEMVREF^2 f_mem

Or writing it another way, let
f(V, f, R, C) = V^2 / R + C V^2 f

Power = f(VDD, f_cpu, R_cpu, C_cpu) + f(VLDT, f_ht, R_ldt, C_ldt) + f(VTT, f_cpu, R2_cpu, C2_cpu) +
f(VDDIO, f_mem, R_io, C_io) + f(VDDA, f_htt, R_pll, C_pll) + f(MEMVREF, f_mem, R_memref, C_memref)
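
The per-rail sum translates directly into code. In the Python sketch below, every R and C value is hypothetical (the post gives no measured values for them), only three of the six rails are shown for brevity, and the helper is named rail_power rather than f to avoid clashing with the frequency symbol.

```python
# Sketch of the per-rail power sum. Every R and C value here is hypothetical.

def rail_power(v, f, r, c):
    """Leakage (V^2 / R) plus active switching (C * V^2 * f) power for one rail."""
    return v**2 / r + c * v**2 * f

rails = {
    # name: (volts, frequency in GHz, R in ohms, C in W/(V^2*GHz))
    "VDD":   (1.54, 2.85, 0.25, 10.0),
    "VLDT":  (1.20, 0.95, 4.0, 0.5),
    "VDDIO": (2.80, 0.317, 10.0, 0.3),
}

total = sum(rail_power(*params) for params in rails.values())
for name, params in rails.items():
    print(f"{name}: {rail_power(*params):.1f} W")
print(f"total: {total:.1f} W")
```

Holding every rail except VDD fixed turns the other terms into a constant, which is the A in power = A + B V^2 + C V^2 f.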



Relationship of clock, die temperature and voltage (update)
- What is the active power of a CPU at frequency f and voltage V
- How to estimate CPU static and active power
- Effect of die temperature on CPU clock frequency at a given Vcore
(page 13)

How does leakage current slow down future generations of chips (page 19)

MOS scaling, voltage, power and leakage current

Some links about latest silicon technology, Silicon on Insulator (SOI), Strained Silicon (SS), Dual Stress Liner (DSL)


Why frequency and voltage are important for overclocking performance (page 19)

CPU voltage: from stock to max absolute, from efficient overclocking to diminishing return (page 19)


How to identify the physical core of an A64 (post 86)
 
This 939 system is an actual system used solely for daily 24/7 usage, not a benchmarking system with minimal programs and utilities installed. Some background utilities may slow down benchmarking to some extent.

Recently, an unnecessary background utility was disabled (a TV scheduler) and finally, the Winchester 3000+ CBBHD 0447 is getting below 27 min in SuperPI 32M, with CnQ disabled or enabled, and a 90 mm fan running 1000 - 2800 rpm.

CPU: Winchester 3000+ CBBHD 0447
memory: G. Skill 4400 LE 2 x 256 MB (Samsung TCCD)
motherboard: DFI LanParty UT NForce4 Ultra-D (rev. A02, bios 414-3)
cooling: XP-90, 90 mm Enermax fan (< 3000 rpm)
OS: Windows XP Professional SP2

- CPU: 2.80 GHz = 311 MHz x 9, 1.54 V (1.52 V works also)
- memory: 311 MHz, 2.5-4-4-8 1T, 2.8 V
- SuperPI 32M completed in 26 min 55.312 sec

lp_ultra-d_winnie3000_cbbhd_311x9_mem_311_2.5-4-4-8_CnQ_superpi32M_26min55sec.JPG


With CnQ enabled (idle x5, under load x9)
- CPU: 2.77 GHz = 308 MHz x 9, 1.17/1.49 V
- memory: 308 MHz, 2.5-4-3-7 1T, 2.8 V
- SuperPI 32M completed in 26 min 56.456 sec

lp_ultra-d_winnie3000_cbbhd_308x9_mem_308_2.5-4-3-7_CnQ_superpi32M_26min56sec.JPG
 
This thread makes me feel like I know nothing about a computer LMAO..Awsom thread ! !
 