Overclocking Sandbox: Tbred B DLT3C 1700+ and Beyond

hitechjb1 · Apr 14, 2004

update:

The 939 platform memory bandwidth, as estimated from some test data (so the result is preliminary), is impressive. Its efficiency is around 86-90%, which is 15-20% (to be confirmed with more 939 test data) better than the P4 QDR dual channel counterpart.

Its effective bandwidth (not max), running at the same memory bus speed, is about 15-20% higher than that of P4 QDR dual channel and 81-89% higher than that of 754 platform or nforce2 dual channel.

Estimation and importance of 939 platform memory bandwidth

A major difference between the AMD 754 and 939 platforms is the memory bus, i.e. a 64-bit memory bus for 754 vs a 128-bit memory bus for 939. Here are some estimates (since 939 is not commonly available yet) to see the potential impact on memory bandwidth performance.

I think there is a significant advantage from the 939 128-bit memory bus and on-chip dual channel controller. It is very different from the nforce2 dual channel, which has only a few % memory bandwidth improvement over single channel, as shown below.

memory_bandwidth_efficiency = effective_memory_bandwidth / max_memory_bandwidth

1. In the P4 arena, the dual channel QDR efficiency is around 75% with 64-bit memory bus
max_memory_bandwidth = FSB x 4 x 8 = 32 FSB
effective_memory_bandwidth = 0.75 x 4 x 8 x FSB ~ 24 FSB

2. XP nforce2 single channel efficiency ~ 85-90%
max_memory_bandwidth = FSB x 2 x 8 = 16 FSB
effective bandwidth = 0.875 x 2 x 8 x FSB ~ 14 FSB

3. XP nforce2 dual channel efficiency ~ 90 - 95% (actually should be 45-48%, depending on how it is counted)
max_memory_bandwidth = FSB x 2 x 8 x 2 = 32 FSB
max_FSB_bandwidth = FSB x 2 x 8 = 16 FSB (FSB limits dual channel memory bandwidth)
effective bandwidth = 0.925 x 2 x 8 x FSB ~ 14.8 FSB

4. 754 hardware has been around for a while, and we have seen its memory bandwidth efficiency being around 95%.

For 754 platform, memory bandwidth efficiency ~ 95%
max_memory_bandwidth_754 = 2 x 8 x memory_bus_frequency = 16 memory_bus_frequency
effective bandwidth = 0.95 x 2 x 8 x memory_bus_frequency = 15.2 memory_bus_frequency

E.g. from Maxvla's system screenshot (http://www.maxvla.com/host/komusa4200b.jpg), a 754 memory benchmark (integer buffered iSSE2) shows the 754 memory efficiency being around 4574/4800 = 95%.
At 300 MHz, the max bandwidth would be 4800 MB/s for single channel, and 9600 MB/s for 128-bit bus (theoretical max).
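
The arithmetic above can be sketched in a few lines of Python. The function names are my own (not from any benchmark tool); the only inputs are the bus clock in MHz, the bus width in bytes, and the 2 transfers per clock of DDR.

```python
def max_bandwidth_mb_s(bus_mhz, bus_bytes, transfers_per_clock=2):
    """Theoretical peak in MB/s: clock x transfers per clock x bytes per transfer."""
    return bus_mhz * transfers_per_clock * bus_bytes


def efficiency(measured_mb_s, max_mb_s):
    """memory_bandwidth_efficiency = effective_memory_bandwidth / max_memory_bandwidth."""
    return measured_mb_s / max_mb_s


max_754 = max_bandwidth_mb_s(300, 8)    # 64-bit bus = 8 bytes per transfer
max_128 = max_bandwidth_mb_s(300, 16)   # 128-bit bus = 16 bytes per transfer

print(max_754, max_128)
print(f"754 efficiency from the screenshot: {efficiency(4574, max_754):.1%}")
```

The same two functions work for any of the platforms in this post by changing the bus width and transfers per clock (e.g. 4 for the quad pumped P4).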

5. For the 939 128-bit memory bus, there is a good possibility that it could be higher than 75% (the P4 QDR number) due to its direct 128-bit memory bus:
- max_memory_bandwidth_939 = 2 x 16 x memory_bus_frequency = 32 memory_bus_frequency
- At 80%, 300 MHz, the effective bandwidth would be 7680 MB/s
- At 90%, 300 MHz, the effective bandwidth would be 8640 MB/s
- At 95%, 300 MHz, the effective bandwidth would be 9120 MB/s
(ECC is not required in 939).
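
As a quick check of the 939 projections, here is a minimal sketch. The efficiency values are the same guesses as in the list above, not measurements, and the function name is only for illustration.

```python
MAX_939_PER_MHZ = 2 * 16  # DDR x 16 bytes (128-bit bus): MB/s per MHz of memory bus clock


def effective_939(bus_mhz, assumed_efficiency):
    """Projected effective bandwidth in MB/s for an assumed efficiency."""
    return assumed_efficiency * MAX_939_PER_MHZ * bus_mhz


for eff in (0.80, 0.90, 0.95):
    print(f"{eff:.0%} at 300 MHz: {effective_939(300, eff):.0f} MB/s")
```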

I think the 128-bit memory bus could be more efficient than the 64-bit QDR, and I hope it is close to the single channel number ~ 85-90%. This will be confirmed when actual 939 hardware comes out. (Will see)

- Ref result 1
I have seen numbers on memory bandwidth for an FX51 vs an A64, assuming both run at the same bus frequency. This is an example:
FX51 - 5315 MB/s (dual channel 128-bit bus)
A64 - 2954 MB/s (64-bit bus)

bandwidth_128_bus / bandwidth_64_bus = 5315 / 2954 = 1.8

So assuming the 64_bus has 95% efficiency, then the

128_bus_efficiency would be 95% * 1.8 / 2 = 86%.
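
The scaling argument can be written out as below. The 95% figure for the 64-bit bus is the assumption stated above; with the unrounded ratio the result comes out at roughly 85-86%.

```python
fx51_mb_s = 5315        # FX51, dual channel 128-bit bus
a64_mb_s = 2954         # A64, 64-bit bus
assumed_eff_64 = 0.95   # assumption: 64-bit bus efficiency from the 754 estimate

ratio = fx51_mb_s / a64_mb_s

# The 128-bit bus has twice the theoretical peak of the 64-bit bus,
# so its efficiency is the measured ratio scaled by 1/2.
eff_128 = assumed_eff_64 * ratio / 2

print(f"ratio = {ratio:.2f}, estimated 128-bit efficiency = {eff_128:.1%}")
```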

- Ref result 2
http://www.ocworkbench.com/ocwbcgi/ultimatebb.cgi?ubb=get_topic;f=29;t=000711

A64-FX (939) DDR400 dual channel - 5763.5 MB/s
A64-FX (939) DDR400 dual channel disabled - 3101 MB/s
The result is not clear about the memory bus speed, so let's assume it is 200 MHz for the math.

The 939 dual channel 128-bit memory efficiency = 5763.5 / 6400 = 90% !!!

Improvement over dual channel disabled = 5763.5 / 3101 = 1.86 (or 86%) (impressive bandwidth improvement)
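
The same numbers in code form, as a sketch. The 200 MHz bus clock is the assumption made above (the linked result does not state it), so the efficiencies are only as good as that guess.

```python
dual_mb_s = 5763.5      # 939, DDR400, dual channel
single_mb_s = 3101      # 939, DDR400, dual channel disabled
assumed_bus_mhz = 200   # assumption: DDR400 running at 200 MHz

max_dual = assumed_bus_mhz * 2 * 16     # 128-bit bus peak, MB/s
max_single = assumed_bus_mhz * 2 * 8    # 64-bit bus peak, MB/s

eff_dual = dual_mb_s / max_dual
eff_single = single_mb_s / max_single
gain = dual_mb_s / single_mb_s

print(f"dual: {eff_dual:.1%}, single: {eff_single:.1%}, gain: {gain:.2f}x")
```

Under the same assumption the single channel run works out to about 97%, which is in line with the ~95% used for 754 above.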

Summary (preliminary numbers, may vary as more 939 test results become available):

- If further confirmed by more 939 hardware, this 86 - 90% number on bandwidth efficiency for 939 128-bit is 15 - 20% higher than the 75% QDR of P4 (64-bit).

- At 86-90% efficiency, the effective bandwidth for the 939 128-bit memory bus would be 81 - 89% higher than that of a 754 64-bit memory bus, with assumed 95% memory efficiency.

This higher bandwidth in 939 would have significant impact on memory intensive applications such as video and image streaming, applications using spatially structured data as in scientific computation, ..., as well as 3Dmark01.

PS:

For video and image streaming, data needs to be refreshed constantly from the main memory (L3) to the on-chip L2 via the memory bus (same as the FSB in P4 and XP), as the size of the video data >> L2 size at any given time. So the high P4 dual channel memory bandwidth delivers an advantage. For the upcoming 939, I think it would be even better due to its 128-bit memory bus (w/ dual channel controller).

Let BW stand for effective memory bandwidth (not max),
DC stand for dual channel memory controller,
SC stand for single channel memory controller,
for the same bus speed (FSB, memory bus)

BW_939 > BW_P4_DC > BW_754 > BW_XP_DC > BW_XP_SC

at a ratio estimated respectively about

86-90 : 75 : 48 : 47 : 44

or

BW_939 = 27.5 - 28.8 bus (to be confirmed when 939 available)
BW_P4_DC = 24 FSB
BW_754 = 15.2 bus
BW_XP_DC = 14.8 FSB
BW_XP_SC = 14 FSB

Multiplying the corresponding number by the FSB (or memory bus frequency) in MHz gives the memory bandwidth in MB/s.
E.g. FSB = 200 MHz, mem_fsb_ratio 1:1, BW_P4_DC = 24 x 200 = 4800 MB/s
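
For convenience, the per-MHz figures above can be put in a small lookup table. This is just a sketch of the multiplication; the dictionary keys are labels I made up, and the 939 entries remain estimates.

```python
# Effective bandwidth in MB/s per MHz of bus clock, from the list above
EFFECTIVE_PER_MHZ = {
    "BW_939 (low)": 27.5,
    "BW_939 (high)": 28.8,
    "BW_P4_DC": 24.0,
    "BW_754": 15.2,
    "BW_XP_DC": 14.8,
    "BW_XP_SC": 14.0,
}


def effective_mb_s(platform, bus_mhz):
    """Effective memory bandwidth in MB/s at the given bus clock."""
    return EFFECTIVE_PER_MHZ[platform] * bus_mhz


for name in EFFECTIVE_PER_MHZ:
    print(f"{name:14s} {effective_mb_s(name, 200):6.0f} MB/s at 200 MHz")
```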

hitechjb1 · Apr 18, 2004

Differences between XP FSB and the A64 buses (separate memory bus and HyperTransport bus)

For XP, memory data, video card data, PCI data (hard disk, optical drives, networking, ...), serial links (USB, firewire, ...), slower peripherals (keyboard, mouse, ...), everything goes through the FSB to/from the CPU.

Using nominal 200 MHz FSB running in DDR, with 64-bit data path, the
max bandwidth is 200 x 8 x 2 = 3200 MB/s = 3.2 GB/s

The traffic that is crucial to system performance, such as memory data, video card data and hard disk data (file I/O, paging), has to compete with everything else on the FSB, resulting in bottlenecks and system bus conflicts.

For the A64 CPU, the memory traffic and the traffic for the rest of the devices mentioned above are separated at the CPU rather than at the external chipset. This is the key difference in system bus architecture between the old XP and the new A64, and has an important advantage of system performance for the A64.

- Memory is communicating directly via a separate memory bus to the processor's on-chip north bridge/dual channel memory controller with 128-bit data path (for 939/940) and on-chip north bridge/single channel memory controller with 64-bit data path (for 754).
In another post, the effective bandwidth for the 128-bit dual channel is estimated at around 90% of max bandwidth, which is higher than the 75% number of P4 dual channel QDR.

- The rest of the subsystems such as video, hard drives (IDE, SATA, RAID), optical drives, networking, serial links, multi-CPU communication (for multi-processor boards), ..., communicate to/from the CPU via the HyperTransport bus to the external chipset and various bridges downstream.

HyperTransport is for point to point connecting the CPU to peripheral subsystems such as networking, storage, serial links, chip to chip communication, I/O, ....

(HyperTransport bus does exist in nforce2 chipset, linking NB and SB.)

It is based on packet switching with a packet size of multiple of 32 bit (4 byte), with a max packet size of 64 bytes. HyperTransport allows for bi-directional transfer.

Data width can be configured in 2, 4, 8, 16, 32 bit.

Currently, its specification is 200-800 MHz with DDR, hence
max bit rate = 1600 Mb/s (per bit)

Since it is packet switching, the switching rate is usually referred to as the number of transfers per sec (T/s). So the
maximum transfer rate = 1600 MT/s

At maximum 32-bit transfer, the
max bandwidth for 32-bit = 32 x 800 x 2 / 8 = 6400 MB/s = 6.4 GB/s
(peripheral bandwidth already higher than the XP FSB)

Since transfer is allowed for bi-direction, for 32-bit transfer, the
max throughput for 32-bit = 12.8 GB/s.

Compare this speed with 33 MHz/32-bit PCI, which is 133 MB/s: it is 48X. Compared to the 1 GB/s for PCI-express, it is 12X.
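
The HyperTransport numbers can be reproduced with the sketch below. The function name is hypothetical, and the 133 MB/s PCI figure is simply the one quoted above.

```python
def ht_bandwidth_mb_s(clock_mhz, width_bits, ddr=True):
    """Peak one-direction bandwidth in MB/s for a given link clock and data width."""
    transfers_mt_s = clock_mhz * (2 if ddr else 1)   # DDR gives 2 transfers per clock
    return transfers_mt_s * width_bits / 8           # bits per transfer -> bytes


one_way = ht_bandwidth_mb_s(800, 32)   # 800 MHz DDR, 32-bit link
both_ways = 2 * one_way                # the link is bi-directional

pci_mb_s = 133                         # 33 MHz / 32-bit PCI, as quoted above

print(one_way, both_ways)
print(f"one-way HT vs PCI: {one_way / pci_mb_s:.0f}X")
```

The narrower link widths (2, 4, 8, 16 bit) scale down linearly, e.g. `ht_bandwidth_mb_s(800, 16)` for a 16-bit link.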

The max bandwidth (for peripheral communication) of 6.4 GB/s is comparable to that offered by the current system bus (FSB), which is used for memory, video and peripherals together. At 200 MHz, the max bandwidth for dual channel quad pumped P4 is 6.4 GB/s, and for DDR AMD it is 3.2 GB/s.

Summary:

Due to the separation of memory bus and HyperTransport (system) bus for all other devices in A64,
- the effective latency between the CPU (after L2 miss) and the memory (L3) is reduced
- the effective bandwidth of the A64 memory bus (128-bit in 939) to/from the CPU is alone higher than the effective P4 memory (and system) bandwidth, and twice that of XP
- the max bandwidth of the HyperTransport bus (for all other devices) to/from the CPU is alone comparable to P4 system bus, twice that of XP

The max combined bandwidth of memory bus (in 939) and HyperTransport in an A64 system is more than twice the system bus (FSB) of a P4 system and four times the system bus (FSB) of an XP system.
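
A rough tally of the peak numbers behind that comparison, as a sketch. It assumes a nominal 200 MHz bus for all three platforms and counts only one direction of HyperTransport; with that counting the ratios come out at 2x and 4x, and counting both HT directions (12.8 GB/s) would raise them to 3x and 6x.

```python
BUS_MHZ = 200  # assumption: nominal bus clock for XP, P4 and the 939 memory bus


def peak_mb_s(mhz, transfers_per_clock, bytes_wide):
    """Theoretical peak bandwidth in MB/s."""
    return mhz * transfers_per_clock * bytes_wide


xp_fsb = peak_mb_s(BUS_MHZ, 2, 8)       # DDR, 64-bit
p4_fsb = peak_mb_s(BUS_MHZ, 4, 8)       # quad pumped, 64-bit
mem_939 = peak_mb_s(BUS_MHZ, 2, 16)     # DDR, 128-bit
ht_one_way = peak_mb_s(800, 2, 4)       # 800 MHz DDR, 32-bit, one direction

a64_939_total = mem_939 + ht_one_way

print(f"939 total: {a64_939_total} MB/s")
print(f"vs P4: {a64_939_total / p4_fsb:.1f}x, vs XP: {a64_939_total / xp_fsb:.1f}x")
```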

hitechjb1 · Apr 19, 2004

The L2 cache size from XP to A64 has grown from 256 KB to 1 MB, eventually will see 2 MB in A64.

Palomino, Tbred A, Tbred B - 256 KB
Barton - 512 KB
A64 - 512 KB, 1 MB, eventually 2 MB

Some remarks on cache latency, cache size, memory latency and memory bandwidth (for A64's)

The tradeoff and equivalence of cache latency, cache size and memory bandwidth have been studied to a great extent, .... Their average and statistical relationship and impact on performance have been well analyzed, ....

- Cache latency is the time or number of cycles to wait for getting the first data from the cache after a read command is issued.

- Cache size, the size of cache in bytes (for data/instruction), relates the probability (hit ratio) that data can be found in a cache. The larger the cache, the higher the chance to find the data in the cache. If data is not in L2 (cache miss), then data has to be looked for in the next stage cache or main memory (aka L3). Memory can be treated as the next level cache (L3) for L2.

- Memory latency is the time or number of cycles (many more cycles compared to L2 latency) to wait for getting first data from the main memory after a read command is issued.

- Memory bandwidth is the number of bits per sec (usually in MB/s) transfer to/from the memory after a transfer has started (i.e. after the latency).

Studies have shown that doubling the size of L2 translates into a few % (typically around 5%) of average system performance over a wide range of programs (some benefit more and some less). This concurs with what we have been saying about the XP w/ 256 KB L2 and the 512 KB L2 Barton.

Studies have also shown that there is an equivalence between cache size and cache latency, i.e. a larger memory/cache can be used to trade off against a memory/cache with larger latency.

939 has twice the max memory bandwidth of a 754 due to the 128-bit memory bus in 939. This is in addition to the dual channel memory controller in 939. Applications that require constantly changing, large, and well structured spatial data, as in scientific computations, video encoding/decoding, image processing, ..., would benefit directly from the 128-bit memory bus of 939 (vs the 64-bit of 754), since data needs to be refreshed constantly from the main memory (L3) to the on-chip L2 via the memory bus as the size of the data >> L2 size at any given time.

Interesting question: how would the following A64's perform when running at the same CPU frequency, memory bus frequency and HT bus frequency?
A - A64 1 MB L2, 128-bit memory bus (939/940)
B - A64 1 MB L2, 64-bit memory bus (754)
C - A64 512 KB L2, 128-bit memory bus (939)
D - A64 512 KB L2, 64-bit memory bus (754)

A is better than B or C.
B or C is better than D.
Between B and C, it depends on applications. For memory intensive applications, C has an advantage.
(My choice would be C (512 KB L2 939) at first to save money on the CPU, then upgrade later to A (1 MB L2 939) when CPU yields mature and prices come down.)

Recap on 939 memory latency, memory bus bandwidth and system bus bandwidth:

For the A64 CPU, the memory traffic and the traffic for the rest of the devices (video, IDE, SATA, serial links, ...) are separated at the CPU rather than at the chipset (NB). As a result,

- The average memory latency between the CPU (after L2 miss) and the memory (L3) is reduced.

- The effective bandwidth of the A64 memory bus (128-bit in 939) to/from the CPU is alone higher than the effective P4 memory (and system) bandwidth (estimated about 15-20% higher), and almost twice that of XP and also that of 754 (estimated 81-89% higher).
(See earlier post on memory bandwidth.)

- The max combined system bus bandwidth of memory bus (in 939) and HyperTransport in an A64 system is more than twice the system bus (FSB) of a P4 system and four times the system bus (FSB) of an XP system.
(See earlier post on system bus bandwidth.)

hitechjb1 · Apr 25, 2004

Cache and CPU performance

There are two processors A and B both running at 2.5 GHz, i.e. 2,500,000,000 clock cycles per sec. A basic CPU operation requires one clock cycle.

One processor A has a larger L2 cache, say 512 KB. Another processor B has a smaller L2 cache, say 256 KB.

L1 and L2 cache are for storing frequently used data for the CPU, temporarily until new data has to be swapped in from, and old data has to be swapped out to main memory. The processors can read from and write to the cache with very few clock cycles (cache latency).

Main memory (aka L3 in PC) can store a much larger amount of data (e.g. 1 GB main memory would be about 2000 times a 512 KB L2). To read/write the main memory requires many more CPU cycles, say 30 - 80 times as many.

Hard drive (aka L4 in PC) can store even more data, ..., basically the universe of the data in your system, but it takes even more time, and it occurs during paging when data is not found in main memory in a computer system.

L1 cache, L2 cache, main memory (L3), hard disk (L4) form the so called memory hierarchy.

The larger the cache, the higher the chance (probability) of finding data there. Analysis shows that when the cache size is above a certain size for a given CPU architecture, CPI and cache latency, the probability will level off. Typically, the probability is around 85 - 95% for L2 ranging from 256 KB to 512 KB or even 1 MB.

The time to read/write data to the main memory typically requires many many more CPU cycles (see earlier number). So if the CPU needs data that is not in the cache (called cache miss), it would have to wait until the data arrives in the cache again from the main memory (many more cycles later than if it is found in the cache).

Even if both CPU A and B are running at the same frequency of 2.5 GHz, CPU A will finish a given job sooner than CPU B since the probability for CPU A to find data in the cache is higher than that of CPU B. CPU A has fewer cache misses than CPU B.
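
To illustrate why a few % of hit ratio matters, here is a minimal sketch using the standard average-access-time model (my addition, not taken from the analysis mentioned above). All the numbers are illustrative assumptions: a 10-cycle L2 hit, main memory 50 times slower (inside the 30 - 80x range above), and hit ratios of 93% and 90% picked from the 85 - 95% range.

```python
def avg_access_cycles(hit_ratio, hit_cycles, miss_penalty_cycles):
    """Average cycles per access = hit cost + miss rate x miss penalty."""
    return hit_cycles + (1.0 - hit_ratio) * miss_penalty_cycles


# Illustrative assumptions only, not measured values
HIT_CYCLES = 10
MISS_PENALTY = 50 * HIT_CYCLES   # main memory taken as 50x slower than L2

cpu_a = avg_access_cycles(0.93, HIT_CYCLES, MISS_PENALTY)  # larger L2, higher hit ratio
cpu_b = avg_access_cycles(0.90, HIT_CYCLES, MISS_PENALTY)  # smaller L2

print(f"CPU A: {cpu_a:.0f} cycles per access, CPU B: {cpu_b:.0f} cycles per access")
```

This only models the memory side. Overall program speed changes by much less, since the CPU spends only part of its time waiting on memory.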

Analysis has shown that, by doubling the L2 cache size, the overall performance would be improved by 0 - 10%+ over a wide range of applications, some more and some less, averaged typically by say 5%.

That is why we usually say a Barton (512 KB L2) performs 5% better than a 1700+ (256 KB L2) running at the same frequency, or the 1700+ has to run 125 MHz faster to break even with a Barton at 2.5 GHz. A few months ago, a Tbred B DLT3C 1700+/1800+ overclocked about 100 MHz better than a desktop Barton, so they were about tied. But recently the mobile Barton overclocks equally well, and in many cases even higher than the 1700+/1800+, so the mobile Barton is a better choice for performance (apart from the price difference).
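
The 125 MHz break-even figure is just 5% of 2.5 GHz. A small sketch, assuming performance scales linearly with clock (a simplification) and using a function name of my own:

```python
def break_even_mhz(reference_mhz, cache_gain):
    """Clock a smaller-cache CPU needs to match a larger-cache CPU at reference_mhz.

    Assumes performance scales linearly with clock frequency.
    """
    return reference_mhz * (1.0 + cache_gain)


barton_mhz = 2500
tbred_needed = break_even_mhz(barton_mhz, 0.05)   # 5% average gain from doubling L2

print(f"1700+ needs about {tbred_needed:.0f} MHz (+{tbred_needed - barton_mhz:.0f} MHz)")
```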

For A64 CPU, from the PR rating of the A64 754 CPU:

2800+ 1800 MHz L2 512 KB
3000+ 2000 MHz L2 512 KB
3200+ 2000 MHz L2 1 MB
3200+ 2200 MHz L2 512 KB
3400+ 2200 MHz L2 1 MB

we can see that going from 512 KB to 1 MB L2 size, the CPU rating stays the same while running 200 MHz slower at around 2 GHz level. In other words, the difference between 512 KB and 1 MB L2 is equivalent to about 10% CPU performance (rating).
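
The same reading of the table, as a sketch: find the ratings that appear twice and compare the clocks. The table is copied from the list above; the helper name is mine.

```python
from collections import defaultdict

# (PR rating, clock in MHz, L2 in KB), as listed above for the A64 754
A64_754 = [
    (2800, 1800, 512),
    (3000, 2000, 512),
    (3200, 2000, 1024),
    (3200, 2200, 512),
    (3400, 2200, 1024),
]


def same_rating_pairs(table):
    """Group parts by rating and keep the ratings that have more than one part."""
    by_rating = defaultdict(list)
    for rating, mhz, l2_kb in table:
        by_rating[rating].append((mhz, l2_kb))
    return {r: sorted(parts) for r, parts in by_rating.items() if len(parts) > 1}


pairs = same_rating_pairs(A64_754)
(slow_mhz, big_l2), (fast_mhz, small_l2) = pairs[3200]
clock_gap = (fast_mhz - slow_mhz) / slow_mhz

print(f"3200+: {big_l2} KB at {slow_mhz} MHz ~ {small_l2} KB at {fast_mhz} MHz")
print(f"L2 doubling is rated like a {clock_gap:.0%} clock increase")
```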

What happens to programs running in CPU with smaller and bigger L2 cache (page 17)

Some remarks on cache latency, cache size, memory latency and memory bandwidth (for A64's) (page 19)

hitechjb1 · Apr 25, 2004

Remarks on A64 and various platforms

For A64 platform, it is a new generation of
- CPU architecture (64-bit and associated features, on chip memory controller, larger L2 cache, ...)
- silicon technology (SOI and soon 90 nm)
- system technology (separate memory and HT system bus, and many new devices into the future, e.g. PCI-express)
- OS and software (64-bit OS and applications, the A64 works perfectly w/ x86-32 bit software)
- more features in chipset and motherboards (e.g. faster serial link, raid 0+1, faster network supports, ...)
....

- A64 has many new CPU architectural features, more raw power, higher stock CPU frequency, scalable to next generation 90 nm SOI silicon technology (overclockability)

- A64 platform replaces the single system bus (aka FSB) of XP by two SEPARATE buses, namely a memory bus and a HyperTransport HT system bus (connecting to all system devices via the north bridge).
As a result, the system bandwidth would be two to four times that of an XP system running the same bus frequency.
The HT bus can go as high as 800 MHz w/ DDR, 32-bit, a max bandwidth of 6.4 GB/s. The dual channel 128-bit memory bus (for 939) has a max bandwidth of 6.4 GB/s.

- The memory controller for A64 is on the CPU chip. Because of this and the separate memory bus and system bus, there would be less bus conflict and the effective memory latency would be reduced.

Different CPU and system platforms (940, 754, 939)

There are two main platforms, namely CPU's with 754 and 939 sockets (we can skip the third one, 940, for now). The 754 and 939 literally refer to the pin count of the CPU/socket, but there are implications on system cost, performance, scalability into the future, ....

- 754 can be considered for price/performance,
- 939 can be considered for higher end, its scalability and future compatibility of motherboard features and CPU (price and yield) into the future.
- Theoretically, a 939 always delivers better performance than a 754, especially on memory bandwidth and memory intensive applications, if price-performance is not a major consideration.
- Both 754 and 939 have L2 cache size 512 KB or 1 MB, eventually to 2 MB (for 939).
- The total system bandwidth (memory + HT) is about two times that of XP system for 754, four times for 939.
- A major difference between 754 and 939 is the memory bus bandwidth. 128-bit dual channel memory bus for 939 and 64-bit for 754. Effective memory bandwidth of 939 is estimated to be 80+% higher than that of 754.
....

Chipset

The current chipsets are NFORCE3 150 and K8T800, but will soon be replaced by NFORCE3 250 GB and VIA K8T800 PRO.
The newer chipsets are much better for
- richer features (such as more raid channels and 0+1, faster networking, built in firewall, bios tweak, ...)
- higher HT bus bandwidth (both data width and frequency)
- more HT device supports
- stability (bios, driver bug fixes)
- Both the K8T800 Pro and Nforce3 250 GB have working PCI/AGP lock
- The NFORCE3 250 GB supports both 754 and 939
- K8T800 PRO supports 939, not sure about 754

IMO, the K8T800 Pro from VIA or the NFORCE3 250 GB is a better choice than the 150, and will soon be available (May - June 04, I think).

So the current/soon-to-be (in April 04) possibilities are, IMO:
- motherboard w/ Nforce3 250 GB + 754 CPU (price performance)
- motherboard w/ Nforce3 250 GB + 939 CPU (more money for CPU ?)
- motherboard w/ VIA K8T800 Pro + 939 CPU (more money for CPU ?)

As of today (early May 04), both chipsets, K8T800 Pro and 250 GB (and the associated motherboards), seem to be competing head to head with each other. We will need to see more results about how each performs based on individual motherboard implementation, features, performance, OC friendliness, ....

DDR2 memory compatibility

DDR2 is a recent standard after DDR 400 (400/500). DDR2 memory modules provide higher memory bandwidth (higher clock frequency) beyond DDR 500, but not necessarily shorter latency. DDR2 also operates at a lower voltage. It delivers the equivalent of DDR 533 and up (to 800 at least ?). Further, it consumes less system power per module.

DDR2 and DDR memory modules are different in terms of module pin count, voltage, signal timing, signal termination, chip package, ..., as far as the memory module and CPU/memory controller interface are concerned. Definitely a new motherboard layout will be required, but this may not be sufficient for the DDR2 change over.

Given such differences, the socket 939 pin layout may or may not be able to interface with the DDR2 memory module, given the constraints imposed by the layers of motherboard interconnect, signal to noise considerations, ...., and more engineering details that have to be addressed.

AFAIK, I don't think there is a yes or no answer for socket 939 and DDR2 compatibility yet (as of April 04). The DDR2 memory changeover won't happen until at least late 2005. Whether the answer turns out to be yes or no, I think whoever wants an A64 platform could not wait that long until the DDR2 + socket issue is clear one way or the other. I think by the time there is a DDR2 changeover (18 months ?), the motherboards will be more feature rich, with higher HT bandwidth, better HT devices (such as PCI-e) and much more powerful 90 nm CPUs, so there would probably be another upgrade cycle anyway.

Micron, Samsung, ... have begun manufacturing DDR2 memory modules (as of 2Q04).
This article gives an overview of DDR2 memory modules from Micron.
http://download.micron.com/pdf/pubs/designline/dl3Q03.pdf

Estimation and importance of 939 platform memory bandwidth (page 19)

Differences between the XP FSB and the A64 buses (separate memory bus and HyperTransport bus) (page 19)

Some remarks on cache latency, cache size, memory latency and memory bandwidth (for A64's) (page 19)

Links to Nforce3 250 GB reviews and motherboards (page 19)

hitechjb1 · Apr 25, 2004

As of May 04, there are two main second generation chipsets for the A64 CPU's, namely Nforce 3 250 GB from nVidia and K8T800 Pro from VIA.

Both are competing head to head in performance (such as HT bus speed), features (RAID, networking, ...).
- Both have working PCI/AGP lock.
- Both support 754 and 939 CPU's.

Information on motherboards based on these two chipsets for 754/939, and on their availability, is coming in daily. I will attempt to keep the information up-to-date and accurate, .... I will highlight key information that is important for overclocking and performance optimization, and leave out duplicated, non-essential or obvious information, ....

Eventually (after a few months), one or two motherboards may become more popular with overclockers than the rest, .... At this point, the main purpose is collecting and sharing information on the new CPU's, chipsets, motherboards, and test results ....

In general, the reviews for nforce 3 250 GB are more favorable compared to the first generation Nforce 3 150 and K8T800 (w/o Pro).

Some links to these 2nd generation chipset and motherboard reviews for the A64 platforms follow.

Main difference between Nforce3 250 GB and K8T800 Pro
This is still evolving as more info is coming in daily, ...
- K8T800 Pro
.... delivers 1000 MHz HT bus w/ DDR (i.e. 2000 MT/s)
.... SATA w/ RAID 0/1/0+1/JBOD

- Nforce3 250 GB
.... 800 MHz HT bus w/ DDR (i.e. 1600 MT/s)
.... native supports for 1 Gb/s ethernet
.... native supports for SATA w/ RAID 0/1/0+1/JBOD
........ 2 native SATA ports + 2 ext SATA ports (total 4 SATA ports)
........ RAID can span over SATA + PATA (total 8 drives max)
.... chipset w/ built-in firewall

Links to Nforce3 250 GB, K8T800 Pro and motherboards

Nforce3 Chipset 250 GB

http://www.tweaktown.com/document.php?dType=review&dId=636

http://www.motherboards.org/articlesd/hardware-reviews/1380_1.html

http://www.techreport.com/reviews/2004q2/nforce3-250gb/index.x?pg=1

http://www.gamers-depot.com/hardware/motherboards/nf3/n250/001.htm

http://www.anandtech.com/chipsets/showdoc.html?i=2004
http://www.anandtech.com/chipsets/showdoc.html?i=2009

VIA K8T800 Pro Chipset

The K8T800 Pro is the second generation chipset for the A64 CPU's. It is confirmed that it has a working PCI/AGP lock.

http://www.via.com.tw/en/k8-series/k8t800pro.jsp

http://www.anandtech.com/chipsets/showdoc.html?i=2046&p=1

250 GB 754 Motherboards

EPoX 8KDA3+ (250 GB 754)
http://www.hexus.net/content/reviews/review.php?dXJsX3Jldmlld19JRD03NTc=

MSI K8N Neo (250 GB 754)
http://www.hardocp.com/article.html?art=NjA3
http://www.anandtech.com/mb/showdoc.html?i=2036

ASUS K8N-E Deluxe (250 GB 754)
http://usa.asus.com/prog/spec.asp?m=K8N-E Deluxe&langs=09

Chaintech ZNF3-250 ZENITH (250 754)
http://www.chaintechusa.com/tw/eng/product_spec.asp?MPSNo=13&PISNo=266
http://www.neoseeker.com/resourcelink.html?rlid=74533

250 GB 939 Motherboards

K8T800 Pro 754 Motherboards
(Since this is the first K8T800 Pro board that supports 754 CPU's, it looks like the K8T800 Pro does support 754 and that the support is motherboard dependent.)
ABIT KV8 Pro
http://www.abit-usa.com/products/mb/techspec.php?categories=1&model=176

K8T800 Pro 939 Motherboards

This list will grow fast as more motherboards for 754, and soon also 939, become available.

Performance Analysis of various A64 Platforms

Remarks on A64 and various platforms (page 19)

Estimation and importance of 939 platform memory bandwidth (page 19)

Differences between the XP FSB and the A64 buses (separate memory bus and HyperTransport bus) (page 19)

Some remarks on cache latency, cache size, memory latency and memory bandwidth (for A64's) (page 19)

hitechjb1 · Apr 26, 2004

This is a summary of many posts on CPU voltage, frequency, temperature, stability, ..., for Tbred/Barton, but many concepts can be applied to CPU's and chips in general.

CPU voltage: from stock to max absolute, from efficient overclocking to diminishing return

1. For Tbred B/Barton, the default voltage ratings (stock voltage) are
- For mobile Barton, 1.45 V
- DLT3C 1.5 V
- DUT3C 1.6 V
- DKT3C 1.65 V
This is the default voltage rating AMD recommends to use.

2. The max absolute voltages that AMD put up are:
Quoted from AMD:
"The AMD Athlon XP processor model 8 should not be subjected to conditions exceeding the absolute ratings, as such conditions can adversely affect long-term reliability or result in functional damage."

- For DLT3C, e.g. 1700+ DLT3C
Vcc_core_dc_max = 1.5 + 0.05 = 1.55 V
The absolute rating for Vcore = 1.55 + 0.5 = 2.05 V

- For DUT3C, e.g. 1700+ DUT3C, 2100+
Vcc_core_dc_max = 1.6 + 0.05 = 1.65 V
The absolute rating for Vcore = 1.65 + 0.5 = 2.15 V

- For DKT3C and desktop Barton, e.g. 2500+, 3200+
Vcc_core_dc_max = 1.65 + 0.05 = 1.70 V
The absolute rating for Vcore = 1.70 + 0.5 = 2.20 V
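
The three cases follow the same rule, so they can be tabulated with a short sketch. The +0.05 V and +0.5 V offsets are the ones used in the calculations above; the constant and function names are mine, and the AMD data sheet remains the authority.

```python
# Stock Vcore by part suffix, as listed in item 1
STOCK_VCORE = {"DLT3C": 1.50, "DUT3C": 1.60, "DKT3C": 1.65}

DC_TOLERANCE_V = 0.05   # Vcc_core_dc_max = stock + 0.05 V (as used above)
ABS_MARGIN_V = 0.5      # absolute rating = Vcc_core_dc_max + 0.5 V (as used above)


def vcore_limits(stock_v):
    """Return (Vcc_core_dc_max, absolute Vcore rating) in volts."""
    dc_max = stock_v + DC_TOLERANCE_V
    return round(dc_max, 2), round(dc_max + ABS_MARGIN_V, 2)


for part, stock in STOCK_VCORE.items():
    dc_max, absolute = vcore_limits(stock)
    print(f"{part}: stock {stock:.2f} V, dc max {dc_max:.2f} V, absolute {absolute:.2f} V")
```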

The model 8 and 10 tech docs do not cover the mobile Barton, but since the mobile Barton is a derivative of Tbred B + Barton, its max absolute voltage should resemble that of the Tbred B 1700+ DLT3C.

Ref:
Max Vcore for Tbred B and Barton (page 5)
How much voltage can be applied to a CPU (page 5)

3. For overclocking, the "efficient overclocking voltage" that gives the most overclocking frequency and keeps temperature below diminishing return is
- between 1.5 to 1.85 V for DLT3C and mobile Barton,
- between 1.6 to 1.95 V for DKT3C and desktop Barton
getting about 100 - 130 MHz per 100 mV.
Ref:
General rules on voltage and temperature for CPU overclocking (page 16)

4. If one needs to get the last MHz (last stable 50-100 MHz) from the CPU, then the CPU has to operate above the "efficient overclocking voltage" and below the "max absolute voltage". The CPU would have to operate in the diminishing return regime, in which every mV of voltage added to speed up the CPU is counteracted by the heat increase, which in turn slows down the CPU. The return of MHz from voltage is small (< 30 MHz per 100 mV, < 10 MHz / C) and is costly in terms of cooling and power supply in this operating range.

This voltage range is recommended for benchmark testing and competition, and not necessarily for 24/7 usage. If one has only one CPU to rely on, don't operate it constantly in this voltage range.
Ref:
Some numbers to determine max CPU overclocking frequency - Vcore vs temperature,
When do the CPU's slow down? (page 13)
Explanation (page 13)

5. The effect of high voltage on CPU life expectancy is discussed in:

How to determine "highest" voltage and temperature for CPU overclocking (page 16)

Effect of high Vcore and electromigration on CPU failure time (page 15)

Effect of high Vcore and electromigration on expected failure time for Tbred B/Barton (page 15)

What could damage a chip/CPU permanently? (page 15)

What is gate break-down voltage (page 16)

Related links:

Relationship of clock, die temperature and voltage (update)
- What is the active power of a CPU at frequency f and voltage V
- How to estimate CPU static and active power
- Effect of die temperature on CPU clock frequency at a given Vcore
(page 13)

Vcore vs processor frequency and cycle time (page 19)

What is CPU stability (page 19)

Why frequency and voltage are important for overclocking performance (page 19)

How to read CPU temperature (page 19)

hrhrhrFOOT · May 9, 2004

Why oh why in gods name is this not a sticky.

hitechjb1 · May 15, 2004

On CPU life expectancy and tradeoff

There is NO clear Yes or No answer to predict the "life expectancy" of a particular CPU based on voltage and temperature. CPU's of a given type follow certain statistical behavior.

The inverse relationship between frequency and temperature will naturally determine the max voltage and frequency for a given CPU and cooling setup. If overclocking is done properly, such voltage and temperature should be below the max absolute temperature and voltage of a given CPU specification (at least true for Tbred/Barton).

For example, assuming nominal voltage is 1.5 V.

Running it constantly at 1.8 V is 20% over nominal and running it at 1.95 V is 30% over nominal voltage.

From electromigration analysis, keeping temperature roughly constant (by cooling), going from 20% to 30% overvoltage decreases CPU failure time (life expectancy) by about 10 percentage points (failure time reduced from 69% to 59%).

For Tbred B/Barton/Mobile Barton, the frequency gain between 1.8 V and 1.95 V is about 75-100 MHz at the 2300-2500 MHz level. The CPU is starting to operate in the diminishing return regime, getting only 75-100 MHz from a 150 mV Vcore increase. The gain in overclocking frequency is about 3 - 4%.

The most effective, cost effective overclocking voltage for Tbred/Barton is between 1.5 - 1.9 V, beyond which overclocking is very costly in terms of power supply and cooling with diminishing frequency gain.

So it is a tradeoff between frequency, voltage and life expectancy.

Going from 1.8 V to 1.95 V, one would get a 3 - 4% increase in overclocking frequency at the expense of an additional 10% reduction in CPU life expectancy (statistically).
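
The percentages in this tradeoff can be checked with the sketch below. It only redoes the arithmetic from this post (nominal 1.5 V, 75-100 MHz gained at the 2300-2500 MHz level); the function names are mine and nothing here models electromigration itself.

```python
NOMINAL_V = 1.5   # nominal Vcore used in the example above


def overvolt_pct(vcore, nominal=NOMINAL_V):
    """Overvoltage as a percentage of nominal Vcore."""
    return (vcore - nominal) / nominal * 100


def gain_pct(extra_mhz, base_mhz):
    """Frequency gain as a percentage of the base frequency."""
    return extra_mhz / base_mhz * 100


print(f"1.80 V: {overvolt_pct(1.80):.0f}% over, 1.95 V: {overvolt_pct(1.95):.0f}% over")
print(f"gain: {gain_pct(75, 2500):.1f}% to {gain_pct(100, 2300):.1f}%")

# MHz returned per 100 mV over the 150 mV step from 1.8 V to 1.95 V
low, high = 75 / 150 * 100, 100 / 150 * 100
print(f"{low:.0f} to {high:.0f} MHz per 100 mV")
```

That is roughly 50 - 67 MHz per 100 mV, about half of the 100 - 130 MHz per 100 mV seen in the efficient voltage range.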

But different people use a CPU differently and have different objectives. Some expect to use it for 6 months or 1 year, some for 3 years, ..., some want to squeeze out the last MHz for competition and satisfaction, so one has to make his/her own judgement and tradeoff, ...

This post discusses the subject in detail:

CPU voltage: from stock to max absolute, from efficient overclocking to diminishing return (page 19)

Effect of high Vcore and electromigration on CPU failure time (page 15)
Effect of high Vcore and electromigration on expected failure time for Tbred B/Barton (page 15)
What could damage a chip/CPU permanently? (page 15)
What is gate break-down voltage (page 16)

hitechjb1 · May 15, 2004

What is an ideal and safe temperature for overclocking

I think there is NO single ideal or safe temperature in overclocking; temperature varies with CPU voltage and frequency. The AMD tech doc specifies an absolute temperature limit, but that is way higher than normal usage temperature. Here is why.

With ambient temperature around 20 - 30 C (summer is coming), below is the general picture.

For stock cooling, one should be able to get 2.1-2.2 GHz out of it, temperature can get to somewhere between 55 - 65 C.

For Volcano 9/11/12, probably one can add 100 MHz to get it to 2.2 - 2.3 GHz, with temperature somewhere between 50 - 60 C.

With a good copper HS such as the SLK-947/SP-97 and a high speed adjustable fan such as the Thermaltake Smart Fan II, one would expect to get around 2.3-2.4 GHz from a desktop Barton with temperature somewhere between 50 - 55 C, and 2.4 - 2.6 GHz out of a mobile Barton with temperature somewhere between 40 - 50 C.

All these numbers are for general reference to illustrate the trend, don't take them as 100% rigid.

Generally, most people would put a fixed max temperature, say 60 C or 55 C or 50 C, on the CPU. It is fine as a first order guideline.

Technically, CPU temperature and CPU stable frequency vary inversely: a higher frequency requires a lower temperature for stability, and a lower frequency can work stably at a higher temperature.

E.g.
- A CPU can run stably at a much higher temperature (e.g. 60+ C), at a lower Vcore and lower frequency (e.g. 1.4 - 1.6 V, 2.2 - 2.3 GHz for Tbred B/Barton) than its intrinsic ideal max frequency.

- A CPU needs a much lower temperature (e.g. under 30 - 45 C on air or even lower for extreme cooling) to run stably at high Vcore for sustaining a higher overclocking frequency (e.g. 1.8 - 2.0+ V, 2.5 - 3.0+ GHz).

For technical details:

Originally posted by hitechjb1
...
The higher the voltage and frequency, the higher the power and the higher the temperature. Such active power will raise the CPU to a certain temperature under a certain load for a given cooling setup.

Since carrier mobility decreases as temperature increases beyond a certain temperature due to lattice scattering, transistor switching slows down as temperature increases. So the frequency f of a CPU varies inversely with the temperature, or df / f = - k dt, mathematically, where f is frequency, t is temperature, and k is a constant.

The balancing of these two opposing actions, or the intersection of the voltage-frequency curve and the temperature-frequency curve of a CPU characteristic, naturally determines the final stable voltage/frequency/temperature operating point. If overclocking is done properly, the maximal overclocking should settle naturally at a certain frequency, voltage and temperature, as described above, below the maximum absolute ratings of voltage and temperature (as seen from Tbred/Barton, ...). A perceived stable voltage and temperature setting may not be necessary after all, if the voltage, temperature and frequency variations are monitored properly and adjusted incrementally.

CPU voltage: from stock to max absolute, from efficient overclocking to diminishing return (page 19)

hitechjb1 · May 15, 2004

Why high voltage is needed to run higher CPU frequency (and maybe higher FSB)

Higher Vcore is necessary to get to higher frequency but not sufficient for stability. Reaching a stable high frequency under load requires both high Vcore and low enough temperature for a given CPU and cooling setup.

This is why:

A CPU (chip) is made up of many transistors (approaching 100 million for Tbred/Barton, 100+ million for A64) forming logic switches to perform logic operations. Physically, each transistor is connected to some capacitance, which is inherent in the transistor gate dielectric and in the capacitive coupling between wire connections and the underlying silicon. In order for the transistors to switch and perform the required logic function within a given CPU cycle, electric current is needed to charge and discharge these capacitors (100 million+) via the corresponding transistor switches.

Such switching current (usually known as Idsat) through a transistor depends on Vcore: the higher the Vcore, the larger the current (a property of the transistor, without going into details here).

That is,
the higher the Vcore is,
the higher the current,
the shorter the time to switch a transistor to do a logic operation,
the shorter the cycle time of a pipeline in a CPU,
the higher the CPU frequency (and FSB).

This is the mathematics and physics behind the above statement.
Idsat = k1 (Vcore - Vt)^n
where k1 and n are constants, n is between 1 and 2, Vt is transistor threshold voltage. In more detail, typically, n = 2, k1 = W u e / (2 L d), where W is width of transistor gate, L is transistor channel length, u is mobility, e is gate oxide dielectric constant, d is gate oxide thickness.

Since more current can charge or discharge a capacitor faster, the (delay) time (tD) to switch a transistor (in a logic gate) varies inversely with the current, so the higher the current, the shorter the time to perform a logic operation.

tD = C Vcore / Idsat
where C is capacitance (described above).

In a CPU pipeline, within a clock cycle, typically there are 5-20 stages of such logic switching. So the shorter the delay time (tD), the shorter the CPU cycle time T or the higher the CPU frequency f (since f = 1 / T).

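f = 1 / T = k2 / tD
Combining this with the above equations, taking n = 2, we get
f = k2 Idsat / (C Vcore) = k1 k2 (Vcore - Vt)^2 / (C Vcore)
Expanding the square gives f = (k1 k2 / C) (Vcore - 2 Vt + Vt^2 / Vcore). When Vcore is well above Vt the last term is small, so approximately
f ~ k3 Vcore + k4
where k2 is a constant, k3 = k1 k2 / C, and k4 = - 2 k3 Vt.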

In other words, the higher the Vcore, the higher the frequency, which answers the original question.
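As a rough numerical check of the derivation, the sketch below evaluates the n = 2 model and its linear approximation f ~ k3 Vcore + k4, with k3 = k1 k2 / C and k4 = - 2 k3 Vt obtained by dropping the Vt^2 / Vcore term. All constants are set to 1 and Vt = 0.3 V is a hypothetical value picked only for illustration, so the outputs are in arbitrary units and not real Tbred/Barton frequencies.

```python
def idsat(vcore, vt, k1=1.0, n=2.0):
    """Switching current Idsat = k1 * (Vcore - Vt)^n, valid for Vcore > Vt."""
    if vcore <= vt:
        raise ValueError("model only applies for Vcore above Vt")
    return k1 * (vcore - vt) ** n


def frequency(vcore, vt, c=1.0, k1=1.0, k2=1.0):
    """f = k2 / tD with tD = C * Vcore / Idsat (n = 2)."""
    t_d = c * vcore / idsat(vcore, vt, k1)
    return k2 / t_d


def frequency_linear(vcore, vt, c=1.0, k1=1.0, k2=1.0):
    """Linear approximation f ~ k3 * Vcore + k4 for Vcore well above Vt."""
    k3 = k1 * k2 / c
    k4 = -2.0 * k3 * vt
    return k3 * vcore + k4


VT = 0.3  # hypothetical threshold voltage, illustration only
for vcore in (1.5, 1.65, 1.8, 1.95):
    print(f"Vcore {vcore:.2f} V: model {frequency(vcore, VT):.3f}, "
          f"linear {frequency_linear(vcore, VT):.3f} (arbitrary units)")
```

The gap between the two columns is the dropped term k3 Vt^2 / Vcore, which shrinks as Vcore rises.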

For more details:
Vcore vs processor frequency and cycle time (page 19)
Why frequency and voltage are important for overclocking performance (page 19)

This is only part of the story, without other constraints. But the bad news is, we cannot keep increasing Vcore, as there are constraints of
- heat and temperature
- gate voltage breakdown of silicon oxide under the transistor gate
What is gate break-down voltage (page 16)

These constraints keep the CPU frequency from rising indefinitely as Vcore is increased.

The temperature constraint is due to the active power P_active, plus leakage power P_leakage, dissipated when running a CPU at frequency f with voltage Vcore.

P_active = C Vcore^2 f
where C is the equivalent capacitance of the CPU to model power.

The temperature t of a CPU is related to the power P_active + P_leakage by

t = kR (P_active + P_leakage) + tA
where kR is known as the thermal resistance of a given cooling setup, and tA is a certain temperature offset.

The higher the voltage and frequency, the higher the power and the higher the temperature. Such active power will raise the CPU to a certain temperature under a certain load for a given cooling setup.
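To put numbers on the two formulas, here is a minimal sketch. Every value in it (10 nF equivalent capacitance, 5 W leakage, 0.4 C/W thermal resistance, 25 C offset) is a hypothetical placeholder chosen to give plausible-looking results; none of them is a measured Tbred/Barton figure.

```python
def active_power(c_eff, vcore, freq_hz):
    """P_active = C * Vcore^2 * f, in watts for C in farads and f in Hz."""
    return c_eff * vcore ** 2 * freq_hz


def cpu_temperature(k_r, p_active, p_leakage, t_a):
    """t = kR * (P_active + P_leakage) + tA, in C for kR in C/W."""
    return k_r * (p_active + p_leakage) + t_a


# Hypothetical illustration values, not measurements.
C_EFF = 10e-9      # equivalent switching capacitance, farads
P_LEAKAGE = 5.0    # leakage power, watts
K_R = 0.4          # thermal resistance of the cooling, C/W
T_A = 25.0         # temperature offset, C

for vcore, freq_ghz in ((1.65, 2.0), (1.80, 2.3), (1.95, 2.4)):
    p = active_power(C_EFF, vcore, freq_ghz * 1e9)
    t = cpu_temperature(K_R, p, P_LEAKAGE, T_A)
    print(f"{vcore:.2f} V at {freq_ghz:.1f} GHz: {p:.1f} W active, about {t:.1f} C")
```

Because power goes with Vcore squared times f, the last voltage steps cost far more heat than they return in frequency.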

Since carrier mobility decreases as temperature increases beyond a certain temperature due to lattice scattering, transistor switching slows down as temperature increases. So the frequency f of a CPU varies inversely with the temperature, or df / f = - k dt, mathematically, where f is frequency, t is temperature, and k is a constant.

The balancing of these two opposing actions, or the intersection of the voltage-frequency curve and the temperature-frequency curve of a CPU characteristic, naturally determines the final stable voltage/frequency/temperature operating point. If overclocking is done properly, the maximal overclocking should settle naturally at a certain frequency, voltage and temperature, as described above, below the maximum absolute ratings of voltage and temperature (as seen from Tbred/Barton, ...). A perceived stable voltage and temperature setting may not be necessary after all, if the voltage, temperature and frequency variations are monitored properly and adjusted incrementally.

Vcore vs processor frequency and cycle time (page 19)

Why frequency and voltage are important for overclocking performance (page 19)

CPU voltage: from stock to max absolute, from efficient overclocking to diminishing return (page 19)

How does leakage current slow down future generations of chips (page 19)

Relationship between CPU frequency and temperature (page 20)

LkyOldSun · May 25, 2004

paint on heatsink will help heat transfer?
anodization of heatsink will help heat transfer?

got a guy HERE that claims so.

he also says that HS will work better if it is pitted (except of course where the core contacts). The last point seems as though it could be valid, but the first 2 sound plain wrong to me.

what do you think?

hitechjb1 · May 25, 2004

LkyOldSun said:
paint on heatsink will help heat transfer?
anodization of heatsink will help heat transfer?

got a guy HERE that claims so.

he also says that HS will work better if it is pitted (except of course where the core contacts). The last point seems as though it could be valid, but the first 2 sound plain wrong to me.

what do you think?

I read that article; the author made quite a number of claims. For some of them, I did not quite follow the arguments. The author did not give any quantitative argument or data to support what was said, ....

E.g. The author said any metal, copper, aluminum, plastic, graphite, carbon, ... can all be used as heat sink material, ..., and used words like "very good", "work well", "excellent", ... but without quantifying them and making comparisons, it is hard for me to understand what exactly was meant, ....

The author also said most heat sinks ("99% of the fancy heat sinks") are just for the look and are only a little better than the factory heat sinks (probably meant stock heat sinks), and that the difference is due to heat sink size and higher CFM fans, .... This point was challenged a lot, ....

From what I understand, the author said that a "special" paint coating on a copper heat sink would prevent oxidation of the copper and hence prevent long term degradation of the heat sink due to the oxide layer and the air trapped in the oxide, since air is a heat insulator, .... The paint coating would not cover the CPU contact area, so the CPU core would still make direct contact with the copper heat sink, since copper transfers heat very well.

This seems to make some sense to me, but it requires some test results and/or data to make the claims more convincing.

Looking at my Thermalright copper heat sink after almost one year, I did not see any darkening of color due to oxidation. So do those copper heat sinks already have some kind of protective coating on them to prevent oxidation?

So the net is, I did not get any conclusive and convincing argument from that article, .... Also I do not know too much about heat sink materials and their chemistry, heat transfer, ....

hitechjb1 · May 28, 2004

As many of you know, the forums were upgraded to a new version of code, and many links in this thread (probably in some other threads also) are no longer working.

E.g.
This does not work
http://www.ocforums.com/vb/showthread.php?s=&postid=2714461#post2714461

But I found out that by deleting "/vb" in the link, it will work again.

This will work
http://www.ocforums.com/showthread.php?s=&postid=2714461#post2714461

So in case some links do not work for today (May 28, 2004), the reason is as above.

I am fixing all the links in this thread one by one now, so hopefully I should have all of them fixed within a day.

An easy workaround is to delete "/vb" in the browser address line when coming across the missing link error.
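For anyone with a saved list of old links, the same fix can be applied in bulk. This is only a sketch of the "delete /vb" workaround described above; the function name is made up for illustration, and it assumes "/vb/" appears once as a path segment, as in the example link.

```python
def fix_forum_link(url):
    """Drop the first '/vb' path segment from an old forum link."""
    return url.replace("/vb/", "/", 1)


old = "http://www.ocforums.com/vb/showthread.php?s=&postid=2714461#post2714461"
print(fix_forum_link(old))
```

Links that are already in the new form pass through unchanged.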

hitechjb1 · May 30, 2004

Relationship between CPU frequency and temperature

A CPU can be clocked faster by raising voltage about linearly with frequency, before entering the regime of diminishing return in which heat begins to limit the linear increase. But it also requires a low enough temperature. The balance between high voltage and low temperature will naturally determine the resulting stable frequency.

As a rule of thumb I use, for a given voltage, for every 10 C lower in temperature the CPU can be clocked stably higher by about 4%. Mathematically, df/f = - 0.004 dT, where f is frequency and T is temperature in C. For more accurate modeling, that constant is not exactly 0.004/C; 0.004 is used just as a round number for simplicity.

There are voltage limitations in bios settings, power supply and voltage regulator, and also a limit due to transistor gate breakdown for a given gate oxide thickness. The rule applies when voltage is below these constraints and is increased incrementally to supply the necessary power for higher frequency.

So say, if 2.6 GHz is the norm for mobile Barton on air at 45 C at certain voltage,
55 C would be 2.50 GHz
45 C would be 2.60 GHz
35 C would be 2.70 GHz
25 C would be 2.81 GHz
15 C would be 2.92 GHz
5 C would be 3.04 GHz
-5 C would be 3.16 GHz
-15 C would be 3.29 GHz
-25 C would be 3.42 GHz
-35 C would be 3.56 GHz
-45 C would be 3.70 GHz
-55 C would be 3.85 GHz
-65 C would be 4.00 GHz
....

The above table applies to some Tbred B such as DLT3C.
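The table can be regenerated by compounding 4% per 10 C step from the 2.6 GHz at 45 C reference point, which is how the listed values appear to have been computed. The function name and its parameters are illustrative only, and the result is a rule-of-thumb estimate at a fixed voltage, not a prediction for any particular chip.

```python
def stable_frequency_ghz(temp_c, ref_freq_ghz=2.60, ref_temp_c=45.0,
                         gain_per_10c=0.04):
    """Estimated stable frequency at temp_c, compounding 4% per 10 C."""
    steps = (ref_temp_c - temp_c) / 10.0
    return ref_freq_ghz * (1.0 + gain_per_10c) ** steps


for temp_c in range(55, -75, -10):
    print(f"{temp_c:>4} C would be {stable_frequency_ghz(temp_c):.2f} GHz")
```

Changing ref_freq_ghz and ref_temp_c gives the same kind of table for a different reference point.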

If a chip happens to be slower, say getting 2.5 GHz at 45 C (on air),
then subtract 100 MHz in each of the above numbers correspondingly.

I think for many of the mobile Bartons, when cooled down to 0 - 5 C during boot, one should be able to get at least a screen shot at 3 GHz.

From my estimate, at 5 C, many would do 2.9 - 3.0 GHz stably.

If a CPU can do x MHz stably, x + 100 MHz is a doable frequency for CPUID type of screen shots provided it responds to voltage.

I tested two mobile Bartons on air to illustrate this rule.
Screen shots are in the next post.
http://www.ocforums.com/showpost.php?p=2806278&postcount=576

Voltage, temperature and frequency: the basic variables of overclocking (page 20)

hitechjb1 · May 30, 2004

I tested two mobile Bartons on air, to illustrate the above rule of thumb.

Both can boot up to around 2.8 GHz on air for screen shots and benchmark at around 30-45 C. Prime95 stable frequency is 100-200 MHz lower, depending on the balance of temperature rise and CPU frequency drop.

A 2600+ IQYHA 0351 MPMW did
- 2.80 GHz with 1.98 V, 21/42.5 C,
- 2.71 GHz stable with 1.89 V, 22/46 C load, 20/39 C idle (Tornado 80mm at 3200 rpm)
- 2.65 GHz (221x12) stable with 1.82 V, 24/45 C load, 23/40 C idle (Tornado 80 mm at 4000 rpm)
- 2.65 GHz (221x12) stable with 1.82 V, 25/46 C load, 23/40 C idle (Tornado 80 mm at 3500 rpm)
- 2.50 GHz (227x11) stable with 1.66 V, 25/42 C load, 23/38 C idle (Tornado 80 mm at 3300 rpm)

A 2400+ AQYFA 0343 WPFW did
- 2.79 GHz with 2.22 V, 11/33 C,
- 2.60 GHz stable with 2.19 V, 17/43.5 C load.

2600+ IQYHA 0351 MPMW

barton_2600_iqyha_2790_1.97V_pcmark03_80.JPG

2400+ AQYFA 0343 WPFW

RJARRRPCGP · Jun 12, 2004

L337 M33P said:
I will generate a comprehensive set of results along with Sandra benchies, maybe in Excel spreadsheet format as the vB post system doesn't like tables.

I should be able to calculate the C/W with some data points, but that depends on the reliability of several factors including Vcore reporting accuracy, the MHz of the processor, the temperature sensing accuracy and choosing an equilibrium point at which the temperature difference is constant.

Since I cannot undervolt the CPU on my Asus A7N8X, I can only provide data from overclocked speeds and higher voltages.

For stability testing I plan to have two tiers of stability tolerance - 30 minutes of Prime95 for quick results and 2 hours or more for both comparison and more accurate guide to the capabilities of my CPU.

Once I get better cooling (Thermalright SK-7 + SFII) I will be able to keep the temperature more or less constant with the variable speed fan so I can compare results for stability at say 50C and 60C.

The links are invalid, because of the "/vb" in the links. I ended up getting error 404 Not Found.

hitechjb1 · Jul 5, 2004

Voltage, temperature and frequency: the basic variables of overclocking

Voltage, temperature and frequency are the three basic variables for overclocking. These posts describe the relationship between them, using Tbred B/Barton as examples. The underlying concepts can be applied to other silicon-based CPUs and chips.

The effect of voltage on frequency and failure time is also discussed.

CPU voltage: from stock to max absolute, from efficient overclocking to diminishing return (page 19)

On CPU life expectancy and the tradeoff with voltage and frequency (page 19)

What is an ideal and safe temperature for overclocking (page 19)

Why high voltage is needed to run higher CPU frequency (and maybe higher FSB) (page 20)

Relationship between CPU frequency and temperature (page 20)

Captain Newbie · Aug 5, 2004

On Coating a Heatsink For Better Thermal Transfer/Longer Lifespan

If I read what the guy was trying to do correctly, he was in essence adding another layer betwixt heatsink and cooling air pressure. I would not recommend this, even if the heatsink was not treated against oxidation, simply because a component that is "au naturel" (without paint) should shed heat quicker than a component with another layer on. I would imagine that copper heatsinks are already anodically treated in some method or another to preserve the copper, since copper does tend to oxidize.

The potential for air bubbles in TIM is a credible problem--with an incorrectly fitted heatsink, it may result in processor damage in the short-to-middle term. However, since almost all of us are competent to fit a heatsink properly, I do not believe it is a significant issue. As for oxidation, I have a copper Thermalright and after 8 months there is no sign of a problem. (I will monitor it and see how it holds up for another 8)

I don't think there will be any significant gain out of treating the heatsink with some other chemical or process. Without a way (or the financial means at present) to quantitatively test this, don't take this as gospel.

greenman100 · Aug 5, 2004

copper is not relatively active electrically, aluminum is

treating a heatsink with anything but silver won't help

copper is second only to silver in thermal conductivity.
