Ram Timing Guide (Last Updated: 12/20/08)
I've spent the last couple of weeks messing around with my ram, did a lot of reading, and learned a lot of interesting things while I was at it. I remember seeing someone say there wasn't a decent memory guide on the boards, so I decided to write this in the hopes that it might help somebody. What I'm detailing here is a brief description of memory overclocking, bios options, and the procedure I go through to learn what my ram is capable of. I'll start with a simple rundown of the timings, without getting into excessive technical detail; maybe I'll save that for another guide.
If you wish to learn more about how memory actually works, one of the best articles on the subject is linked below. Although it was written for programmers, it's easy to understand, gives a really good description of how modern day ram operates, and explains why ram timings are necessary. It's definitely a worthy read.
http://lwn.net/Articles/250967/
Since this part of the guide is fairly detailed with a lot of information, there's a Cliff's Notes version in the next post that gives a brief description of the individual timings.
Memory Timings
Memory timings are one of the most crucial elements in overclocking ram. It's true that for the most part you should aim for the highest speed you can get, but sometimes sacrificing a little speed in order to drop timings, even by 1 clock, can make a world of difference. I'll start with a basic description for those that are new to memory timings.
The Basics
First, I'll define exactly what is meant by 'clocks' in reference to timings. Your ram cells, like your CPU, operate at a frequency. With memory, especially now with DDR2, there are lots of frequencies thrown around, the least of which is the memory chip speed, because this value is the lowest. The reason DDR came into play to supplant the aging SDRAM was because manufacturers could no longer keep pushing memory frequencies without ramping up the voltage to obscene amounts. The last of the SDRAM chips were running at frequencies around 166-200mhz, and some upwards of 250mhz. And to this day, ram chips have not come much further: DDR2-800 has chip speeds of 200mhz, and DDR2-1066 has chip speeds of 266mhz. This speed matters because it is what determines how much time some number of clocks takes. For example, at 200mhz each clock is 5ns, so CAS 3 (3 clocks) at 200mhz is 15ns.
Don't worry about this too much if you're just getting started, but for some of the more seasoned clockers looking to push every last little bit out of their ram, calculating the exact time of each clock, and figuring out in nanoseconds what time each latency works out to, can help you find precisely where your ram's limits are, and perhaps allow you to drop the speed a touch in order to lower the timings and obtain even better results. In a way, it's kind of like CPU multi and FSB: changing the timings in clocks makes drastic changes in latency, while changing chip speed makes more minor adjustments, while also impacting the bus. I'll explain this in a bit more detail later.
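A quick sketch of that arithmetic in plain Python, just to illustrate the numbers above:

```python
# Convert a memory chip frequency in MHz to its clock period in ns,
# then express a timing given in clocks as an absolute time.
def clock_period_ns(freq_mhz):
    return 1000.0 / freq_mhz

def timing_ns(clocks, freq_mhz):
    return clocks * clock_period_ns(freq_mhz)

print(clock_period_ns(200))  # 5.0 ns per clock at 200 MHz
print(timing_ns(3, 200))     # CAS 3 at 200 MHz = 15.0 ns
```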
The most common reference to memory timings tends to be in the form 4-4-4-12 2T, so to start, I'll describe which timings are being referred to, and what they mean. In your bios, these timings will be listed as CL (CAS Latency), tRCD (RAS to CAS Delay), tRP (Row Precharge), tRAS (RAS Latency), and lastly, CR (Command Rate).
CAS and RAS signals are the most frequent signals issued within ram. They stand for Column Address Strobe and Row Address Strobe. Memory cells are organized in a matrix of rows and columns, and these signals are used to select a point within the matrix to begin an operation, whether it's a read or a write. Along with an address, the memory controller sends the quantity of data it needs from that point. Once the memory module has received the address, and deciphered it into a row/column, it proceeds with selecting that row and column, and either transfers the data to the MC, or writes data from the MC to that location.
Because memory row/column selection is not instantaneous, we must allow the memory some time after issuing the RAS and CAS signals, to make sure that the selection has finished. This is where the timings come into play.
First, the memory module lowers the RAS signal in order to select the row it wishes to access. The time it must wait before selecting the column is the tRCD. Once tRCD has passed, it selects the column. After CL has passed, the read/write operation can be performed. If the next request utilizes the same row, then that request is only subject to a new CAS signal and CL before it can be processed.
When a request is made that requires a different row, the memory module must first deselect the row it is using, and select the new row. There is another latency here: the tRP. The one good thing about this latency is that it happens in parallel with the read/write being performed by the previous operation. If, say, the last operation was a read that takes 4 clocks, and the tRP is set to 4 clocks, then it will totally overlap, and this latency will not be noticed. For the most part this is what happens, and tRP may only add 1 or 2 clocks of latency when selecting a new row if the last operation was for a small amount of data. Not that it is any less important for stability! Once tRP has passed, and the last operation has completed, the row/column selection begins again, and the cycle continues.
There is another latency that memory has, and it is the tRAS. Like tRP, this latency overlaps other operations. With memory, there is an amount of time that must pass between selecting a row and deselecting that row; that is, before we can enter tRP, tRAS must have elapsed.
To put this all into perspective, I'll give a bit of an example with some real numbers as to how this all plays out. We'll use timings of 4-4-4-12 (CL-tRCD-tRP-tRAS).
The most basic part is selecting a row and column; this requires tRCD + CL, or 8 clocks. In order to perform a new operation on the same row, we must wait for the current operation to complete, and then wait CL, or 4 clocks, before the next operation can occur. If the next request requires access to a different row, then the module must deselect the previous row. Before it can do that, it must wait at least tRAS, or 12 clocks, since it selected the row it's on. If the last row was only used for 1 operation (tRCD + CL), then only 8 clocks have passed, and it must wait out the rest of tRAS before doing this. If there was more than one operation on the same row (tRCD + CL + transfer time + CL), then the time that has passed would be 12 clocks plus the transfer time. So the only way that tRAS has any effect in this example is if there was only a single operation; for anything more, tRAS would have already passed. Now, before the new row can be selected, the module must wait at least tRP after deselecting the current row. This overlaps with the memory operation in progress, and may not be witnessed at all if that operation takes longer than tRP cycles to complete.
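To make the example concrete, here's a rough sketch of those clock counts in Python. It's a simplified model of just the cases walked through above, ignoring transfer-time overlap:

```python
# Simplified clock-count model for the 4-4-4-12 example (CL-tRCD-tRP-tRAS).
CL, tRCD, tRP, tRAS = 4, 4, 4, 12

# Opening a fresh row and accessing it: tRCD then CL.
first_access = tRCD + CL              # 8 clocks

# Another operation on the already-open row only pays CL again.
same_row_access = CL                  # 4 clocks

# Before the row can be deselected, tRAS must have elapsed since it was
# selected; after a single short access only 8 clocks have passed.
def extra_wait_before_precharge(clocks_since_row_select):
    return max(0, tRAS - clocks_since_row_select)

print(first_access)                               # 8
print(same_row_access)                            # 4
print(extra_wait_before_precharge(first_access))  # 4 extra clocks of tRAS
```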
In reality, a program that is optimized for memory efficiency is not going to incur the cost of tRAS and tRP very often, and is more likely to be limited only by CL and tRCD. This is why these two timings alone are the most important for, say, getting better benchmark times. A good benchmarking application will know about tRAS and tRP and do its best not to incur their cost, which is seen more by an application that might randomly select data and be changing rows a lot. For the most part, tRAS and tRP should not be tuned too low: their performance impact is minimal, but like any other timing, they can cause stability issues if set too low.
There is one last timing that you may see people talk about: the command rate. The CR specifies how many clock cycles are necessary between selecting a chip and being able to access it. For the most part, this is not going to have a large impact on performance; in a 1GB memory module with 8 chips, that's 128MB per chip. In a worst case scenario, the memory being accessed by an application just happens to span two chips (not really within your control), and the command rate might surface in the latency if it is set to 2T. But in reality, you might be talking about a few extra ns every couple of seconds, and it would be a rare case indeed where the CR added even as much as 0.001% to the overall latency. For this reason, I've been keeping my CR at 2T: even if 1T does have some impact, it's a very small amount, and if it causes stability issues, especially at high frequencies, then it's just not worth it.
Of course, these are not all of the timings; there are many, many more, including things like read to read, read to write, write to read and write to write latencies. There is also the refresh latency.
I may add something about read/write latencies later, but I will discuss refresh timings a bit.
Because of the design of a memory cell, which uses a capacitor to store its value, memory must be refreshed periodically. When a cell is read from, the capacitor loses some of its charge. There is also the effect of 'leakage', where the capacitor simply loses some of its charge over time. Because of this, memory modules have a refresh cycle timing that is measured in microseconds. This time is generally about 7.8us, which is a rate of around 128 KHz. The refresh operation takes a certain number of cycles to complete, and is associated with the tRFC timing. This timing is usually quite large, in the 25-40 clock range, and is the amount of time given, once the refresh cycle has begun, before any new operations can be performed. Some bioses also list the tRFC as an actual time, 70-300ns. While tRFC will directly impact performance, setting this value too low may not give the ram enough time to complete the refresh. The amount of time required for the refresh is proportional to the amount of ram you have: the lower the density, the lower the refresh time required. This is one of the reasons why purpose-built benching rigs tend to want 2x512mb or 2x1gb. In a system with 4+gb, setting the tRFC too low will almost certainly cause stability issues.
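Since some bioses list tRFC in clocks and others in ns, it helps to convert between the two at your chip speed. A small sketch of that conversion; the 400mhz and 35-clock figures are just example numbers, not recommendations:

```python
import math

# tRFC shows up in the bios either as a clock count or as a time in ns;
# convert between the two at a given memory chip frequency.
def trfc_clocks_to_ns(clocks, freq_mhz):
    return clocks * 1000.0 / freq_mhz

def trfc_ns_to_clocks(ns, freq_mhz):
    # Round up: the ram needs at least this much time to finish the refresh.
    return math.ceil(ns * freq_mhz / 1000.0)

print(trfc_clocks_to_ns(35, 400))     # 35 clocks at 400 MHz = 87.5 ns
print(trfc_ns_to_clocks(127.5, 400))  # 127.5 ns at 400 MHz = 51 clocks
```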
The Advanced Stuff
For those that really want to dig into memory timings, if 'The Basics' wasn't deep enough for you, you can start to factor in the actual speed you're running at, do the math on exactly how much time each clock cycle is, and from there calculate the exact time of each latency. I'll start with a speed of 200mhz and timings of 4-4-4-12.
First we'll need to know the time of each cycle; for 200mhz this is 5ns (1000/200). Some of the fastest speeds that people have achieved are around 300-400mhz; at 400mhz (DDR2-1600), we have a clock time of 2.5ns.
In the same way that FSB x CPU multi is used to calculate the CPU speed, cycle time x number of clocks is used to calculate latency times. The other important part is that cycle time also directly affects the time it takes to transfer data from ram. For this reason it is important to know what impacts a particular application the most: data transfer speed, or latency. Tuning for the best latency requires more than simply setting everything to 3-3-3-9 and trying to get the best speed possible with those timings. Because your speed would end up so low, you might get better performance from looser timings at a higher frequency.
Here's what I like to do. Set your timings to be relatively loose, say 5-5-5-15, and try to get the best speed possible at these timings. There are other factors involved that may limit the speed, but once that speed is found, you can then calculate the cycle time for that speed. Let's say we managed 333mhz, which is around 3ns. We then have latencies of 15-15-15-45 ns. If we were to drop to 4-4-4-12, that would be 12-12-12-36 ns, which for our example is unstable. But we would be stable if we could hit 13-13-13-39 ns, which would mean we need a cycle time of 3.25ns, which is 307mhz. In this way, we have tightened our timings by about 13% and only impacted our transfer speeds by about 8%. If you're using a simple memory bandwidth benchmark, this change may show up as slightly worse, around 8%, but it may lead to a slight increase in more realistic benchmarks that are impacted more by latency.
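That trade-off can be worked out directly. A sketch of the math, where the 13 ns stability point is the assumed value from the example, not a measured one:

```python
def cycle_ns(freq_mhz):
    return 1000.0 / freq_mhz

# Stable loose setting found by testing: 5-5-5-15 at 333 MHz.
loose_mhz, loose_clocks = 333, 5
loose_ns = loose_clocks * cycle_ns(loose_mhz)   # ~15 ns per timing

# Assumed: the chips are stable down to 13 ns, and we want 4 clocks.
tight_clocks, stable_ns = 4, 13.0
needed_cycle_ns = stable_ns / tight_clocks      # 3.25 ns per clock
needed_mhz = 1000.0 / needed_cycle_ns           # ~307.7 MHz

latency_gain = 1 - stable_ns / loose_ns         # ~13% tighter latency
bandwidth_loss = 1 - needed_mhz / loose_mhz     # ~8% lower transfer speed
print(round(needed_mhz, 1))                     # 307.7
print(round(latency_gain * 100), round(bandwidth_loss * 100))
```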
The trick to making this work is finding out just where your memory loses stability. This can be very challenging because there are so many other factors involved, but suffice it to say, if you concentrate on CL and tRCD, and find the best MHz/latency combination that gets these two latencies the lowest, then the rest just needs to be set as best you can, because for the most part their impact on latency is quite minimal. In order to more precisely determine at what points CL and tRCD reach their stability limits, it might help to leave one high and try to tweak the other. Start with CL: leave tRCD at something where it won't cause problems (6-7), along with any other timings, and try to find the lowest latency time (in nanoseconds, not just cycles) that you can stably push the CL to. Then do the same with tRCD, and this should give you a better idea of how _your_ memory performs. Remember, not all memory is equal: not even the same brand, same IC, heck, same week/year and serial number are likely to perform the same. It's a long, slow process, but once you know where the limits are, it's easier to make judgment calls when trying to tune for better latency or better bandwidth.
Another thing: TAKE NOTES. I have about 12 pages worth of notes on clock timings, speeds and voltages from when I was tweaking and learning what my memory was capable of. And I'm damn glad I did.
In total, I think for my CellShock DDR2-1066 set I have over 300 results of different timing/speed/voltage combinations, and it has helped me immensely. Mostly because there are just way too many combinations to try and remember, and having those notes to look back on makes things so much easier. And I'm guessing I've only been through about 20% of what I would like to test. I guess it all depends on your level of obsession.
The biggest thing to remember from this is that it's not simply the clock latencies you choose that determine overall latency. You also have to factor in the cycle time to know exactly what your latency times are, since a count of clock cycles is a relative timing, dependent on the cycle time. It may seem pretty obvious when stated like this, but I think it often gets overlooked when people try to 'tighten' timings by lowering the clock counts with no regard for what speed they're running at.
The other part of the equation that I like to tune is the voltage. Many times I have seen others, myself included, simply pump a bunch of voltage to the ram with little regard for using 'too much'. This might sound a bit off, but really, if you don't need 2.5v to be stable at some given setting, and 2.24v would suffice instead, then lower it. The thing about voltage, as you may well know, is that it increases heat.
Now, this part is speculation; I could well be wrong on this, and I'll try to dig up some proof if I can find some. But I'm thinking that increased voltage can also cause excessive leakage. If the leakage gets out of control, and the refresh timing isn't enough to keep up with the excess leakage, then this in itself could well cause stability issues that would not be seen at lower voltages, so long as you could still be stable at those lower voltages.
One way to counteract this, if it were true, would be to increase the refresh rate. My bios allows 3.9/7.8/15.6 us intervals, giving 256/128/64Khz refresh rates. Now say your tRFC was at 100ns; at a 7.8us refresh interval (128Khz) you would have about 12.8ms of refresh time per second, or 1.28% of your ram's time spent refreshing. Lowering the tRFC would lower this value some, but it is not going to increase the amount of refreshing done; tRFC only sets how much time each refresh is given to complete. If you were to increase the refresh rate to the 3.9us interval, this would of course bring the refresh time up to 2.56%, which, if it lets you run more frequency, might outweigh the extra 1.28% performance hit from the increased refresh rate. Of course, like any other latency, increased frequency and correspondingly lower tRFC times also lower the % of time spent refreshing. It's really a last resort to try and tweak every last bit out of your ram, and might get you the 5mhz you need. It's a long shot, but it shouldn't be overlooked.
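Those refresh-overhead figures are easy to check. A sketch of the calculation:

```python
def refresh_overhead(trfc_ns, interval_us):
    # One tRFC-long refresh happens every refresh interval, so the
    # fraction of time spent refreshing is tRFC / interval.
    return trfc_ns / (interval_us * 1000.0)

# tRFC of 100 ns at the standard 7.8 us interval (128 KHz rate):
print(refresh_overhead(100, 7.8))   # ~0.0128, about 1.28% of the time
# Halving the interval to 3.9 us doubles the overhead:
print(refresh_overhead(100, 3.9))   # ~0.0256, about 2.56%
```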
The Obscure Stuff
I discussed a bit above, in the advanced part, about trying to determine the actual time your chips require to perform operations like row and column selection. Determining this can be a lot of trial and error. But I thought I would discuss a bit about why these values are what they are. It may give the initiated overclocker a bit of insight, and the urge to find new ways of overcoming the latency walls of their current setup.
The act of accessing memory takes time. Electrical signals have a speed at which they can move, with the absolute limit being the speed of light. This might seem like an insanely fast speed, but when you're talking in terms of nanoseconds, even at the speed of light you can't go very far in a single ns. Before the row/column selection can begin, a memory address sent from the cpu must be deciphered into a chip, and then a row/column selection on that chip. In memory speak, this is called demultiplexing. Once the module knows which row/column to select, it can begin with the RAS, wait for tRCD, then the CAS, and wait for CL. The reason for this wait between signals is that it takes time for the signal to travel from the demultiplexer to the chip and activate the precise row/column. This kind of latency is called wire RC delay, and it accounts for the largest part of the latency between sending a RAS/CAS signal and the data actually being ready for use. The topic of RC delay is really quite advanced, and is dependent on the manufacturing process. Suffice to say that, in the largest part, it's what makes the difference between cheap ram, which can only run slightly tighter than stock settings, and more expensive ram, which can run much tighter and faster without a loss of stability. The well known Micron D9 chips are some of the best because of their low RC delay.
Another part of the latency equation is called bit-line sensing. Because the charge held in the capacitor of a memory cell is so small (some 10k electrons), the signal must be amplified in order to make sense of whether it is a 0 or a 1. Then there is also the time required to send the data from the sense amplifier to the bus for transport to the cpu. In total, sending the RAS/CAS signals and processing the charge through the sense amplifier makes up approximately 75% of the latency, and the majority of that is from RC delay.
Now you might be wondering: that's all well and great if you're a ram manufacturer, but most of this is out of the hands of simple settings. And you're right, for the most part. Unless, of course, you're willing to dive into the effects of RC (which stands for resistance/capacitance) on wire. It's really somewhat of a grey area as to what can be done to lower the RC delay. Attempts have been made to cool ram to sub-zero temps to try and obtain the same effects that have been seen with cpus at sub-zero temps, but for the most part it really just wasn't worth it. That's not to say you shouldn't at least keep your ram as cool as possible: pushing high frequencies, tight timings and high voltage all contribute to the activity of the wires, and thus increase their heat, and heat adds resistance. Too bad we can't make ram from superconductive material and eliminate RC delay altogether.
There's a couple more things I'll add to this section once I find more information on them. Specifically, the more obscure ram timings (Read2Read, Read2Write and such), and also some stuff about drive strengths, which I'm actively researching.
I've spent the last couple of weeks messing around with my ram, did a lot of reading, and learned a lot of interesting things while i was at it. I remember seeing someone say there wasn't a decent memory guide on the boards, so i decided to write this in the hopes that maybe it would help somebody. What I'm detailing here is a brief description of memory overclocking, bios options, and the procedure i go through to learn what my ram is capable of. I'll start first with a simple rundown of the timings, without getting into too excessive of technical detail, maybe in another guide.
If you wish to learn more about how memory actually works, one of the best articles on this subject that is easy to understand, although it was written for programmers, gives a really good description of how modern day ram operates, and why ram timings are necessary. Its definitely a worthy read.
http://lwn.net/Articles/250967/
Since this part of the guide is fairly detailed with alot of information, there's a cliff notes version in the next post that gives a brief description of the individual timings.
Memory Timings
Memory timings are one of the most crucial elements in overclocking ram. Its true, for the most part you should aim for the highest speed you can get, but sometimes sacrificing a little speed in order to drop timings, even by 1 clock, can make a world of difference. I'll start with a basic description of timings for those that are new to memory timings.
The Basics
First, i'll define exactly what is meant by 'clocks' in reference to timing. Your ram cells, like your CPU, operate at a frequency. With memory, especially now with DDR2, there are lots of frequency timings that are thrown around, the least of which is the memory chips speed, because this value is the lowest. The reason DDR came into play to supplement the aging SDRAM was because manufacturer could not longer continue pushing memory frequencies without ramping up the voltage to obscene amounts. The last of the SDRAM chips were running at frequencies around 166-200mhz, and some upwards of 250mhz. And to this day, ram chips have not come much further. DDR-800 as chip speeds of 200mhz. DDR1066 has chip speeds of 266mhz. The reason why this is important, is because it is this speed that is used to determine the time that some number of clocks take. For example, 200mhz is 5ns per clock. So a CAS 3(3 clocks), at 200mhz is 15ns. Don't worry about this too much if your just getting started, but for some of the more seasoned clockers looking to push every last little bit out of their ram, calculating the exact time of each clock, and figuring out in nanoseconds what speed each latency is operating at can help you find precisely where your ram limits are, and perhaps allow you to drop the speed a touch, in order to drop to lower the timings to obtain even better results. In a way, its kind of like CPU Multi and FSB, changing timing/clocks makes drastic changes in the timing, while changing chip speeds makes more minor adjustments, while also impacting the bus. I'll explain this in a bit more detail later.
The most common reference to memory timings tends to be in the form 4-4-4-12 2T, so to start, I'll describe which timings are being referred to, and what they mean. In your bios, these timings will be listed as CL(CAS Latency), tRP (Row Precharge), tRCD (RAS to CAS Delay), tRAS(RAS Latency, and lastly, CR(Command Rate).
CAS and RAS signals are the most frequent signals issued within ram. They stand for Column Address Select and Row Address Select. Memory cells are organized in a matrix of rows and columns, and these signals are used to select a point within the matrix to begin an operation, whether its reading or writing. Along with an address, the memory controller sends the quantity of data it needs to receive from that point. Once the memory modules have recieved the address, and deciphered it into a row/column, it proceeds with selecting that row and column, and either transfers the data to the MC, or writes data from the MC to that location.
Because memory row/column selection is not instantaneous, we must allow the memory some time after signaling the RAS and CAS signals, to make sure that the selection has finished. This is where the timings come into play.
First, the memory module lowers the RAS signal in order to select the row it wishes to access. The time it must wait before selecting the column is the tRCD. Once tRCD has passed, it selects the column. After CL has passed, the read/write operation can be performed. If the next request utilizes the same row, then that request is only subject to a new CAS signal and CL before it can be processed.
When a request is made that requires a different row, the memory module must first deselect the row it is using, and select the new row. There is another latency here, this is the tRP. The one good thing about this latency, is it happens in parallel with the read/write that is being performed by the previous operation. If say, the last operation was a read that takes 4 clocks, and the tRP is set to 4 clocks, then it will totally overlap, and this latency will not be noticed. For the most part, this happens and tRP may only add 1 or 2 clocks of latency when selecting a new row if the last operation was for a small amount of data. Not that it is any less important for stability! Once tRP has passed, and the last operation has completed, the row/column selection begins again, and the cycle continues.
There is another added latency that memory has, and it is the tRAS. Like tRP, this latency overlaps other operations. With memory, there is an amount of time that must pass between selecting a row and deselecting that row, that is, before we can enter into tRP, tRAS must have passed.
Code:
+-+-+-+-+
|0|1|2|3|
|4|5|6|7|
+-+-+-+-+
Figure 1 - Memory cell layout
To put this all into perspective, I'll give a bit of an example with some real numbers as to how this all plays out, we'll use timings of 4-4-4-12 (CL-tRCD-tRP-tRAS).
The most basic part, is selecting a row and column, for this, it requires tRCD + CL, or 8 clocks. In order to perform a new operation on the same row, we must way for the operation to complete, and then wait for CL or 4 clocks before the next operation con occur. If the next request requires access to a different row, then it must deselect the previous row. Before it can do that, it must wait at least tRAS or 12 clocks since it selected the row its on. If the last row was only used for 1 operation (tRCD+CL) then only 8 clocks have passed, and it must wait for tRAS before doing this. If there was more than one operation on the same row, (tRCD + CL + transfer time + CL) then the time that has passed would have been 12 clocks + the transfer time. So the only way that tRAS in this example has any affect is if there was only a single operation, and for anything more, tRAS would have already passed by. Now, before the new row can be selected, it must wait at least tRP after deselecting the current row and selecting the new row, which overlaps with the memory operation in progress, and may not be witnessed at all if the memory operation takes longer than tRP cycles to complete.
In reality, a program that is optimized for memory efficiency is not going to incur the cost of tRAS and tRP very often, and is more likely to be limited only by CL and tRCD. This is why these two timings alone are the most important for say, getting better benchmark times. A good benchmarking application will know about tRAS and tRP and do its best not to incur there cost, which is seen more by an application that might randomly select data, and be changing rows alot. For the most part, tRAS and tRP should not be tuned too low, as their performance impact is minimal but like any other timing, they can cause stability issues if set too low.
There is also one last timing that you may see people talk about, and it is the command rate. The CR specifies how many clocks cycles are necessary between selecting a chip, and the being able to access that. For the most part, this is not going to have a large impact on performance, in a 1GB memory module with 8 chips, that's 128MB per chip. In a worst case scenario, the memory being accessed by an application just happens to span two chips(not really within your control), and the command rate might surface into the latency if it is set to 2T. But in reality, you might be talking about a few extra ns every couple or few seconds, and it would be a rare case indeed where the CR added even as much as 0.001% to the amount of latency. For this reason alone, i've been keeping my CR at 2T, as it simply not worth it as even if it does have some impact, its a very small amount, and if it causes stability issues, especially at high frequencies, then its just not worth it.
Of course, this is not all of the timings, there are many many more, including things like read to read, read to write, write to read and write to write latencies. There is also the refresh latency.
I may add something about read/write latencies later, but i will discuss refresh timings a bit.
Because of the design of a memory cell, which utilizes a capacitor to store its value, memory must be refreshed periodically. When a cell is read from, the capicitor loses some of its charge. There is also the effect of 'leakage' where the capictor simply loses some of its charge over time. Because of this, the memory modules have a refresh cycle timing that is measured in microseconds. This time is generally about 7.8us, which is a rate of around 128 KHz. This operation takes a certain amount of cycles to complete, and is associated with the tRFC timing. This timing is usually quite large, in the 25-40 clock range, and is the amount of time given once the refresh cycle has begun, before any new operations can be performed. Some bioses also list the tRFC as an actual time, 70-300ns. While tRFC will directly impact performance, setting this value too low may not give the ram enough time to complete the refresh. The amount of time required for the refresh is proportional to the amount of ram you have. The lower the density, the lower the refresh time required. This is one of the reasons why purpose-built benching rigs tend to want 2x512mb or 2x1gb. In a system with 4+gb setting the tRFC to too low of a value will almost certainly cause stability issues.
The Advanced Stuff
For those that really want to dig into memory timings, if 'The Basics' wasn't deep enough for you, you can start to factor in the actual speed your running at, and do the math on exactly how much time each clock cycle is, and from there calculate the exact time of each latency. i'll start with a speed of 200mhz and timings of 4-4-4-12.
First we'll need to know the time of each cycle, for 200mhz this is 5ns (1000/200). Some of the fastest speed timings that people have achieved are cycles of around 300-400mhz, at 400mhz, (DDR2-1600), we have a clock time of 2.5ns.
In the same way that fsb x CPU multi is used to calculate the CPU speed, cycle time x # of clocks is used to calculate latency times. The other important part is that cycle time also directly affects the time it takes to transfer data from ram. For this reason it is important to know what impacts a particular application the most, data transfer speed, or latency. In order to tune for the best latency, it requires more than simply setting everything to 3-3-3-9 and simply trying to get the best speed possible with those timings. Because your speed would be so low, you might be able to get better performance with higher clocks.
Here's what i like to do. Set your timings to be relatively loose, say 5-5-5-15, and try to get the best speed possible at these timings. There are other factors involved that may limit the speed, but once that speed is found, you can then calculate the cycle time for that speed. Lets say we managed 333mhz, which is around 3ns. We then have latencies of 15-15-15-45 ns. If we were to drop to 4-4-4-12, that would be 12-12-12-36 ns, which for our example, is unstable. But, we would be stable if we could hit 13-13-13-39ns, which would mean we would need a cycle time of 3.25ns, which is 307mhz. In this way, we have tightened our timings by 15% and only impacted our transfer speeds by 8%. If your using a simple memory bandwidth benchmark, then this change may show up as slightly worse, around 8%, but may lead to a slight increase in more realistic benchmarks that are impacted more by latency.
The trick to making this work is finding out just where your memory loses stability. This can be very challenging because there are so many other factors involved, but suffice it to say: if you concentrate on CL and RCD, and find the best MHz/latency combination that gets those two latencies the lowest, the rest just needs to be set as best you can, because for the most part their impact on latency is quite minimal. To more precisely determine where CL and RCD reach their stability limits, it helps to leave one high and tweak the other. Start with CL: leave RCD somewhere it won't cause problems (6-7), along with the other timings, and find the lowest latency time (in nanoseconds, not just cycles) that you can stably push CL to. Then do the same with RCD, and this should give you a better idea of how _your_ memory performs. Remember, not all memory is equal; even sticks of the same brand, same IC, heck, same week/year and serial number are likely to perform differently. It's a long, slow process, but once you know where the limits are, it's easier to make judgment calls when tuning for better latency or better bandwidth.
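Just to make the "walk one timing down while the other stays loose" idea concrete, here's a sketch of that loop. The `is_stable` function is a stand-in, not anything real: in practice that step is you rebooting and running a memory test at each setting. The fake tester at the bottom is purely for illustration:

```python
# Sketch of the sweep described above: hold everything else loose and
# walk a single timing (CL or RCD) down one clock at a time until it
# fails, recording the last stable value and its real latency in ns.
# `is_stable` is hypothetical -- you supply the real check (memtest etc.)

def lowest_stable_clocks(chip_mhz, is_stable, start=7, floor=2):
    best = None
    for clocks in range(start, floor - 1, -1):
        if not is_stable(clocks):
            break  # first failure ends the sweep
        best = clocks
    if best is None:
        return None
    return best, best * 1000.0 / chip_mhz

# Fake tester for illustration: pretend this ram holds anything >= 12 ns.
fake = lambda clocks: clocks * 1000.0 / 333 >= 12.0
print(lowest_stable_clocks(333, fake))  # last stable clock count and its ns
```

The point of returning the nanosecond figure, not just the clock count, is that it's the ns number that transfers between frequencies when you're deciding on a new speed/timing combination.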
Another thing: TAKE NOTES. I have about 12 pages worth of notes on clock timings, speeds and voltages from when I was tweaking and learning what my memory was capable of. And I'm damn glad I did.
In total, I think for my CellShock DDR-1066 set I have over 300 results at different timing/speed/voltage combinations, and it has helped me immensely, mostly because there are just way too many combinations to try and remember, and having those notes to look back on makes things so much easier. And I'm guessing I've only been through about 20% of what I'd like to test. I guess it all depends on your level of obsession.
The biggest thing to remember from this is that it's not simply the clock latencies you choose that determine overall latency. You also have to factor in the cycle time to know exactly what your latency times are, because a count of clock cycles is a relative timing that depends on cycle time. It may seem pretty obvious when stated like this, but I think it often gets overlooked when people try to 'tighten' timings by lowering the clock count with no regard for what speed they're running at.
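A quick two-liner shows why a lower clock count alone means nothing. The 533mhz figure here is just an illustrative speed I picked, not from the text:

```python
# Effective latency = clocks * cycle time. A "looser" timing at a higher
# frequency can be faster in real time than a "tighter" one at a lower one.

def latency_ns(clocks, chip_mhz):
    return clocks * 1000.0 / chip_mhz

print(latency_ns(4, 400))  # CL4 @ 400 MHz = 10.0 ns
print(latency_ns(5, 533))  # CL5 @ 533 MHz is under 9.4 ns -- looser, yet faster
```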
The other part of the equation that I like to tune is the voltage. Many times I have seen others, myself included, simply pump a bunch of voltage into the ram with little regard for using 'too much'. This might sound a bit off, but really: if you don't need 2.5v to be stable at a given setting, and 2.24v would suffice, then lower it. The thing about voltage, as you may well know, is that it increases heat.
Now, this part is speculation; I could well be wrong on this, and I'll try to dig up some proof if I can find it. But I suspect that increased voltage can also cause excessive leakage. If the leakage gets out of control, and the refresh timing isn't enough to keep up with the excess voltage, then this in itself could cause stability issues that would not be seen at lower voltages, so long as you could still be stable at those voltages.
One way to counteract this, if it's true, would be to refresh more often. My board allows refresh intervals of 3.9/7.8/15.6 us, giving 256/128/64Khz refresh rates. Say your tRFC is 100ns; at a 7.8us refresh interval (128Khz) you would spend about 12.8ms per second refreshing, so 1.28% of your ram's time goes to refreshes. Lowering tRFC would shrink that figure somewhat, but it doesn't increase the amount of refreshing done; a higher tRFC only gives each refresh more time to complete. If you shorten the interval to 3.9us, the overhead of course doubles to 2.56%, but if that buys you extra stable frequency, the gain might outweigh the extra 1.28% hit. And like any other latency, increased frequency with correspondingly lower tRFC values also lowers the % of time spent refreshing. It's really a last resort when trying to squeeze every last bit out of your ram, but it might make the 5mhz difference you need. It's a long shot, but it shouldn't be overlooked.
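The refresh overhead math above is simple enough to sanity-check yourself; it's just tRFC divided by the refresh interval, using the same 100ns / 7.8us / 3.9us numbers from the example:

```python
# Fraction of time spent refreshing: one tRFC-long refresh per tREFI interval.

def refresh_overhead_pct(trfc_ns, trefi_us):
    """Percentage of the ram's time spent in refresh."""
    return 100.0 * trfc_ns / (trefi_us * 1000.0)

print(refresh_overhead_pct(100, 7.8))  # about 1.28 % at the 7.8us interval
print(refresh_overhead_pct(100, 3.9))  # about 2.56 % at the 3.9us interval
```

Handy for deciding whether a shorter refresh interval is worth it: if doubling the refresh rate buys you more than ~1.3% extra frequency, you come out ahead on paper.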
The Obscure Stuff
I discussed a bit above, in the advanced part, about trying to determine the actual time your chips require to perform operations like row and column selection. Determining this can be a lot of trial and error, but I thought I'd discuss a bit about why these values are what they are. It may give the dedicated overclocker some insight into finding new ways of overcoming the latency walls of their current setup.
The act of accessing memory takes time. Electrical signals have a speed at which they can travel, with the absolute limit being the speed of light. That might seem insanely fast, but when you're talking in terms of nanoseconds, even at the speed of light you can't go very far in a single ns. Before row/column selection can begin, a memory address sent from the cpu must be decoded into a chip, and then a row/column selection on that chip; in memory speak, this is called demultiplexing. Once the module knows which row/column to select, it can begin with RAS, wait for RCD, then CAS, and wait for CL.

The reason for the wait between signals is that it takes time for the signal to travel from the multiplexer to the chip and activate the precise row/column. This kind of latency is called wire RC delay. Wire RC delay accounts for the largest share of the latency between sending a RAS/CAS signal and the data actually being ready for use. The topic of RC delay is really quite advanced and depends on the manufacturing process. Suffice it to say that, in large part, it's what makes the difference between cheap ram that can only run slightly tighter than stock settings, and more expensive ram that can run much tighter and faster without losing stability. The well known Micron D9 chips are some of the best chips because of their low RC delay.

Another part of the latency equation is called bit-line sensing. Because the charge held in the capacitor of a memory cell is so small (some 10k electrons), the signal must be amplified in order to make sense of whether it is a 0 or a 1. Then there is also the time required to send the data from the sense amplifier to the bus for transport to the cpu. In total, sending the RAS/CAS signal and processing the charge through the sense amplifier makes up approximately 75% of the latency, and the majority of that is from RC delay.
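To put the "you can't go far in a nanosecond" point in perspective, here's the back-of-the-envelope number (and real signals in copper traces propagate well below the speed of light):

```python
# How far could a signal possibly travel in one nanosecond?
C = 299_792_458          # speed of light in a vacuum, m/s
per_ns_cm = C * 1e-9 * 100  # metres per ns, converted to cm

print(per_ns_cm)  # roughly 30 cm -- about the length of a motherboard
```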
Now you might be wondering: that's all well and great if you're a ram manufacturer, but most of this is out of the hands of simple settings. And you're right, for the most part. Unless, of course, you're willing to dive into the effects of RC (which stands for resistance/capacitance) on wire. It's really somewhat of a grey area as to what can be done to lower RC delay. Attempts have been made to cool ram to sub-zero temperatures to try and get the same gains seen with cpus at sub-zero temps, but for the most part it really just wasn't worth it. That's not to say you shouldn't at least keep your ram as cool as possible: pushing high frequencies, tight timings and high voltage all increase the activity on the wires, and thus their heat, and heat adds resistance. Too bad we can't make ram from superconductive material and eliminate RC delay altogether.
There's a couple more things I'll add to this section once I find more information on them: specifically the more obscure ram timings (Read2Read, Read2Write and such), and some stuff about drive strengths, which I'm actively researching.