Ram Timing Guide (Last Updated: 12/20/08)
I've spent the last couple of weeks messing around with my ram, did a lot of reading, and learned a lot of interesting things while I was at it. I remember seeing someone say there wasn't a decent memory guide on the boards, so I decided to write this in the hopes that it might help somebody. What I'm detailing here is a brief description of memory overclocking, bios options, and the procedure I go through to learn what my ram is capable of. I'll start with a simple rundown of the timings, without getting into excessive technical detail; maybe I'll save that for another guide.
If you wish to learn more about how memory actually works, one of the best articles on the subject is linked below. Although it was written for programmers, it's easy to understand, gives a really good description of how modern day ram operates, and explains why ram timings are necessary. It's definitely a worthy read.
http://lwn.net/Articles/250967/
Since this part of the guide is fairly detailed with a lot of information, there's a Cliff's Notes version in the next post that gives a brief description of the individual timings.
Memory Timings
Memory timings are one of the most crucial elements in overclocking ram. It's true that for the most part you should aim for the highest speed you can get, but sometimes sacrificing a little speed in order to drop timings, even by 1 clock, can make a world of difference. I'll start with a basic description for those that are new to memory timings.
The Basics
First, I'll define exactly what is meant by 'clocks' in reference to timings. Your ram cells, like your CPU, operate at a frequency. With memory, especially now with DDR2, there are lots of frequencies thrown around, the least of which is the memory chip speed, because this value is the lowest. The reason DDR came into play to supplant the aging SDRAM was because manufacturers could no longer keep pushing memory frequencies without ramping up the voltage to obscene amounts. The last of the SDRAM chips were running at frequencies around 166-200mhz, and some upwards of 250mhz. And to this day, ram chips have not come much further: DDR2-800 has chip speeds of 200mhz, and DDR2-1066 has chip speeds of 266mhz. This speed matters because it is what determines how much time some number of clocks takes. For example, at 200mhz each clock is 5ns, so CAS 3 (3 clocks) at 200mhz is 15ns.
Don't worry about this too much if you're just getting started, but for some of the more seasoned clockers looking to push every last little bit out of their ram, calculating the exact time of each clock, and figuring out in nanoseconds what time each latency works out to, can help you find precisely where your ram's limits are, and perhaps allow you to drop the speed a touch in order to lower the timings and obtain even better results. In a way, it's kind of like CPU multi and FSB: changing the timings in clocks makes drastic changes in latency, while changing chip speed makes more minor adjustments, while also impacting the bus. I'll explain this in a bit more detail later.
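A quick sketch of that arithmetic in plain Python, just to illustrate the numbers above:

```python
# Convert a memory chip frequency in MHz to its clock period in ns,
# then express a timing given in clocks as an absolute time.
def clock_period_ns(freq_mhz):
    return 1000.0 / freq_mhz

def timing_ns(clocks, freq_mhz):
    return clocks * clock_period_ns(freq_mhz)

print(clock_period_ns(200))  # 5.0 ns per clock at 200 MHz
print(timing_ns(3, 200))     # CAS 3 at 200 MHz = 15.0 ns
```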
The most common reference to memory timings tends to be in the form 4-4-4-12 2T, so to start, I'll describe which timings are being referred to, and what they mean. In your bios, these timings will be listed as CL (CAS Latency), tRCD (RAS to CAS Delay), tRP (Row Precharge), tRAS (RAS Latency), and lastly, CR (Command Rate).
CAS and RAS signals are the most frequent signals issued within ram. They stand for Column Address Strobe and Row Address Strobe. Memory cells are organized in a matrix of rows and columns, and these signals are used to select a point within the matrix to begin an operation, whether it's a read or a write. Along with an address, the memory controller sends the quantity of data it needs from that point. Once the memory module has received the address, and deciphered it into a row/column, it proceeds with selecting that row and column, and either transfers the data to the MC, or writes data from the MC to that location.
Because memory row/column selection is not instantaneous, we must allow the memory some time after issuing the RAS and CAS signals, to make sure that the selection has finished. This is where the timings come into play.
First, the memory module lowers the RAS signal in order to select the row it wishes to access. The time it must wait before selecting the column is the tRCD. Once tRCD has passed, it selects the column. After CL has passed, the read/write operation can be performed. If the next request utilizes the same row, then that request is only subject to a new CAS signal and CL before it can be processed.
When a request is made that requires a different row, the memory module must first deselect the row it is using, and select the new row. There is another latency here: the tRP. The one good thing about this latency is that it happens in parallel with the read/write being performed by the previous operation. If, say, the last operation was a read that takes 4 clocks, and the tRP is set to 4 clocks, then it will totally overlap, and this latency will not be noticed. For the most part this is what happens, and tRP may only add 1 or 2 clocks of latency when selecting a new row if the last operation was for a small amount of data. Not that it is any less important for stability! Once tRP has passed, and the last operation has completed, the row/column selection begins again, and the cycle continues.
There is another latency that memory has, and it is the tRAS. Like tRP, this latency overlaps other operations. With memory, there is an amount of time that must pass between selecting a row and deselecting that row; that is, before we can enter tRP, tRAS must have elapsed.
To put this all into perspective, I'll give a bit of an example with some real numbers as to how this all plays out. We'll use timings of 4-4-4-12 (CL-tRCD-tRP-tRAS).
The most basic part is selecting a row and column; this requires tRCD + CL, or 8 clocks. In order to perform a new operation on the same row, we must wait for the current operation to complete, and then wait CL, or 4 clocks, before the next operation can occur. If the next request requires access to a different row, then the module must deselect the previous row. Before it can do that, it must wait at least tRAS, or 12 clocks, since it selected the row it's on. If the last row was only used for 1 operation (tRCD + CL), then only 8 clocks have passed, and it must wait out the rest of tRAS before doing this. If there was more than one operation on the same row (tRCD + CL + transfer time + CL), then the time that has passed would be 12 clocks plus the transfer time. So the only way that tRAS has any effect in this example is if there was only a single operation; for anything more, tRAS would have already passed. Now, before the new row can be selected, the module must wait at least tRP after deselecting the current row. This overlaps with the memory operation in progress, and may not be witnessed at all if that operation takes longer than tRP cycles to complete.
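To make the example concrete, here's a rough sketch of those clock counts in Python. It's a simplified model of just the cases walked through above, ignoring transfer-time overlap:

```python
# Simplified clock-count model for the 4-4-4-12 example (CL-tRCD-tRP-tRAS).
CL, tRCD, tRP, tRAS = 4, 4, 4, 12

# Opening a fresh row and accessing it: tRCD then CL.
first_access = tRCD + CL              # 8 clocks

# Another operation on the already-open row only pays CL again.
same_row_access = CL                  # 4 clocks

# Before the row can be deselected, tRAS must have elapsed since it was
# selected; after a single short access only 8 clocks have passed.
def extra_wait_before_precharge(clocks_since_row_select):
    return max(0, tRAS - clocks_since_row_select)

print(first_access)                               # 8
print(same_row_access)                            # 4
print(extra_wait_before_precharge(first_access))  # 4 extra clocks of tRAS
```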
In reality, a program that is optimized for memory efficiency is not going to incur the cost of tRAS and tRP very often, and is more likely to be limited only by CL and tRCD. This is why these two timings alone are the most important for, say, getting better benchmark times. A good benchmarking application will know about tRAS and tRP and do its best not to incur their cost, which is seen more by an application that might randomly select data and be changing rows a lot. For the most part, tRAS and tRP should not be tuned too low: their performance impact is minimal, but like any other timing, they can cause stability issues if set too low.
There is one last timing that you may see people talk about: the command rate. The CR specifies how many clock cycles are necessary between selecting a chip and being able to access it. For the most part, this is not going to have a large impact on performance; in a 1GB memory module with 8 chips, that's 128MB per chip. In a worst case scenario, the memory being accessed by an application just happens to span two chips (not really within your control), and the command rate might surface in the latency if it is set to 2T. But in reality, you might be talking about a few extra ns every couple of seconds, and it would be a rare case indeed where the CR added even as much as 0.001% to the overall latency. For this reason, I've been keeping my CR at 2T: even if 1T does have some impact, it's a very small amount, and if it causes stability issues, especially at high frequencies, then it's just not worth it.
Of course, these are not all of the timings; there are many, many more, including things like read to read, read to write, write to read and write to write latencies. There is also the refresh latency.
I may add something about read/write latencies later, but I will discuss refresh timings a bit.
Because of the design of a memory cell, which uses a capacitor to store its value, memory must be refreshed periodically. When a cell is read from, the capacitor loses some of its charge. There is also the effect of 'leakage', where the capacitor simply loses some of its charge over time. Because of this, memory modules have a refresh cycle timing that is measured in microseconds. This time is generally about 7.8us, which is a rate of around 128 KHz. The refresh operation takes a certain number of cycles to complete, and is associated with the tRFC timing. This timing is usually quite large, in the 25-40 clock range, and is the amount of time given, once the refresh cycle has begun, before any new operations can be performed. Some bioses also list the tRFC as an actual time, 70-300ns. While tRFC will directly impact performance, setting this value too low may not give the ram enough time to complete the refresh. The amount of time required for the refresh is proportional to the amount of ram you have: the lower the density, the lower the refresh time required. This is one of the reasons why purpose-built benching rigs tend to want 2x512mb or 2x1gb. In a system with 4+gb, setting the tRFC too low will almost certainly cause stability issues.
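Since some bioses list tRFC in clocks and others in ns, it helps to convert between the two at your chip speed. A small sketch of that conversion; the 400mhz and 35-clock figures are just example numbers, not recommendations:

```python
import math

# tRFC shows up in the bios either as a clock count or as a time in ns;
# convert between the two at a given memory chip frequency.
def trfc_clocks_to_ns(clocks, freq_mhz):
    return clocks * 1000.0 / freq_mhz

def trfc_ns_to_clocks(ns, freq_mhz):
    # Round up: the ram needs at least this much time to finish the refresh.
    return math.ceil(ns * freq_mhz / 1000.0)

print(trfc_clocks_to_ns(35, 400))     # 35 clocks at 400 MHz = 87.5 ns
print(trfc_ns_to_clocks(127.5, 400))  # 127.5 ns at 400 MHz = 51 clocks
```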
The Advanced Stuff
For those that really want to dig into memory timings, if 'The Basics' wasn't deep enough for you, you can start to factor in the actual speed you're running at, do the math on exactly how much time each clock cycle is, and from there calculate the exact time of each latency. I'll start with a speed of 200mhz and timings of 4-4-4-12.
First we'll need to know the time of each cycle; for 200mhz this is 5ns (1000/200). Some of the fastest speeds that people have achieved are around 300-400mhz; at 400mhz (DDR2-1600), we have a clock time of 2.5ns.
In the same way that FSB x CPU multi is used to calculate the CPU speed, cycle time x number of clocks is used to calculate latency times. The other important part is that cycle time also directly affects the time it takes to transfer data from ram. For this reason it is important to know what impacts a particular application the most: data transfer speed, or latency. Tuning for the best latency requires more than simply setting everything to 3-3-3-9 and trying to get the best speed possible with those timings. Because your speed would end up so low, you might get better performance from looser timings at a higher frequency.
Here's what I like to do. Set your timings to be relatively loose, say 5-5-5-15, and try to get the best speed possible at these timings. There are other factors involved that may limit the speed, but once that speed is found, you can then calculate the cycle time for that speed. Let's say we managed 333mhz, which is around 3ns. We then have latencies of 15-15-15-45 ns. If we were to drop to 4-4-4-12, that would be 12-12-12-36 ns, which for our example is unstable. But we would be stable if we could hit 13-13-13-39 ns, which would mean we need a cycle time of 3.25ns, which is 307mhz. In this way, we have tightened our timings by about 13% and only impacted our transfer speeds by about 8%. If you're using a simple memory bandwidth benchmark, this change may show up as slightly worse, around 8%, but it may lead to a slight increase in more realistic benchmarks that are impacted more by latency.
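That trade-off can be worked out directly. A sketch of the math, where the 13 ns stability point is the assumed value from the example, not a measured one:

```python
def cycle_ns(freq_mhz):
    return 1000.0 / freq_mhz

# Stable loose setting found by testing: 5-5-5-15 at 333 MHz.
loose_mhz, loose_clocks = 333, 5
loose_ns = loose_clocks * cycle_ns(loose_mhz)   # ~15 ns per timing

# Assumed: the chips are stable down to 13 ns, and we want 4 clocks.
tight_clocks, stable_ns = 4, 13.0
needed_cycle_ns = stable_ns / tight_clocks      # 3.25 ns per clock
needed_mhz = 1000.0 / needed_cycle_ns           # ~307.7 MHz

latency_gain = 1 - stable_ns / loose_ns         # ~13% tighter latency
bandwidth_loss = 1 - needed_mhz / loose_mhz     # ~8% lower transfer speed
print(round(needed_mhz, 1))                     # 307.7
print(round(latency_gain * 100), round(bandwidth_loss * 100))
```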
The trick to making this work is finding out just where your memory loses stability. This can be very challenging because there are so many other factors involved, but suffice it to say, if you concentrate on CL and tRCD, and find the best MHz/latency combination that gets these two latencies the lowest, then the rest just needs to be set as best you can, because for the most part their impact on latency is quite minimal. In order to more precisely determine at what points CL and tRCD reach their stability limits, it might help to leave one high and try to tweak the other. Start with CL: leave tRCD at something where it won't cause problems (6-7), along with any other timings, and try to find the lowest latency time (in nanoseconds, not just cycles) that you can stably push the CL to. Then do the same with tRCD, and this should give you a better idea of how _your_ memory performs. Remember, not all memory is equal: not even the same brand, same IC, heck, same week/year and serial number are likely to perform the same. It's a long, slow process, but once you know where the limits are, it's easier to make judgment calls when trying to tune for better latency or better bandwidth.
Another thing: TAKE NOTES. I have about 12 pages worth of notes on clock timings, speeds and voltages from when I was tweaking and learning what my memory was capable of. And I'm damn glad I did.
In total, I think for my CellShock DDR2-1066 set I have over 300 results of different timing/speed/voltage combinations, and it has helped me immensely. Mostly because there are just way too many combinations to try and remember, and having those notes to look back on makes things so much easier. And I'm guessing I've only been through about 20% of what I would like to test. I guess it all depends on your level of obsession.
The biggest thing to remember from this is that it's not simply the clock latencies you choose that determine overall latency. You also have to factor in the cycle time to know exactly what your latency times are, since a count of clock cycles is a relative timing, dependent on the cycle time. It may seem pretty obvious when stated like this, but I think it often gets overlooked when people try to 'tighten' timings by lowering the clock counts with no regard for what speed they're running at.
The other part of the equation that I like to tune is the voltage. Many times I have seen others, myself included, simply pump a bunch of voltage to the ram with little regard for using 'too much'. This might sound a bit off, but really, if you don't need 2.5v to be stable at some given setting, and 2.24v would suffice instead, then lower it. The thing about voltage, as you may well know, is that it increases heat.
Now, this part is speculation; I could well be wrong on this, and I'll try to dig up some proof if I can find some. But I'm thinking that increased voltage can also cause excessive leakage. If the leakage gets out of control, and the refresh timing isn't enough to keep up with the excess leakage, then this in itself could well cause stability issues that would not be seen at lower voltages, so long as you could still be stable at those lower voltages.
One way to counteract this, if it were true, would be to increase the refresh rate. My bios allows 3.9/7.8/15.6 us intervals, giving 256/128/64Khz refresh rates. Now say your tRFC was at 100ns; at a 7.8us refresh interval (128Khz) you would have about 12.8ms of refresh time per second, or 1.28% of your ram's time spent refreshing. Lowering the tRFC would lower this value some, but it is not going to increase the amount of refreshing done; tRFC only sets how much time each refresh is given to complete. If you were to increase the refresh rate to the 3.9us interval, this would of course bring the refresh time up to 2.56%, which, if it lets you run more frequency, might outweigh the extra 1.28% performance hit from the increased refresh rate. Of course, like any other latency, increased frequency and correspondingly lower tRFC times also lower the % of time spent refreshing. It's really a last resort to try and tweak every last bit out of your ram, and might get you the 5mhz you need. It's a long shot, but it shouldn't be overlooked.
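Those refresh-overhead figures are easy to check. A sketch of the calculation:

```python
def refresh_overhead(trfc_ns, interval_us):
    # One tRFC-long refresh happens every refresh interval, so the
    # fraction of time spent refreshing is tRFC / interval.
    return trfc_ns / (interval_us * 1000.0)

# tRFC of 100 ns at the standard 7.8 us interval (128 KHz rate):
print(refresh_overhead(100, 7.8))   # ~0.0128, about 1.28% of the time
# Halving the interval to 3.9 us doubles the overhead:
print(refresh_overhead(100, 3.9))   # ~0.0256, about 2.56%
```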
The Obscure Stuff
I discussed a bit above, in the advanced part, about trying to determine the actual time your chips require to perform operations like row and column selection. Determining this can be a lot of trial and error. But I thought I would discuss a bit about why these values are what they are. It may give the initiated overclocker a bit of insight, and the urge to find new ways of overcoming the latency walls of their current setup.
The act of accessing memory takes time. Electrical signals have a speed at which they can move, with the absolute limit being the speed of light. This might seem like an insanely fast speed, but when you're talking in terms of nanoseconds, even at the speed of light you can't go very far in a single ns. Before the row/column selection can begin, a memory address sent from the cpu must be deciphered into a chip, and then a row/column selection on that chip. In memory speak, this is called demultiplexing. Once the module knows which row/column to select, it can begin with the RAS, wait for tRCD, then the CAS, and wait for CL. The reason for this wait between signals is that it takes time for the signal to travel from the demultiplexer to the chip and activate the precise row/column. This kind of latency is called wire RC delay, and it accounts for the largest part of the latency between sending a RAS/CAS signal and the data actually being ready for use. The topic of RC delay is really quite advanced, and is dependent on the manufacturing process. Suffice to say that, in the largest part, it's what makes the difference between cheap ram, which can only run slightly tighter than stock settings, and more expensive ram, which can run much tighter and faster without a loss of stability. The well known Micron D9 chips are some of the best because of their low RC delay.
Another part of the latency equation is called bit-line sensing. Because the charge held in the capacitor of a memory cell is so small (some 10k electrons), the signal must be amplified in order to make sense of whether it is a 0 or a 1. Then there is also the time required to send the data from the sense amplifier to the bus for transport to the cpu. In total, sending the RAS/CAS signals and processing the charge through the sense amplifier makes up approximately 75% of the latency, and the majority of that is from RC delay.
Now you might be wondering: that's all well and great if you're a ram manufacturer, but most of this is out of the hands of simple settings. And you're right, for the most part. Unless, of course, you're willing to dive into the effects of RC (which stands for resistance/capacitance) on wire. It's really somewhat of a grey area as to what can be done to lower the RC delay. Attempts have been made to cool ram to sub-zero temps to try and obtain the same effects that have been seen with cpus at sub-zero temps, but for the most part it really just wasn't worth it. That's not to say you shouldn't at least keep your ram as cool as possible: pushing high frequencies, tight timings and high voltage all contribute to the activity of the wires, and thus increase their heat, and heat adds resistance. Too bad we can't make ram from superconductive material and eliminate RC delay altogether.
There's a couple more things I'll add to this section once I find more information on them. Specifically, the more obscure ram timings (Read2Read, Read2Write and such), and also some stuff about drive strengths, which I'm actively researching.
I've spent the last couple of weeks messing around with my ram, did a lot of reading, and learned a lot of interesting things while i was at it. I remember seeing someone say there wasn't a decent memory guide on the boards, so i decided to write this in the hopes that maybe it would help somebody. What I'm detailing here is a brief description of memory overclocking, bios options, and the procedure i go through to learn what my ram is capable of. I'll start first with a simple rundown of the timings, without getting into too excessive of technical detail, maybe in another guide.
If you wish to learn more about how memory actually works, one of the best articles on this subject that is easy to understand, although it was written for programmers, gives a really good description of how modern day ram operates, and why ram timings are necessary. Its definitely a worthy read.
http://lwn.net/Articles/250967/
Since this part of the guide is fairly detailed with alot of information, there's a cliff notes version in the next post that gives a brief description of the individual timings.
Memory Timings
Memory timings are one of the most crucial elements in overclocking ram. Its true, for the most part you should aim for the highest speed you can get, but sometimes sacrificing a little speed in order to drop timings, even by 1 clock, can make a world of difference. I'll start with a basic description of timings for those that are new to memory timings.
The Basics
First, i'll define exactly what is meant by 'clocks' in reference to timing. Your ram cells, like your CPU, operate at a frequency. With memory, especially now with DDR2, there are lots of frequency timings that are thrown around, the least of which is the memory chips speed, because this value is the lowest. The reason DDR came into play to supplement the aging SDRAM was because manufacturer could not longer continue pushing memory frequencies without ramping up the voltage to obscene amounts. The last of the SDRAM chips were running at frequencies around 166-200mhz, and some upwards of 250mhz. And to this day, ram chips have not come much further. DDR-800 as chip speeds of 200mhz. DDR1066 has chip speeds of 266mhz. The reason why this is important, is because it is this speed that is used to determine the time that some number of clocks take. For example, 200mhz is 5ns per clock. So a CAS 3(3 clocks), at 200mhz is 15ns. Don't worry about this too much if your just getting started, but for some of the more seasoned clockers looking to push every last little bit out of their ram, calculating the exact time of each clock, and figuring out in nanoseconds what speed each latency is operating at can help you find precisely where your ram limits are, and perhaps allow you to drop the speed a touch, in order to drop to lower the timings to obtain even better results. In a way, its kind of like CPU Multi and FSB, changing timing/clocks makes drastic changes in the timing, while changing chip speeds makes more minor adjustments, while also impacting the bus. I'll explain this in a bit more detail later.
The most common reference to memory timings tends to be in the form 4-4-4-12 2T, so to start, I'll describe which timings are being referred to, and what they mean. In your bios, these timings will be listed as CL(CAS Latency), tRP (Row Precharge), tRCD (RAS to CAS Delay), tRAS(RAS Latency, and lastly, CR(Command Rate).
CAS and RAS signals are the most frequent signals issued within ram. They stand for Column Address Select and Row Address Select. Memory cells are organized in a matrix of rows and columns, and these signals are used to select a point within the matrix to begin an operation, whether its reading or writing. Along with an address, the memory controller sends the quantity of data it needs to receive from that point. Once the memory modules have recieved the address, and deciphered it into a row/column, it proceeds with selecting that row and column, and either transfers the data to the MC, or writes data from the MC to that location.
Because memory row/column selection is not instantaneous, we must allow the memory some time after signaling the RAS and CAS signals, to make sure that the selection has finished. This is where the timings come into play.
First, the memory module lowers the RAS signal in order to select the row it wishes to access. The time it must wait before selecting the column is the tRCD. Once tRCD has passed, it selects the column. After CL has passed, the read/write operation can be performed. If the next request utilizes the same row, then that request is only subject to a new CAS signal and CL before it can be processed.
When a request is made that requires a different row, the memory module must first deselect the row it is using, and select the new row. There is another latency here, this is the tRP. The one good thing about this latency, is it happens in parallel with the read/write that is being performed by the previous operation. If say, the last operation was a read that takes 4 clocks, and the tRP is set to 4 clocks, then it will totally overlap, and this latency will not be noticed. For the most part, this happens and tRP may only add 1 or 2 clocks of latency when selecting a new row if the last operation was for a small amount of data. Not that it is any less important for stability! Once tRP has passed, and the last operation has completed, the row/column selection begins again, and the cycle continues.
There is another added latency that memory has, and it is the tRAS. Like tRP, this latency overlaps other operations. With memory, there is an amount of time that must pass between selecting a row and deselecting that row, that is, before we can enter into tRP, tRAS must have passed.
Code:
+-+-+-+-+
|0|1|2|3|
|4|5|6|7|
+-+-+-+-+
Figure 1 - Memory cell layout
To put this all into perspective, I'll give a bit of an example with some real numbers as to how this all plays out, we'll use timings of 4-4-4-12 (CL-tRCD-tRP-tRAS).
The most basic part, is selecting a row and column, for this, it requires tRCD + CL, or 8 clocks. In order to perform a new operation on the same row, we must way for the operation to complete, and then wait for CL or 4 clocks before the next operation con occur. If the next request requires access to a different row, then it must deselect the previous row. Before it can do that, it must wait at least tRAS or 12 clocks since it selected the row its on. If the last row was only used for 1 operation (tRCD+CL) then only 8 clocks have passed, and it must wait for tRAS before doing this. If there was more than one operation on the same row, (tRCD + CL + transfer time + CL) then the time that has passed would have been 12 clocks + the transfer time. So the only way that tRAS in this example has any affect is if there was only a single operation, and for anything more, tRAS would have already passed by. Now, before the new row can be selected, it must wait at least tRP after deselecting the current row and selecting the new row, which overlaps with the memory operation in progress, and may not be witnessed at all if the memory operation takes longer than tRP cycles to complete.
In reality, a program that is optimized for memory efficiency is not going to incur the cost of tRAS and tRP very often, and is more likely to be limited only by CL and tRCD. This is why these two timings alone are the most important for say, getting better benchmark times. A good benchmarking application will know about tRAS and tRP and do its best not to incur there cost, which is seen more by an application that might randomly select data, and be changing rows alot. For the most part, tRAS and tRP should not be tuned too low, as their performance impact is minimal but like any other timing, they can cause stability issues if set too low.
There is also one last timing that you may see people talk about, and it is the command rate. The CR specifies how many clocks cycles are necessary between selecting a chip, and the being able to access that. For the most part, this is not going to have a large impact on performance, in a 1GB memory module with 8 chips, that's 128MB per chip. In a worst case scenario, the memory being accessed by an application just happens to span two chips(not really within your control), and the command rate might surface into the latency if it is set to 2T. But in reality, you might be talking about a few extra ns every couple or few seconds, and it would be a rare case indeed where the CR added even as much as 0.001% to the amount of latency. For this reason alone, i've been keeping my CR at 2T, as it simply not worth it as even if it does have some impact, its a very small amount, and if it causes stability issues, especially at high frequencies, then its just not worth it.
Of course, this is not all of the timings, there are many many more, including things like read to read, read to write, write to read and write to write latencies. There is also the refresh latency.
I may add something about read/write latencies later, but i will discuss refresh timings a bit.
Because of the design of a memory cell, which utilizes a capacitor to store its value, memory must be refreshed periodically. When a cell is read from, the capicitor loses some of its charge. There is also the effect of 'leakage' where the capictor simply loses some of its charge over time. Because of this, the memory modules have a refresh cycle timing that is measured in microseconds. This time is generally about 7.8us, which is a rate of around 128 KHz. This operation takes a certain amount of cycles to complete, and is associated with the tRFC timing. This timing is usually quite large, in the 25-40 clock range, and is the amount of time given once the refresh cycle has begun, before any new operations can be performed. Some bioses also list the tRFC as an actual time, 70-300ns. While tRFC will directly impact performance, setting this value too low may not give the ram enough time to complete the refresh. The amount of time required for the refresh is proportional to the amount of ram you have. The lower the density, the lower the refresh time required. This is one of the reasons why purpose-built benching rigs tend to want 2x512mb or 2x1gb. In a system with 4+gb setting the tRFC to too low of a value will almost certainly cause stability issues.
The Advanced Stuff
For those that really want to dig into memory timings, if 'The Basics' wasn't deep enough for you, you can start to factor in the actual speed your running at, and do the math on exactly how much time each clock cycle is, and from there calculate the exact time of each latency. i'll start with a speed of 200mhz and timings of 4-4-4-12.
First we'll need to know the time of each cycle, for 200mhz this is 5ns (1000/200). Some of the fastest speed timings that people have achieved are cycles of around 300-400mhz, at 400mhz, (DDR2-1600), we have a clock time of 2.5ns.
In the same way that fsb x CPU multi is used to calculate the CPU speed, cycle time x # of clocks is used to calculate latency times. The other important part is that cycle time also directly affects the time it takes to transfer data from ram. For this reason it is important to know what impacts a particular application the most, data transfer speed, or latency. In order to tune for the best latency, it requires more than simply setting everything to 3-3-3-9 and simply trying to get the best speed possible with those timings. Because your speed would be so low, you might be able to get better performance with higher clocks.
Here's what i like to do. Set your timings to be relatively loose, say 5-5-5-15, and try to get the best speed possible at these timings. There are other factors involved that may limit the speed, but once that speed is found, you can then calculate the cycle time for that speed. Lets say we managed 333mhz, which is around 3ns. We then have latencies of 15-15-15-45 ns. If we were to drop to 4-4-4-12, that would be 12-12-12-36 ns, which for our example, is unstable. But, we would be stable if we could hit 13-13-13-39ns, which would mean we would need a cycle time of 3.25ns, which is 307mhz. In this way, we have tightened our timings by 15% and only impacted our transfer speeds by 8%. If your using a simple memory bandwidth benchmark, then this change may show up as slightly worse, around 8%, but may lead to a slight increase in more realistic benchmarks that are impacted more by latency.
The trick to making this work is finding out just where your memory loses stability. This can be very challenging because there are so many other factors involved, but suffice it to say: if you concentrate on CL and RCD, and find the best MHz/latency combination that gets those two latencies the lowest, the rest just needs to be set as best you can, because for the most part their impact on latency is quite minimal. To more precisely determine where CL and RCD reach their stability limits, it helps to leave one high and tweak the other. Start with CL: leave RCD somewhere it won't cause problems (6-7), along with the other timings, and find the lowest latency time (in nanoseconds, not just cycles) that you can stably push CL to. Then do the same with RCD, and this should give you a better idea of how _your_ memory performs. Remember, not all memory is equal; even sticks of the same brand, same IC, heck, same week/year and serial number are likely to perform differently. It's a long, slow process, but once you know where the limits are, it's easier to make judgment calls when tuning for better latency or better bandwidth.
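Just to make the "walk one timing down while the other stays loose" idea concrete, here's a sketch of that loop. The `is_stable` function is a stand-in, not anything real: in practice that step is you rebooting and running a memory test at each setting. The fake tester at the bottom is purely for illustration:

```python
# Sketch of the sweep described above: hold everything else loose and
# walk a single timing (CL or RCD) down one clock at a time until it
# fails, recording the last stable value and its real latency in ns.
# `is_stable` is hypothetical -- you supply the real check (memtest etc.)

def lowest_stable_clocks(chip_mhz, is_stable, start=7, floor=2):
    best = None
    for clocks in range(start, floor - 1, -1):
        if not is_stable(clocks):
            break  # first failure ends the sweep
        best = clocks
    if best is None:
        return None
    return best, best * 1000.0 / chip_mhz

# Fake tester for illustration: pretend this ram holds anything >= 12 ns.
fake = lambda clocks: clocks * 1000.0 / 333 >= 12.0
print(lowest_stable_clocks(333, fake))  # last stable clock count and its ns
```

The point of returning the nanosecond figure, not just the clock count, is that it's the ns number that transfers between frequencies when you're deciding on a new speed/timing combination.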
Another thing: TAKE NOTES. I have about 12 pages worth of notes on clock timings, speeds and voltages from when I was tweaking and learning what my memory was capable of. And I'm damn glad I did.
In total, I think for my CellShock DDR-1066 set I have over 300 results at different timing/speed/voltage combinations, and it has helped me immensely, mostly because there are just way too many combinations to try and remember, and having those notes to look back on makes things so much easier. And I'm guessing I've only been through about 20% of what I'd like to test. I guess it all depends on your level of obsession.
The biggest thing to remember from this is that it's not simply the clock latencies you choose that determine overall latency. You also have to factor in the cycle time to know exactly what your latency times are, because a count of clock cycles is a relative timing that depends on cycle time. It may seem pretty obvious when stated like this, but I think it often gets overlooked when people try to 'tighten' timings by lowering the clock count with no regard for what speed they're running at.
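A quick two-liner shows why a lower clock count alone means nothing. The 533mhz figure here is just an illustrative speed I picked, not from the text:

```python
# Effective latency = clocks * cycle time. A "looser" timing at a higher
# frequency can be faster in real time than a "tighter" one at a lower one.

def latency_ns(clocks, chip_mhz):
    return clocks * 1000.0 / chip_mhz

print(latency_ns(4, 400))  # CL4 @ 400 MHz = 10.0 ns
print(latency_ns(5, 533))  # CL5 @ 533 MHz is under 9.4 ns -- looser, yet faster
```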
The other part of the equation that I like to tune is the voltage. Many times I have seen others, myself included, simply pump a bunch of voltage into the ram with little regard for using 'too much'. This might sound a bit off, but really: if you don't need 2.5v to be stable at a given setting, and 2.24v would suffice, then lower it. The thing about voltage, as you may well know, is that it increases heat.
Now, this part is speculation; I could well be wrong on this, and I'll try to dig up some proof if I can find it. But I suspect that increased voltage can also cause excessive leakage. If the leakage gets out of control, and the refresh timing isn't enough to keep up with the excess voltage, then this in itself could cause stability issues that would not be seen at lower voltages, so long as you could still be stable at those voltages.
One way to counteract this, if it's true, would be to refresh more often. My board allows refresh intervals of 3.9/7.8/15.6 us, giving 256/128/64Khz refresh rates. Say your tRFC is 100ns; at a 7.8us refresh interval (128Khz) you would spend about 12.8ms per second refreshing, so 1.28% of your ram's time goes to refreshes. Lowering tRFC would shrink that figure somewhat, but it doesn't increase the amount of refreshing done; a higher tRFC only gives each refresh more time to complete. If you shorten the interval to 3.9us, the overhead of course doubles to 2.56%, but if that buys you extra stable frequency, the gain might outweigh the extra 1.28% hit. And like any other latency, increased frequency with correspondingly lower tRFC values also lowers the % of time spent refreshing. It's really a last resort when trying to squeeze every last bit out of your ram, but it might make the 5mhz difference you need. It's a long shot, but it shouldn't be overlooked.
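The refresh overhead math above is simple enough to sanity-check yourself; it's just tRFC divided by the refresh interval, using the same 100ns / 7.8us / 3.9us numbers from the example:

```python
# Fraction of time spent refreshing: one tRFC-long refresh per tREFI interval.

def refresh_overhead_pct(trfc_ns, trefi_us):
    """Percentage of the ram's time spent in refresh."""
    return 100.0 * trfc_ns / (trefi_us * 1000.0)

print(refresh_overhead_pct(100, 7.8))  # about 1.28 % at the 7.8us interval
print(refresh_overhead_pct(100, 3.9))  # about 2.56 % at the 3.9us interval
```

Handy for deciding whether a shorter refresh interval is worth it: if doubling the refresh rate buys you more than ~1.3% extra frequency, you come out ahead on paper.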
The Obscure Stuff
I discussed a bit above, in the advanced part, about trying to determine the actual time your chips require to perform operations like row and column selection. Determining this can be a lot of trial and error, but I thought I'd discuss a bit about why these values are what they are. It may give the dedicated overclocker some insight into finding new ways of overcoming the latency walls of their current setup.
The act of accessing memory takes time. Electrical signals have a speed at which they can travel, with the absolute limit being the speed of light. That might seem insanely fast, but when you're talking in terms of nanoseconds, even at the speed of light you can't go very far in a single ns. Before row/column selection can begin, a memory address sent from the cpu must be decoded into a chip, and then a row/column selection on that chip; in memory speak, this is called demultiplexing. Once the module knows which row/column to select, it can begin with RAS, wait for RCD, then CAS, and wait for CL.

The reason for the wait between signals is that it takes time for the signal to travel from the multiplexer to the chip and activate the precise row/column. This kind of latency is called wire RC delay. Wire RC delay accounts for the largest share of the latency between sending a RAS/CAS signal and the data actually being ready for use. The topic of RC delay is really quite advanced and depends on the manufacturing process. Suffice it to say that, in large part, it's what makes the difference between cheap ram that can only run slightly tighter than stock settings, and more expensive ram that can run much tighter and faster without losing stability. The well known Micron D9 chips are some of the best chips because of their low RC delay.

Another part of the latency equation is called bit-line sensing. Because the charge held in the capacitor of a memory cell is so small (some 10k electrons), the signal must be amplified in order to make sense of whether it is a 0 or a 1. Then there is also the time required to send the data from the sense amplifier to the bus for transport to the cpu. In total, sending the RAS/CAS signal and processing the charge through the sense amplifier makes up approximately 75% of the latency, and the majority of that is from RC delay.
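To put the "you can't go far in a nanosecond" point in perspective, here's the back-of-the-envelope number (and real signals in copper traces propagate well below the speed of light):

```python
# How far could a signal possibly travel in one nanosecond?
C = 299_792_458          # speed of light in a vacuum, m/s
per_ns_cm = C * 1e-9 * 100  # metres per ns, converted to cm

print(per_ns_cm)  # roughly 30 cm -- about the length of a motherboard
```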
Now you might be wondering: that's all well and great if you're a ram manufacturer, but most of this is out of the hands of simple settings. And you're right, for the most part. Unless, of course, you're willing to dive into the effects of RC (which stands for resistance/capacitance) on wire. It's really somewhat of a grey area as to what can be done to lower RC delay. Attempts have been made to cool ram to sub-zero temperatures to try and get the same gains seen with cpus at sub-zero temps, but for the most part it really just wasn't worth it. That's not to say you shouldn't at least keep your ram as cool as possible: pushing high frequencies, tight timings and high voltage all increase the activity on the wires, and thus their heat, and heat adds resistance. Too bad we can't make ram from superconductive material and eliminate RC delay altogether.
There's a couple more things I'll add to this section once I find more information on them: specifically the more obscure ram timings (Read2Read, Read2Write and such), and some stuff about drive strengths, which I'm actively researching.