
How CPU Cache works


musawi

Member
Joined
Jul 29, 2004
Location
Bahrain
I was wondering how cache worked, so I did my research, came up with this, and decided to post it for the people who don't know what it is and how it works.


What is the CPU Cache?

The cache on your CPU has become a very important part of today's computing. The cache is a very high-speed and very expensive piece of memory, which is used to speed up the memory retrieval process. Because it is so expensive, CPUs come with a relatively small amount of cache compared with the main system memory. Budget CPUs have even less cache; this is the main way that the top processor manufacturers take cost out of their budget CPUs.

How does the CPU Cache work?

Without the cache memory, every time the CPU requested data it would send a request to the main memory, which would then be sent back across the memory bus to the CPU. This is a slow process in computing terms. The idea of the cache is that this extremely fast memory stores any data that is frequently accessed and, if possible, the data that is around it, to achieve the quickest possible response time for the CPU. It's based on playing the percentages. If a certain piece of data has been requested 5 times before, it's likely that this specific piece of data will be required again, and so it is stored in the cache memory.
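The request-and-keep idea above can be sketched in a few lines of Python (a toy model, not how any real hardware is built): a small dictionary stands in for the cache, sitting in front of a slow "main memory" lookup.

```python
main_memory = {addr: addr * 2 for addr in range(1000)}  # pretend data store
cache = {}           # the fast, small cache
CACHE_CAPACITY = 4   # caches are tiny compared to main memory

def read(addr):
    """Return (value, source) so we can see where the data came from."""
    if addr in cache:                  # fast path: data already cached
        return cache[addr], "cache"
    value = main_memory[addr]          # slow path: go over the memory bus
    if len(cache) >= CACHE_CAPACITY:   # make room by evicting an old entry
        cache.pop(next(iter(cache)))
    cache[addr] = value                # keep it around for next time
    return value, "memory"

print(read(42))  # first access: comes from memory
print(read(42))  # repeat access: comes from the cache
```

The second request for the same address never touches main memory, which is the whole point of caching frequently requested data.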

Let's take a library as an example of how caching works. Imagine a large library but with only one librarian (the standard one-CPU setup). The first person comes into the library and asks for Lord of the Rings. The librarian goes off, follows the path to the bookshelves (the memory bus), retrieves the book, and gives it to the person. The book is returned to the library once it's finished with. Without cache, the book would be returned to the shelf. When the next person arrives and asks for Lord of the Rings, the same process happens and takes the same amount of time.

If this library had a cache system, then once the book was returned it would have been put on a shelf at the librarian's desk. This way, when the second person comes in and asks for Lord of the Rings, the librarian only has to reach down to the shelf and retrieve the book. This significantly reduces the time it takes to retrieve the book. Back in computing terms, it's the same idea: the data in the cache is retrieved much more quickly. The computer uses its logic to determine which data is the most frequently accessed and keeps those books on the shelf, so to speak.

That is a one-level cache system, which is used in most hard drives and other components. CPUs, however, use a two-level cache system. The principles are the same: the level 1 cache is the fastest and smallest memory; the level 2 cache is larger and slightly slower, but still smaller and faster than the main memory. Going back to the library: when Lord of the Rings is returned this time, it will be stored on the shelf. This time the library gets busy, lots of other books are returned, and the shelf soon fills up. Lord of the Rings hasn't been taken out for a while, so it gets taken off the shelf and put into a bookcase behind the desk. The bookcase is still closer than the rest of the library and still quick to get to. Now when the next person comes in asking for Lord of the Rings, the librarian will first look on the shelf and see that the book isn't there. They will then proceed to the bookcase to see if the book is in there. This is the same for CPUs: they check the L1 cache first and then check the L2 cache for the data they require.
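That two-level lookup can be sketched in Python too. The sizes and the evict-oldest policy here are invented for illustration; real CPUs use more sophisticated replacement schemes. The flow is the same as the library: check the shelf (L1), then the bookcase (L2), then go to the library (main memory), moving books between levels as they are used.

```python
from collections import OrderedDict

L1 = OrderedDict()   # smallest, fastest (the shelf)
L2 = OrderedDict()   # larger, slightly slower (the bookcase)
L1_SIZE, L2_SIZE = 2, 8

def _put_l1(addr, value):
    L1[addr] = value
    if len(L1) > L1_SIZE:
        old_addr, old_val = L1.popitem(last=False)  # oldest entry leaves L1
        L2[old_addr] = old_val                      # demoted to the bookcase
        if len(L2) > L2_SIZE:
            L2.popitem(last=False)                  # fell out of the cache

def read(addr):
    if addr in L1:
        return L1[addr], "L1"
    if addr in L2:
        value = L2.pop(addr)
        _put_l1(addr, value)   # promote back to L1 on an L2 hit
        return value, "L2"
    value = addr * 2           # stand-in for a slow main-memory fetch
    _put_l1(addr, value)
    return value, "memory"

read(1); read(2); read(3)      # L1 holds only 2 books, so book 1 moves to L2
```

After those three reads, asking for address 1 again finds it in L2 (the bookcase), and asking once more finds it back in L1 (the shelf), just like the librarian's routine.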

Is more Cache always better?

The answer is mostly yes, but certainly not always. The main problem with having too much cache memory is that the CPU will always check the cache memory before the main system memory. Looking at our library again: if 20 different people come into the library, all after different books that haven't been taken out in quite a while, but the library has been busy before so the shelf and the bookcase are both full, we have a problem. Each time a person asks for a book, the librarian will check the shelf and then check the bookcase before realising that the book has to be in the main library. Each time, the librarian then trots off to get the book from the library. A non-cache system would actually be quicker in this instance, because the librarian would go straight to the book in the main library instead of checking the shelf and the bookcase first.
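A toy cost model makes the point concrete (the cost numbers are invented, not measurements): when every access misses, the cache check is pure overhead on top of the memory fetch.

```python
CACHE_CHECK = 1     # invented cost to search the cache
MEMORY_FETCH = 10   # invented cost to go to main memory

def cost(addr, cache):
    if addr in cache:
        return CACHE_CHECK                # hit: the check alone was enough
    return CACHE_CHECK + MEMORY_FETCH     # miss: paid to check, then fetched

cache = {1, 2, 3}
with_cache = sum(cost(a, cache) for a in range(100, 120))  # 20 misses in a row
without_cache = 20 * MEMORY_FETCH                          # no check to pay
print(with_cache, without_cache)  # 220 vs 200: the all-miss workload is slower
```

In practice the check is so much cheaper than the fetch that even a modest hit rate makes the cache a clear win; the all-miss case above is the exception, not the rule.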

That said, non-cache systems only win in circumstances like these, so for most applications CPUs are definitely better off with a decent amount of cache. Applications such as MPEG encoders are not good cache users because they work through a constant stream of completely different data.

Does cache only store frequently accessed data?

If the cache memory has space, it will store data that is close to the frequently accessed data. Looking back at our library: if the first person of the day comes into the library and takes out Lord of the Rings, the intelligent librarian may well place Lord of the Rings part II on the shelf. In this case, when the person brings back the book, there is a good chance that they will ask for Lord of the Rings part II. As this will happen more often than not, it is well worth the librarian fetching the second part of the book in case it is required.
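That neighbour-fetching idea is called prefetching, and it can be sketched like this (the fetch-exactly-one-next-item policy is a deliberate simplification of what real prefetchers do):

```python
cache = {}

def read_with_prefetch(addr):
    """On a miss, fetch the data and prefetch its neighbour as well."""
    if addr in cache:
        return "hit"
    cache[addr] = addr * 2             # fetch the requested data
    cache[addr + 1] = (addr + 1) * 2   # speculatively grab the next item too
    return "miss"

read_with_prefetch(10)         # miss: loads 10 and prefetches 11
print(read_with_prefetch(11))  # hit: 11 was already waiting in the cache
```

The bet only pays off when the program really does ask for the neighbour next; if it jumps somewhere unrelated, the prefetched data just takes up space.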

Cache Hit and Cache Miss

Cache hit and cache miss are just simple terms for the accuracy of what goes into the CPU's cache. When the CPU accesses its cache looking for data, it will either find it or it won't. If the CPU finds what it's after, that's called a cache hit. If it has to go to main memory to find it, then that is called a cache miss. The percentage of hits out of all cache requests is called the hit rate. You want this as high as possible for the best performance.
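A quick way to see the hit rate in action (the access pattern below is an invented toy workload):

```python
hits, total = 0, 0

def access(addr, cache):
    """Record one cache access, loading the data on a miss."""
    global hits, total
    total += 1
    if addr in cache:
        hits += 1
    else:
        cache.add(addr)

cache = set()
for addr in [1, 2, 1, 3, 1, 2, 4, 1]:  # a workload that re-reads address 1 a lot
    access(addr, cache)

print(f"hit rate: {hits}/{total} = {hits / total:.0%}")  # 4/8 = 50%
```

Notice that the repeated reads of address 1 are what drive the hit rate up; a stream of all-different addresses (like the MPEG encoder case above) would score 0%.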

Can you increase Cache Performance?

No.

Developers can use compilers optimized for various architectures to make maximum use of the cache, but as an end user you cannot.

The Two Different Buses

The Front Side Bus (FSB) is the connection between the CPU and the main memory, and it runs at just a fraction of the CPU clock speed.

The Back Side Bus (BSB) is the connection between the CPU and the L2 cache. The BSB is found within the processor, and its speed is determined by the CPU clock speed.

So, in fact, raising the FSB raises the clock speed, therefore increasing the speed at which the CPU communicates with the L2 cache.


I decided to post this since I really felt it made very clear what the cache is and how it works. I got it from this website and I just wanted to share it.
 
Thank you for your post musawi.

I have a question...
Is there a way I can speed up the process in which "the librarian looks through the bookshelf"?
 
1000TT said:
Thank you for your post musawi.

I have a question...
Is there a way I can speed up the process in which "the librarian looks through the bookshelf"?
No, sorry.

Locality and repetitive access really can't be affected at the user level; good, judicious programming and optimization at the compiler level can help, but there's nothing you as a user can do to improve cache performance.

A note--we also have L3 caches, and L4 caches, and so on. The reason for caches is very, very simple: the principle of the memory hierarchy states that the larger a volume of memory gets, the slower it is to access, and the more it costs to access it at the rate of a smaller volume.
 
Thanks.

I have another question. You know how musawi said that the librarian gets Lord of the Rings part II after someone checked out Lord of the Rings part I?

Is it like browsing through folders? When you open one, the chances are you're going to open the next?
If so, let's say you have 5 subfolders... how will the CPU know which one you're going to access? Does it get the most accessed one, or simply all 5?

Do you understand what I mean?

Question: Is there a way you can delete the number of "cache hits"?
 
1000TT said:
Thanks.

I have another question. You know how musawi said that the librarian gets Lord of the Rings part II after someone checked out Lord of the Rings part I?

Is it like browsing through folders? When you open one, the chances are you're going to open the next?
If so, let's say you have 5 subfolders... how will the CPU know which one you're going to access? Does it get the most accessed one, or simply all 5?

Do you understand what I mean?

Question: Is there a way you can delete the number of "cache hits"?
I think it bears repeating--THERE IS NOTHING THAT YOU AS AN END USER CAN DO TO AFFECT CACHE PERFORMANCE.

CPU cache controllers use both regularity of access and spatial locality in determining what should be loaded into the cache. If a program references memory location A seventeen times in a procedure and the CPU has requested it twice already, it's a damn good bet that the CPU will request it again, and so the cache controller will fetch the data at A in a quiet moment. It will also fetch data around A (A + sizeof(word), for instance), hence the LOTR analogy.
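A small sketch of why fetching the data around A pays off (the 8-word line size below is an assumption for illustration; real cache-line sizes vary by CPU): a miss loads a whole cache line, so walking memory in order keeps hitting the line you already loaded, while jumping a full line each time misses on every access.

```python
LINE_WORDS = 8            # assumed cache-line size: 8 words per line
cached_lines = set()

def touch(addr):
    """Return 'hit' if addr's cache line is already loaded."""
    line = addr // LINE_WORDS
    if line in cached_lines:
        return "hit"
    cached_lines.add(line)  # a miss loads the whole line, neighbours included
    return "miss"

sequential = [touch(a) for a in range(64)]               # walk memory in order
cached_lines.clear()
strided = [touch(a) for a in range(0, 512, LINE_WORDS)]  # jump a line each time

print(sequential.count("miss"))  # 8 misses: one per line, the rest are hits
print(strided.count("miss"))     # 64 misses: every access lands on a new line
```

Same number of accesses in both loops; only the access pattern changes, and spatial locality is the difference.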

Memory is sequentially addressed; there are no "folders" to speak of at the hardware level.
 
That is a good article. I was just doing a search on this myself the other day. Just a question, though. I'm looking at two P4-M CPUs, one with 1MB of cache, the other with 2MB of cache. Will I see a performance boost worth 60 bucks with the bigger cache? It's a 1.4 with the 1MB and a 1.6A with the 2MB of cache, so there is a price difference as well. I'll put in my vote for sticky too. It's good info that is harder to find online if you want the info quickly. I couldn't find it because of the huge threads that most forums have. Hard to dig through.
 
enduro said:
I'm looking at two P4-M CPUs, one with 1MB of cache, the other with 2MB of cache. Will I see a performance boost worth 60 bucks with the bigger cache? It's a 1.4 with the 1MB and a 1.6A with the 2MB of cache, so there is a price difference as well.

The 1.6 is faster by 200MHz and it's got 1MB more cache, so basically it would run faster, but the real question is whether you would actually feel the difference from the 1.4 and whether it is worth the extra 60 bucks.

It also depends what you do as a user. As it says in the first post, MPEG encoding won't do you much good, because MPEG encoders "have a constant stream of completely different data."

But I would say it's worth the extra 60 bucks, and I would go for it, because I don't do any encoding or stuff like that.

That is my opinion; it might be different for anybody else, or for yourself.
 
Why does it seem AMD processors perform better with low cache than Intel ones? I.e., Celerons vs Semprons.

I build Semprons and Celerons all day long. Comparing a Celeron 2.4GHz with a Sempron 2300+, with the same HD and RAM, the Semprons seem to just run smoother than the Intel setups.
 
 
Ramlaen said:
Why does it seem AMD processors perform better with low cache than Intel ones? I.e., Celerons vs Semprons.

I build Semprons and Celerons all day long. Comparing a Celeron 2.4GHz with a Sempron 2300+, with the same HD and RAM, the Semprons seem to just run smoother than the Intel setups.

I believe it's because AMDs have shorter pipelines, so a large cache isn't as important. As P4s/Celerons have longer pipes, a larger cache is needed; that's why the Celeron performs badly, as it has less cache than a P4.
That's the best I can explain it with my limited knowledge.

First time I have been in this section, and with two great posts.
Doesn't raising the CPU's clock speed also raise the L2 cache frequency, making the librarian walk faster?
 
Great post... very informative... jenko, I'm probably wrong, but I thought that raising the clock speed only makes the data get processed faster... raising the FSB (well, in some cases raising clock speed and FSB are the same, but when you get into multipliers and stuff...) might make the librarian faster? Can someone that actually knows something explain?
 
When you overclock you are raising the Front Side Bus (FSB), and the FSB is the connection between the CPU and the main memory; it runs at just a fraction of the CPU clock speed.

The Back Side Bus (BSB) is the connection between the CPU and the L2 cache. The BSB is found within the processor, and its speed is determined by the CPU clock speed.

So, in fact, raising the FSB raises the clock speed, therefore increasing the speed at which the CPU communicates with the L2 cache.

I might be very wrong! But I just want to know if what I said above is entirely true or not.
 
You're right.

I would edit the original post to qualify the "Can you increase Cache Performance?" answer to a qualified no. Developers can use compilers optimized for various architectures to make maximum use of the cache, but as an end user you cannot.

I would also be inclined to add a dissertation about the various buses that the system uses to communicate everywhere.
 
Well, IIRC there were a few (old) chips where you could increase the performance of the L2 cache by tweaking its latency, but no modern chip to my knowledge will let you change it. That would probably be the only method the end user could hope to use to increase cache performance, and like I said, it's no longer available to tweak.

JigPu
 