Comments on cbloom rants: Oodle 2.8.6 Released

cbloom (2020-05-11 12:47):
We actually hit this recently in production.

Cache contention is a real big problem for multi-threaded programs, and it can crop up easily and have a huge effect.

We had a random number generator with global state that we were hitting from lots of threads. (We use a per-thread random number generator for real production work, but just to hack things up easily we still have the one with global state.) The random number generator code itself is only a few instructions, and it's not called super often, so we didn't think twice about it, but profiling showed it was taking a hugely disproportionate amount of time, something like 10% of total runtime for 0.1% of the instructions, just because the cache line holding the global RNG state had to be passed between all the cores that were fighting over it.

Global variables that are read-write from multiple threads are a huge no-no for high-performance code. You have to stay on top of it. (And watch out for "false sharing", where two variables you think are independent wind up on the same cache line and cause contention.)

cbloom (2020-05-10 21:56):

It's actually like 4x4 cores. There are 4 CCXs, each with its own set of 4 cores and 16 MB of L3. Access to memory in another CCX has higher latency than access within your own. My tests are single-threaded, so they won't show this. Thread switching also won't show this significantly: the OS is pretty good at scheduling a thread back to the same core it was on if your system is not overloaded, thread switches are extremely rare (per second or per millisecond, as opposed to per nanosecond), and some cache misses after a thread switch are normal.
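The contended-global-RNG problem from the first comment can be sketched as below. All names are illustrative, and the xorshift64 update is just a stand-in for whatever the real generator computes; the point is the difference between one shared state and a `thread_local` one.

```cpp
#include <atomic>
#include <cstdint>

// Contended version: one RNG state shared by every thread. Each call
// pulls the cache line holding g_state into the calling core in
// exclusive/modified state, so the line ping-pongs between cores.
static std::atomic<uint64_t> g_state{0x123456789abcdefULL};

uint64_t rand_global() {
    uint64_t x = g_state.load(std::memory_order_relaxed);
    uint64_t n;
    do {  // xorshift64 step, retried until the CAS wins
        n = x;
        n ^= n << 13; n ^= n >> 7; n ^= n << 17;
    } while (!g_state.compare_exchange_weak(x, n, std::memory_order_relaxed));
    return n;
}

// Per-thread version: each thread owns its own state, so no cache line
// is ever shared and the hot path is a few uncontended instructions.
uint64_t rand_local() {
    // Illustrative fixed seed; real code would seed each thread differently.
    thread_local uint64_t s = 0x9e3779b97f4a7c15ULL;
    s ^= s << 13; s ^= s >> 7; s ^= s << 17;
    return s;
}
```

The per-thread version also drops the CAS retry loop entirely, since no other thread can ever observe or race on the state.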
Communication between CCXs is vastly faster than in the old NUMA multi-processor days.

Where you would see it is if you have all the cores running simultaneously and touching the same cache lines to share data. Then those cache lines would have to be constantly moving between CCXs.

This is bad programming, and everyone should stop doing it and learn how to write efficient code for the modern highly threaded world. For example: lock/gate variables should always be on their own cache lines. Use SPSC queues heavily. Don't poll on shared variables. Use true waitable events, not poll loops. Thread the right size of work items. Etc.

I don't see this being an issue for most real-world workloads.

Anyway, it's just win-win. The 3950 is so cheap you can imagine you got a 4-core processor on a single CCX for Intel prices and they threw in 12 more cores for free.

AJ (2020-05-10 21:15):

I'm currently on an i7-3xxx too, and looking into the Ryzen 9 3xxx as well. However, recent info about inter-CCX latency (this chiplet thing) concerns me. The cache latency between core 1 and core 16 is far greater than between core 1 and core 2. I think some of these >8-core Ryzens are actually more like 2x6 cores or 2x8 cores, so any thread that travels between those two sets of 6/8 cores pays a far higher cost. I was wondering if your tests take this into consideration, or if they are short enough (in time) that a thread is unlikely to be moved to another core?
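The "lock/gate variables should always be on their own cache lines" advice above can be sketched as a hypothetical struct layout. The names and the 64-byte line size are assumptions for illustration; without the `alignas`, `gate` and `counter` could land on the same cache line, so a core spinning on `gate` would fight over the line with a core bumping `counter` (false sharing) even though the two variables are logically independent.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; C++17 provides
// std::hardware_destructive_interference_size as a portable alternative.
constexpr std::size_t kCacheLine = 64;

struct SharedState {
    // Each hot shared variable is aligned to its own cache line, so
    // traffic on one never invalidates the line holding the other.
    alignas(kCacheLine) std::atomic<bool>     gate{false};  // own line
    alignas(kCacheLine) std::atomic<uint64_t> counter{0};   // own line
};

static_assert(sizeof(SharedState) >= 2 * kCacheLine,
              "each member is padded out to its own cache line");
```

The cost is a little wasted padding per variable, which is almost always a good trade for hot cross-thread state.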