Comments on cbloom rants: 08-06-10 - More SPU

cbloom (2010-08-09T00:28:02.504-07:00):

"Maybe you should try splitting loads and calculations (http://cellperformance.beyond3d.com/articles/2006/04/a-practical-gcc-trick-to-use-during-optimization.html)"

To be clear: Mike's post is about using the GCC scheduling trick to make it easier to read the assembly so that you can see what's going on and then optimize it.

That's what I was doing when I discovered that it had randomly gotten faster.

ryg (2010-08-08T23:08:40.085-07:00):

"Right now my code is 5-10% faster with a few scheduling barriers manually inserted (19 clocks per symbol vs 21). That's fucking bananas, it means the compiler's scheduler is fucking up somehow."

The GCC scheduler isn't very good.

There's SPA (SPU Pipelining Assembler), which does pretty good register allocation, unrolling and scheduling (software pipelining to work around latencies). But it doesn't integrate well with C code at all: if you want to try it, you basically have to port that function and everything below it to SPA, which is a maintenance nightmare for code that you occasionally want to touch. It's particularly bad if you're targeting multiple platforms, so you absolutely positively need the C++ code version alongside.

That said, once you've distilled your code down to relatively simple loops, it works a lot better to just write it in ASM without worrying about scheduling and unrolling, then let SPA figure out the rest. It's a lot less frustrating than tweaking C++ code in obscure ways to get the compiler to do what you want, anyway.

Krzysztof Narkowicz (2010-08-08T22:57:02.706-07:00):

6. Maybe you should try splitting loads and calculations (http://cellperformance.beyond3d.com/articles/2006/04/a-practical-gcc-trick-to-use-during-optimization.html)? Although from my limited experience, the SPU compiler is so bad that any random and strange code change can lead to increased performance :).
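
For readers who haven't seen the "split loads and calculations" idea, here is a minimal sketch of roughly what a manually inserted scheduling barrier looks like in GCC-style C. It is not necessarily the macro from the linked article, and it is not the code from the post: `SCHED_BARRIER` and `sum_split` are hypothetical names made up for illustration. The assumption is a GCC-compatible compiler, where an empty volatile asm with a "memory" clobber emits no instructions but keeps the compiler from moving memory accesses across it; whether this helps or hurts on a given SPU toolchain has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro name (assumption, not from the post or the article).
   An empty volatile asm with a "memory" clobber generates no code; it only
   tells a GCC-compatible compiler not to move memory accesses across it. */
#define SCHED_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Sums n values, four per iteration, with the loads grouped in one block
   and the arithmetic in another. For brevity, n is assumed to be a
   multiple of 4; any tail elements are ignored. */
uint32_t sum_split(const uint32_t *src, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        /* Block 1: loads only. */
        uint32_t a = src[i + 0];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];

        SCHED_BARRIER(); /* keep the loads above this point */

        /* Block 2: arithmetic only. */
        total += (a + b) + (c + d);
    }
    return total;
}
```

Calling `sum_split` on an array holding 1 through 8 with `n = 8` returns 36, with or without the barrier; the barrier only changes how the compiler is allowed to order the generated instructions, which also makes the loads and the math easier to tell apart in the assembly listing.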