Comments on cbloom rants: How Oodle Kraken and Oodle Texture supercharge the IO system of the Sony PS5
Feed updated 2024-02-22T16:15:42.388-08:00 (20 comments)

[2021-01-27T08:23:09.804-08:00]

"Regarding the Kraken decomp chip, I thought Kraken was quickly surpassed and superseded by your team's subsequent codecs. (Selkie?)"

No, Kraken is still very much state-of-the-art. All the new CryptoOceanicZoology codecs offer different trade-offs of speed vs compression vs complexity. Kraken is also very tuneable for different applications.

"Or is it not optimized for Kraken specifically?"

Oodle Texture is not heavily Kraken-specific. It works well with any back-end compressor.

"Is Kraken more about decomp speed than compression ratio?"

Kraken is very tuneable. It is about space+speed, not one or the other. It usually beats ZStd at both.

"Do you think hardware decompression could be just as easily applied to PCs and smartphones for the same wins?"

That's probably coming in the future.
I think in general in computing we'll see more custom chips for various tasks because they provide superior performance per watt vs generalized computing.

Most apps are not written to take advantage of fast IO, so there's a lot of work to do on the software levels above the hardware to see the benefit (from device drivers to memory mapping, to OS file buffers, to the way apps request and read data (more async and bigger chunks please!)).

"Have you looked at a combined decompression and decryption pipeline?"

They're separate tasks, but it does make sense to put them on the same chiplet so that they can both act on the same local cached buffer rather than pulling memory through two different units. Maybe we'll see both decryption and decompression in a unified IO controller?

Posted by cbloom

[2021-01-27T07:19:07.471-08:00]

Hi Charles, congratulations! I'm very impressed with what you've accomplished as an engineer/programmer/applied mathematician. It's great to see people like you succeeding and being rewarded for your skills.

Regarding the Kraken decomp chip, I thought Kraken was quickly surpassed and superseded by your team's subsequent codecs. (Selkie?) Am I remembering this wrong? My impression was that Kraken was a sort of old, first stab at your end goals, and that you blew it out of the water with follow-on development.

Relatedly, I'm stumped by the results in your table, with Kraken + Oodle Texture at 3.16, vs 2.69 for Kraken + ZIP. I would've expected a bigger difference, even if Oodle Texture wasn't specifically optimized or tuned for Kraken. Or is it not optimized for Kraken specifically? I'd expect something like Zstd or brotli to beat ZIP by that margin, and Kraken to be somewhat better than them.
Is Kraken more about decomp speed than compression ratio?

These new consoles are a revelation in terms of I/O architecture and throughput. They seem to be much more powerful than PCs in ways I didn't expect. Do you think hardware decompression could be just as easily applied to PCs and smartphones for the same wins? I'm fascinated by hardware implementations like that, in part because I know so little about them – how exactly code is instantiated in hardware at a low level and so forth. Have you looked at a combined decompression and decryption pipeline? I guess decryption would have to happen first, followed by decompression, unless I'm missing something. I wonder if there are possible approaches for combining encryption and compression into a unified codec, or at least very complementary or synergistic codecs. Something like BitLocker or Opal combined with an Oodle codec would be super.

Posted by Joe Duarte

[2020-11-21T23:31:43.754-08:00]

The HW was designed mainly by Sony and AMD. We did get looped in once the architecture was settled to assist with validation (making sure the design works right), tooling, and add some functionality in our SW for use by the PS5 SDK.

We emphatically did not consider the rest of the console HW in any way because we didn't need to know and weren't briefed on it until last year, long after the design was final (the pipeline for mass-market HW is quite long). Nor did we know anything about there being an SSD, SSD throughputs or clock rates.
All we knew was that the design had ambitious targets for the number of bytes decoded per cycle.

We can't comment on HW specifics or make comparisons; that's all covered by NDAs.

Posted by Fabian 'ryg' Giesen

[2020-11-21T13:42:35.519-08:00]

I'm so glad I found this blog! This is a great discussion! This comment was interesting:

"cbloom said...
PS5 is the only system with a hardware Kraken decoder, and the only platform with platform-wide license to Oodle Texture so that every game can use it. In theory PC SSDs will keep getting faster, but you would need several CPU cores running software Kraken to match the decompressed bandwidth of the PS5 hardware Kraken. Even then, a typical game on the PC won't be able to achieve that IO speed because of other bottlenecks; once you're going that fast lots of other things in the system software can become problems, you have to address it all through the software."

While true, the "lots of other things in the system software" amounts to the work Microsoft is doing in DirectStorage on XBSX, and in porting to PC DX12, and the work they've done on BCPack. RAD is awesome, Oodle is super impressive, and the approach Sony has taken here, from both a systems-architecture and a developer-experience standpoint, is really elegant.
But all that said, I really don't think we're going to see gigantic real-world differences in experience between the PS5 and XBSX/PC (which I think is what many of the comments are fishing for).

Even today, just optimizing old game engines to be aware of, and utilize, NVMe already makes a *noticeable* difference on PC, and that's before *any* updates to DX12 or any leveraging of more advanced decompression pipelines by "yet to be built" game engines.

Posted by The Extreme Moderate

[2020-11-21T00:10:59.145-08:00]

Were you guys consulted by Sony regarding the design of the decompression hardware, or was it just internal decision-making? Do you consider the jump in hardware of the console, and particularly the 16 GB of RAM, enough to bring photorealistic graphics a step closer, considering games now have to render in dynamic 4K? I assume RAM usage will now be much more efficient and dedicated to what you're currently seeing on screen and not idle, but I wonder if most of the new processing power won't go to the need to render in dynamic 4K.

And if I may, could you comment on how the PS5 I/O unit compares with the Series X?

Thank you guys for the answers. It's really interesting.

Posted by Teodoro Gripín

[2020-11-20T12:12:49.868-08:00]

The decompressor acts as a speed multiplier depending on the compression ratio. The input speed to the decompressor is always the same, determined by the disk rate, but the output speed varies.
Game content doesn't just have a single uniform compression ratio; it will be a mix of content, some of which compresses better than others, so some decompresses near the min speed, and some much faster. When we talk about the overall game speed or compression ratio, that's an average. Also the average is done on time, not speed (which is the inverse of time), so for example the average of 5 GB/s and 20 GB/s is 8 GB/s (the harmonic mean, assuming equal amounts of data at each speed), not the arithmetic 12.5 GB/s.

As for the load times - basically yes, load times will be ridiculously fast in games that are designed for it; in fact we should see games in the future that have zero visible load time at all: you just jump right into huge levels and never experience any load time.

Posted by cbloom

[2020-11-19T07:23:24.026-08:00]

Thanks for your responses. I really appreciate having dialogue with someone who actually knows what they're talking about and doesn't mind taking the time to teach those of us that are still learning.
That's really interesting and makes a lot of sense. Did Sony anticipate this technology coming, which is why they included a 22 GB/s decompressor despite announcing the 8-9 GB/s number? It seemed like overkill at the time to me, but this makes me think they knew exactly what they were doing.
Also, wouldn't this mean that once it is implemented, loading times will rarely be more than half a second on well-optimized games, since the PS5 has at most 13 GB of usable RAM to fill?

Posted by Packersfan1290

[2020-11-19T07:04:03.321-08:00]

I can't comment on specific games, but the whole combination of Oodle Texture + Kraken + a well optimized loading pipeline probably won't be in a shipping game on PS5 for a while yet.

There are games rolling out Oodle Texture now on PC, such as Warframe:

https://forums.warframe.com/topic/1223735-the-great-ensmallening/

I've seen some press trying to compare load times of cross-platform games on the various systems. While it is tempting to think that's a scientific way to compare something equal across systems, it doesn't give a very accurate picture of what's actually happening. Cross-platform games rarely have the time to carefully design their IO system to be optimal on all platforms, especially early launch titles where the pressure to get anything shipped is difficult enough. Many games' load stacks are CPU bound, which means you aren't really seeing the IO subsystem performance at all, and if you drive the IO system in a generic cross-platform way it probably isn't at peak performance.
It's like taking a bunch of cars to a test track but giving them all bald tires in the rain: the higher performance cars won't be able to do a better lap time, so you won't really see the difference in what they're capable of.

Posted by cbloom

[2020-11-19T06:20:10.296-08:00]

When will the first games that use all of this new technology be released?

Posted by Packersfan1290

[2020-11-12T07:41:13.360-08:00]

This blog is nerd-elicious! Looking forward to more entry posts regarding PS5 and the development of its capabilities. Thank you guys so much for sharing all this!

Posted by Teodoro Gripín

[2020-11-09T03:06:48.555-08:00]

I see. Thanks for answering my question, and thank you as well, Fabian, for answering my question. So you mentioned PS5 being the only platform with licensing to be used on every game, so the only way to use Oodle Texture would be through licensing?
Posted by Micky Guyota

[2020-11-07T18:10:21.598-08:00]

svpv: We can't answer questions about the details of the HW implementation because it's Sony's, not ours.

Micky: the design was focused on maximizing *minimum* decode speed; that is, making sure that even for data that is pathologically slow to decode, the decoder keeps up with (or ideally outpaces) the peak SSD read speeds.

I'm happy the peak decompression speed came out the way it did, but we always knew it was going to be high and didn't worry about it much.

Posted by Fabian 'ryg' Giesen

[2020-11-07T07:25:35.650-08:00]

PS5 is the only system with a hardware Kraken decoder, and the only platform with a platform-wide license to Oodle Texture so that every game can use it. In theory PC SSDs will keep getting faster, but you would need several CPU cores running software Kraken to match the decompressed bandwidth of the PS5 hardware Kraken. Even then, a typical game on the PC won't be able to achieve that IO speed because of other bottlenecks; once you're going that fast lots of other things in the system software can become problems, and you have to address it all through the software.

Posted by cbloom

[2020-11-06T22:40:39.896-08:00]

Is PS5 the only platform that has hardware decompression for Kraken + Oodle? Or do PC and Xbox Series X/S have it as well?
If so, can they reach the PS5's theoretical I/O throughput speeds? (I've seen multiple reports claiming its peak is 22 GB/s, some say even more than that.) Also, does attaining such I/O speeds require having the same SSD speeds as PS5? Sorry if my question doesn't make any sense; I am not a tech expert, just an enthusiast trying to learn as much as I can.

Posted by Micky Guyota

[2020-09-30T21:30:35.705-07:00]

And what's the maximum window size supported by the hardware decoder? Is there a dedicated SRAM array for the data in the window?

Posted by svpv

[2020-09-30T19:56:01.416-07:00]

(I also work at RAD on Oodle.)
The Kraken decoders are not "equivalent to 9 Zen 2 cores"; that's quoting wildly out of context. By the same rationale a Deflate decoder that hits 5-6 GB/s output would be "equivalent to 12 Zen 2 cores", which is just as misleading. That ratio is just meaningless. They're dedicated fixed-function hardware that does one specific task (that happens to be suitable for HW implementation), certainly not equivalent substitutes for a general-purpose CPU core. If they were, that'd be missing the point.

Both PS5 and Xbox Series X decided to go for HW decompression because they noticed that with an SSD, decompression goes from a side task for one CPU core to a full-time job for several, at which point it makes sense to design dedicated hardware. Once you decide to go there, you have considerable freedom in how you design decompression units, how they are clocked, how many there are, etc., and you configure all that to meet your targets.

In the PS5 case, the goal was for the decompressors to never be the bottleneck in real workloads, so they're dialed in to be fast enough to keep up with the SSD at all times, with a decent safety margin. That's all there is to it.

Along the same lines, 2 helper processors in an IO block that has both a full Flash controller and the decompression/memory mapping/etc. units is not by itself remarkable. Every SSD controller has one. That's what processes the SATA/NVMe commands, does the wear leveling, bad block remapping and so forth.
The special part is not that these processors exist, but rather that they run custom firmware that implements a protocol and feature set quite different from what you would get in an off-the-shelf SSD.

Posted by Fabian 'ryg' Giesen

[2020-09-30T11:47:54.509-07:00]

I don't think there's any public comparison of Kraken and BCPack. If there is an unofficial one around, beware that it might not include the effects of Oodle Texture. Oodle Texture dramatically changes the way textures compress; we believe it should always be used with textures in games.

We're big fans of the Xbox Series X. Their approach is slightly different, but we're glad to see they are taking compression seriously.

Oodle Texture works great for the Xbox as well, and we are working with a lot of game companies that are using it on Xbox, so consumers should see lots of games with those huge size and speed savings on Xbox as well. It's up to the individual game developers, as it's not been licensed platform-wide at this time.

Game developers are also using Oodle Texture and Oodle Kraken for PC games; most multi-platform devs will be using the same Oodle Texture encoding of their textures for all platforms, it's not platform-specific. On the PC you don't have hardware Kraken, so software Kraken is used on the CPU. To keep up with the fastest SSD speeds this requires several cores; luckily high-end PCs also have lots of CPU cores!

At RAD we've always just tried to make the best compression possible for games.
We plan to continue to work with all platforms in the future.

Posted by cbloom

[2020-09-29T22:36:53.589-07:00]

I don't think it's on the same level, as Kraken is built into the PS5. Remember when Cerny stated that its power was equivalent to 9 Zen 2 cores and that it has dual processors to control the I/O throughput? My guess is they did this to free the CPU/GPU from doing the compression work; it basically does it itself. Some people say it's better than BCPack and can allow the PS5 17.38 GB/s of bandwidth, which is possibly 3x faster than the Xbox Series X.
Moore's Law Is Dead recently uploaded a video that talks about Oodle Kraken around the 1:34:00 mark, if you want to have a listen, but it does sound like this tech is efficient.

Posted by Anonymous

[2020-09-28T06:14:35.630-07:00]

Is there any comparison between Kraken + Oodle Texture and BCPack that you're aware of? For example, is BCPack CPU dependent or is it fundamentally the same as Kraken? etc...

Posted by j

[2020-09-26T09:20:01.962-07:00]

Thank you for this informative rant.

Posted by Roadwarrior
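
A note on the arithmetic in cbloom's comment of 2020-11-20 (the decompressor as a speed multiplier, and averaging by time rather than by speed): the following is a minimal Python sketch of that reasoning. The function names and the equal-sized chunks are assumptions made for illustration only; none of this comes from the Oodle SDK or the PS5 hardware.

```python
def output_rate(disk_rate_gb_per_s, compression_ratio):
    """Decompressed output rate when the input side runs at a fixed disk rate."""
    return disk_rate_gb_per_s * compression_ratio


def average_rate(chunks):
    """Time-based average rate over (output_gigabytes, rate_gb_per_s) chunks.

    This is total output divided by total time. For equal-sized chunks it
    equals the harmonic mean of the rates, not the arithmetic mean.
    """
    total_output = sum(size for size, _ in chunks)
    total_time = sum(size / rate for size, rate in chunks)
    return total_output / total_time


# Hypothetical example: 1 GB delivered at 5 GB/s plus 1 GB delivered at 20 GB/s.
mixed = average_rate([(1.0, 5.0), (1.0, 20.0)])
print(f"average: {mixed:.2f} GB/s")  # about 8 GB/s, not 12.5 GB/s
```

The slow chunk takes four times as long as the fast one, so it dominates the total time; that is why the combined figure sits much closer to 5 GB/s than to 20 GB/s.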