Comments on cbloom rants: 01-18-11 - Hadamard

cbloom, 2011-01-19T11:01:04.268-08:00

So this post was meant to just be a dump of some stuff I learned about Hadamard transforms.

The H264 Frext 8x8 transform is designed to stay in 16 bit integers, and every coefficient has at most 2 bits on, so multiplies can be replaced with two shifts and an add. If multiplies were cheap you would use a different transform.

BTW another interesting approach for designing approximate/fast integer transforms is to do a Hadamard first and then apply a few lifting-style corrections to munge the coefficients together.

ryg, 2011-01-19T01:45:21.411-08:00

Even ARM7TDMI (that's 1994 tech!) has fast muls (2-5 cycles; 2 if top 24 bits of multiplicand are all-0 or all-1, 3 if top 16 bits are all-0 or all-1, etc). For the small multipliers in BinDCT etc. (usually <=7 bits) that's plenty. Not something I'd worry about.

won3d, 2011-01-18T23:55:04.740-08:00

ARM has free shifts, but newer implementations also have fast multiplies, too. Anyway, I'm not particularly familiar with either ARM or implementing DCT, so enough from me.

This paper looks really cool! It is going to make for some very interesting reading very soon.

ryg, 2011-01-18T23:10:28.367-08:00

BinDCT is nice for HW but in SW (given fast SIMD multipliers) a few parallel muls+adds are usually better than lots of adds+shifts with long dependency chains.

Also, after the original BinDCT papers, there's been some improvements. State of the art (as far as I know) in integer-only DCTs is ISO/IEC 23002-2. See here (paper, http://www.reznik.org/papers/SPIE07_MPEG-C_IDCT.pdf) and here (code, http://www.reznik.org/software/ISO-IEC-23002-2.zip).

The really big win (both HW and SW) is reducing width of arithmetic needed. In HW, you can make each stage just as wide as it needs to be; in SW, if you never have to go >16 bits, that's very notable (needs less regs *and* saves a lot of ops).

won3d, 2011-01-18T17:55:00.254-08:00

There's also BinDCT
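
For anyone skimming the thread, here is a rough sketch of the point about coefficients with at most 2 bits on: a multiply by such a constant turns into two shifts and an add. This is only an illustration, not code from the H264 spec or from any of the linked papers. The helper name `shift_add_mul` and the sample constants 3, 6, 10 and 12 are assumptions picked for the example, and Python integers are unbounded, so the 16 bit range constraint is not modelled here.

```python
def shift_add_mul(x, c):
    """Multiply integer x by a positive constant c that has one or two
    bits set, using only shifts and at most one add.

    Hypothetical helper for illustration; not taken from any codec source.
    """
    if c <= 0:
        raise ValueError("constant must be positive")
    # Positions of the set bits in c, lowest first.
    shifts = [i for i in range(c.bit_length()) if (c >> i) & 1]
    if len(shifts) > 2:
        raise ValueError("constant must have at most two bits set")
    result = x << shifts[0]          # first shift
    if len(shifts) == 2:
        result += x << shifts[1]     # second shift plus the single add
    return result


# Sanity check against an ordinary multiply for a few sample values.
for c in (3, 6, 10, 12):             # e.g. 12 = 8 + 4, 10 = 8 + 2
    for x in (-255, -1, 0, 7, 255):
        assert shift_add_mul(x, c) == x * c
```

In a real fixed-point transform the constants are known at compile time, so the shifts would be written out directly (for instance `(x << 3) + (x << 2)` for a multiply by 12) rather than decomposed at run time as above.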