B44: replace tables with math #2125
Open
(Note: there's an alternative PR at #2126 -- much simpler, it simply makes tables be initialized on first use)
B44 compression is probably very rarely used, and even when it is, the tables are only ever used on channels marked as "linear" (EXR_PERCEPTUALLY_LINEAR), which is not the default. So replace the tables with just math, which removes about a megabyte of source code and makes OpenEXRCore-4_0.dll smaller by 254 kilobytes (868 -> 614 KB).
Just naïvely replacing tables with math makes it a bit slower, so to compensate this PR adds several code paths (AVX2+F16C, SSE2, ARM NEON, plus a regular C fallback). The math done in all of them is identical, and the actual exp/log implementations follow the same range reduction / polynomial approximation as the Highway SIMD library (I tried several others, including from OpenColorIO and OpenImageIO, but these were not producing bit-exact results compared to previous tables).
Timings for exrmetrics on one B44 image
Testing on this image: https://aras-p.info/files/exr_files/Blender281rgb16_lin.exr (3840x2160, RGB FP16, channels marked as "linear"), using exrmetrics -m -z b44 --time write,reread --passes 50 --csv, times printed in milliseconds:
("PC" is Ryzen 5950X, Windows / MSVC 2022 v17.14.12. "Mac" is MacBookPro M4 Max, Xcode 16.1)
This measures the duration of the whole compression/decompression. Without SIMD there is quite some slowdown, but with the SIMD paths there's barely any overhead from using math instead of a lookup table (the exception is the Mac "write" case, but on this Mac the times are crazy low to begin with; I guess due to the extremely large memory bandwidth that it has).
Timings for just the math/lookup comparison
Excluding everything else going on in B44 compression, here are the times in milliseconds (one thread) to do to_linear and from_linear on 160 million numbers:
This shows that, as expected, even with SIMD the "do the math" approach is about 2x slower than doing a table lookup, and several times slower when not using SIMD. However, again see above: B44 compression seems to be limited more by memory bandwidth, so doing this extra math does not slow things down much, if at all.
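For readers unfamiliar with the "range reduction / polynomial approximation" style of exp mentioned earlier, here is a minimal scalar sketch in C. It is not the code from this PR and not Highway's implementation: the function name approx_expf, the plain Taylor coefficients, and the valid input range are all assumptions made for illustration, and the result is not bit-exact with the old tables.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the PR): exp(x) = 2^k * exp(r),
 * where k = round(x / ln2) and r = x - k*ln2, so |r| <= ln2/2.
 * Assumes x stays well inside roughly [-80, 80]; no overflow,
 * underflow, NaN or infinity handling. */
static float approx_expf(float x)
{
    const float ln2     = 0.69314718056f;
    const float inv_ln2 = 1.44269504089f;

    /* Range reduction: round x/ln2 to the nearest integer k. */
    int   k = (int)(x * inv_ln2 + (x >= 0.0f ? 0.5f : -0.5f));
    float r = x - (float)k * ln2;

    /* Degree-6 Taylor polynomial for exp(r), evaluated in Horner form. */
    float p = 1.0f / 720.0f;
    p = p * r + 1.0f / 120.0f;
    p = p * r + 1.0f / 24.0f;
    p = p * r + 1.0f / 6.0f;
    p = p * r + 0.5f;
    p = p * r + 1.0f;
    p = p * r + 1.0f;

    /* Build 2^k directly from the IEEE-754 float exponent bits. */
    uint32_t bits = (uint32_t)(k + 127) << 23;
    float scale;
    memcpy(&scale, &bits, sizeof scale);

    return p * scale;
}
```

The shape is SIMD-friendly because there are no branches on the data beyond the rounding, and every step (multiply-add, integer shift, bit reinterpretation) has a direct vector equivalent. Getting results that match a precomputed table bit for bit additionally depends on the exact coefficients and rounding used, which this sketch does not attempt.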
And again, all of this only affects B44/B44A compression, and only when image channels are marked as "linear" (which is not the default setting for ImfChannel). I did not actually find any B44+Linear images anywhere, so I had to make my own using code.
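As a rough illustration of the table side of this tradeoff, here is a sketch of a 16-bit lookup table that is filled on first use, in the spirit of the alternative in #2126. Every name here (g_table, compute_entry, lookup, table_bytes) is hypothetical, the per-entry function is a placeholder rather than the real B44 math, and the lazy initialization shown is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry for every possible 16-bit (half float) bit pattern. */
enum { TABLE_ENTRIES = 1 << 16 };

static uint16_t g_table[TABLE_ENTRIES];
static int      g_table_ready = 0;

/* Placeholder for the real per-value conversion (assumption:
 * any pure function of the 16 input bits fits this pattern). */
static uint16_t compute_entry(uint16_t bits)
{
    return (uint16_t)(bits ^ (bits >> 1));
}

/* Look up a value, building the whole table on the first call.
 * NOT thread-safe as written; a real version would need a
 * once-style guard around the initialization. */
static uint16_t lookup(uint16_t bits)
{
    if (!g_table_ready)
    {
        for (uint32_t i = 0; i < TABLE_ENTRIES; ++i)
            g_table[i] = compute_entry((uint16_t)i);
        g_table_ready = 1;
    }
    return g_table[bits];
}

/* Size of one table in bytes: 65536 entries * 2 bytes = 131072. */
static size_t table_bytes(void)
{
    return sizeof(g_table);
}
```

One such table is 128 KiB. If there is one table per direction (an assumption; the description above only says "tables"), that is 256 KiB of data, which is in the same ballpark as the 254 KB reduction in DLL size reported above. Filling tables at runtime keeps the lookup speed but still needs the math code to build them, whereas this PR drops the tables entirely.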