B44: replace tables with math #2125

aras-p · 2025-09-08T14:26:45Z

(Note: there is an alternative PR, #2126, which is much simpler: it just makes the tables be initialized on first use.)

B44 compression is probably very rarely used, and even when it is, the tables are only ever used on channels marked as "linear" (EXR_PERCEPTUALLY_LINEAR), which is not the default. So this PR replaces the tables with plain math. That removes a megabyte of source code and makes OpenEXRCore-4_0.dll 254 kilobytes smaller (868 -> 614 KB).
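For a rough sense of where the binary size savings come from, here is a back-of-the-envelope estimate. It assumes (this is not stated in the PR) that each of the two removed tables stores one 16-bit entry for every possible 16-bit half bit pattern:

```latex
2~\text{tables} \times 2^{16}~\text{entries} \times 2~\text{bytes} = 262\,144~\text{bytes} = 256~\text{KiB}
```

Under that assumption the table data alone is close to the measured 254 KB reduction, with the newly added math and SIMD code presumably taking back a little of it.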

Just naïvely replacing the tables with math makes things a bit slower, so to compensate, this PR has several code paths: AVX2+F16C, SSE2, ARM NEON, and regular C code. The math done in all of them is identical, and the actual exp / log implementations follow the same range reduction / polynomial approximation as the Highway SIMD library. (I tried several others, including ones from OpenColorIO and OpenImageIO, but they did not produce results bit-exact with the previous tables.)

Timings for exrmetrics on one B44 image

Testing this image: https://aras-p.info/files/exr_files/Blender281rgb16_lin.exr (3840x2160, RGB FP16, channels marked as "linear"), using "exrmetrics -m -z b44 --time write,reread --passes 50 --csv". Times are in milliseconds:

Case	PC write time	PC reread time	Mac write time	Mac reread time
Lookup tables (main branch)	20.7	14.1	3.6	4.3
This PR, best SIMD (AVX2 or NEON)	20.3	14.5	6.1	4.8
This PR, SSE2 only	22.7	17.6	-	-
This PR, no SIMD	29.7	21.0	7.8	8.3

("PC" is a Ryzen 5950X with Windows / MSVC 2022 v17.14.12; "Mac" is a MacBook Pro M4 Max with Xcode 16.1.)

This measures the duration of the whole compression/decompression. Without SIMD there is a noticeable slowdown, but with the SIMD paths there is barely any overhead from using math instead of lookup tables. The exception is the Mac "write" case (3.6 -> 6.1 ms), but on this Mac the times are very low to begin with, I guess due to its extremely large memory bandwidth.

Timings for just the math/lookup comparison

Excluding everything else going on in B44 compression, here are the times in milliseconds (single thread) for running to_linear and from_linear on 160 million values:

Case	PC to_linear	PC from_linear	Mac to_linear	Mac from_linear
Lookup tables (main branch)	64	65	43	41
This PR, best SIMD (AVX2 or NEON)	173	86	117	107
This PR, SSE2 only	472	377	-	-
This PR, no SIMD	1041	568	475	264

As expected, even with SIMD the "do the math" approach is slower than a table lookup (about 1.3x to 2.7x, depending on direction and platform), and without SIMD it is roughly 6x to 16x slower. However, as the whole-image timings above show, B44 compression seems to be limited enough by memory bandwidth that this extra math does not slow things down much, if at all.

And again, all of this only affects B44/B44A compression, and only when image channels are marked as "linear" (which is not the default setting for ImfChannel). I did not find any B44 + linear images anywhere, so I had to generate my own with code.

B44 compression is probably very rarely used, and even when it is, the table is only ever used on channels marked as "linear" (EXR_PERCEPTUALLY_LINEAR), which is not the default. This does make B44 compression with "linear" channels slower: when compressing a 4K resolution image on Ryzen 5950X / Windows / VS2022, the exrmetrics write time goes from 0.081 to 0.210. The math is always done on 16-value chunks, so it should be possible to SIMD it, but I am not sure it is worth it. This cuts down half a megabyte of source code, and makes OpenEXRCore-4_0.dll smaller by 126 kilobytes (868 -> 742 KB).

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>

B44 compression is probably very rarely used, and even when it is, the table is only ever used on channels marked as "linear" (EXR_PERCEPTUALLY_LINEAR), which is not the default. This does make B44 decompression with "linear" channels slower: when decompressing a 4K resolution image on Ryzen 5950X / Windows / VS2022, the exrmetrics re-read time goes from 0.038 to 0.157. The math is always done on 16-value chunks, so it should be possible to SIMD it, but I am not sure it is worth it. This cuts down half a megabyte of source code, and makes OpenEXRCore-4_0.dll smaller by 128 kilobytes (742 -> 614 KB).

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>


meshula

I prefer the math approach overall, however I'm marking both of these as approved from my point of view.

aras-p added 3 commits September 8, 2025 19:44

Remove unused b44ExpLogTable.cpp

98d1a50

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>

aras-p force-pushed the b44-tables branch from d094940 to 98d1a50 September 8, 2025 16:45

B44: SIMD code paths in b44_convertFromLinear_16 / b44_convertToLinea…

179792c

…r_16 AVX2+F16C, SSE2, NEON

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>

aras-p force-pushed the b44-tables branch from 3df4123 to 179792c September 10, 2025 13:14

aras-p mentioned this pull request Sep 11, 2025

B44: initialize exp/log tables at runtime #2126

Open

meshula approved these changes Sep 18, 2025
