Skip to content

Conversation

@sayantn
Copy link
Contributor

@sayantn sayantn commented Nov 10, 2025

Retrying #1928, this time actually checking for overflow. LLVM sees through this, and eliminates the branch/selects https://godbolt.org/z/WsYvfjPas

cc @RalfJung this also removes uses of a few intrinsics from avx2

@rustbot
Copy link
Collaborator

rustbot commented Nov 10, 2025

r? @folkertdev

rustbot has assigned @folkertdev.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Comment on lines 2915 to 2927
pub fn _mm256_srav_epi32(a: __m256i, count: __m256i) -> __m256i {
unsafe { transmute(psravd256(a.as_i32x8(), count.as_i32x8())) }
unsafe {
let count = count.as_i32x8();
let overflow: i32x8 = simd_ge(count, i32x8::splat(32));
simd_select(overflow, i32x8::ZERO, simd_shr(a.as_i32x8(), count)).as_m256i()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, since this is an arithmetic shift and overflow should result in all-zeros or all-ones depending on the sign bit.

Copy link
Contributor Author

@sayantn sayantn Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for pointing it out, yes srav intrinsics should fill sign-bit on overflow. I would modify it

@eduardosm
Copy link
Contributor

simd_shl and simd_shr are documented to be UB on overflow, but the implementations of this PR will unconditionally call these intrinsics.

@sayantn
Copy link
Contributor Author

sayantn commented Nov 11, 2025

Ah, so I need to add a simd_and to make it safe. Thanks for pointing it out

@sayantn
Copy link
Contributor Author

sayantn commented Nov 12, 2025

It looks like adding that one simd_and messes up the whole LLVM optimization. Strictly speaking, this is fine as long as we are only in LLVM backend, LLVM generates poison values if the shift value overflows and we are filtering them out in the next step. But obviously that's not a good argument, as the main motive for this change is to make it more portable across backends. I will look into what needs to be done

edit: https://godbolt.org/z/99zrnb5Gj

edit2: somehow using simd_select(good, count, T::splat(dummy_value)) works. It lets LLVM collapse the 2 selects, while also keeping the simd_shl call safe from Rust (https://godbolt.org/z/cc4EKfeb7)

@sayantn
Copy link
Contributor Author

sayantn commented Nov 12, 2025

https://godbolt.org/z/cafKWsf6x I think these should do the job, @eduardosm could you check please?

edit: seems to pass miri on overflow playground

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants