Use SIMD intrinsics for vector shifts #1955
base: master
r? @folkertdev

rustbot has assigned @folkertdev.
```diff
 pub fn _mm256_srav_epi32(a: __m256i, count: __m256i) -> __m256i {
-    unsafe { transmute(psravd256(a.as_i32x8(), count.as_i32x8())) }
+    unsafe {
+        let count = count.as_i32x8();
+        let overflow: i32x8 = simd_ge(count, i32x8::splat(32));
+        simd_select(overflow, i32x8::ZERO, simd_shr(a.as_i32x8(), count)).as_m256i()
+    }
 }
```
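To make the behaviour of the proposed body concrete, here is a hypothetical scalar model over plain `[i32; 8]` arrays, so it runs on stable Rust without `core::intrinsics::simd`. The name `proposed_srav_model` is illustrative only and not part of stdarch. It mirrors the `simd_ge` / `simd_select` / `simd_shr` sequence lane by lane, and it only covers non-negative counts: a negative count would pass the signed `>= 32` check unflagged.

```rust
/// Hypothetical scalar model of the proposed `_mm256_srav_epi32` body.
/// Per lane: overflow = count >= 32 (signed, like simd_ge on i32x8);
/// result = if overflow { 0 } else { a >> count }.
/// Only non-negative counts are modelled here.
fn proposed_srav_model(a: [i32; 8], count: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        assert!(count[i] >= 0, "model only covers non-negative counts");
        let overflow = count[i] >= 32; // signed comparison
        // The overflow lanes are replaced by zero, as with simd_select(overflow, ZERO, ..).
        out[i] = if overflow { 0 } else { a[i] >> count[i] };
    }
    out
}

fn main() {
    let a = [-8, -8, 8, 8, -1, 100, i32::MIN, 5];
    let count = [1, 40, 1, 40, 31, 32, 31, 0];
    // Prints [-4, 0, 4, 0, -1, 0, -1, 5]
    println!("{:?}", proposed_srav_model(a, count));
}
```

Lane 1 (`-8` shifted by 40) comes out as `0`. An arithmetic shift would instead fill that lane with the sign bit and produce `-1`.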
I don't think this is correct, since this is an arithmetic shift and overflow should result in all-zeros or all-ones depending on the sign bit.
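The per-lane behaviour described above can be written as a scalar reference. This sketch assumes Intel's documented `VPSRAVD` semantics: each count is read as an unsigned 32-bit value, and any count above 31 fills the lane with copies of the sign bit (0 for non-negative lanes, -1 for negative ones). `srav_reference` is a hypothetical name used only for illustration.

```rust
/// Hypothetical scalar reference for `_mm256_srav_epi32`, assuming VPSRAVD
/// semantics: counts are unsigned, and counts > 31 yield sign fill.
fn srav_reference(a: [i32; 8], count: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        let c = count[i] as u32; // reinterpret the count bits as unsigned
        out[i] = if c > 31 {
            a[i] >> 31 // arithmetic shift by 31: 0 if a >= 0, -1 if a < 0
        } else {
            a[i] >> c
        };
    }
    out
}

fn main() {
    let a = [-8, -8, 8, 8, -1, 100, i32::MIN, 5];
    // -1 reinterpreted as u32 is u32::MAX, so that lane overflows too.
    let count = [1, 40, 1, 40, -1, 32, 31, 0];
    // Prints [-4, -1, 4, 0, -1, 0, -1, 5]
    println!("{:?}", srav_reference(a, count));
}
```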
Thanks a lot for pointing that out. Yes, the `srav` intrinsics should fill with the sign bit on overflow; I'll modify it.
Ah, so I need to add a …
It looks like adding that one …
edit: https://godbolt.org/z/99zrnb5Gj
edit2: somehow using …
https://godbolt.org/z/cafKWsf6x
I think these should do the job. @eduardosm, could you check, please?
edit: it seems to pass Miri on the overflow playground.
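One way to get the sign fill without selecting on the result is to clamp overflowing counts to 31 before the arithmetic shift, because a shift by 31 already yields all-zeros or all-ones. The scalar sketch below models that idea lane by lane and compares it against a reference that is also assumed from `VPSRAVD`. `srav_clamped` and `srav_reference` are hypothetical names. This is not necessarily the code in the linked Godbolt examples or in the final commits. The overflow check compares the counts as unsigned, so negative `i32` counts are also treated as overflow.

```rust
/// Hypothetical lane-wise model of a corrected `_mm256_srav_epi32` body:
/// overflowing counts (unsigned > 31) are replaced by 31 before the shift,
/// so a single arithmetic shift produces the sign fill.
fn srav_clamped(a: [i32; 8], count: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        let c = count[i] as u32;
        let no_overflow = c < 32; // unsigned compare also catches negative counts
        let clamped = if no_overflow { c } else { 31 }; // the "select" step
        out[i] = a[i] >> clamped; // shift amount is always in 0..=31
    }
    out
}

/// Reference semantics (assumed from VPSRAVD): counts > 31 give sign fill.
fn srav_reference(a: [i32; 8], count: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        let c = count[i] as u32;
        out[i] = if c > 31 { if a[i] < 0 { -1 } else { 0 } } else { a[i] >> c };
    }
    out
}

fn main() {
    let inputs = [i32::MIN, -8, -1, 0, 1, 8, 100, i32::MAX];
    let counts = [0, 1, 31, 32, 40, -1, i32::MIN, i32::MAX];
    // Apply each count to all eight inputs and compare the two models.
    for &c in &counts {
        let got = srav_clamped(inputs, [c; 8]);
        let want = srav_reference(inputs, [c; 8]);
        assert_eq!(got, want, "mismatch for count {c}");
    }
    println!("clamped model matches reference");
}
```

With clamping, every lane goes through the same shift, so there is no per-lane branch on the output.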
Force-pushed from 8e0efaf to 67db9b3.
Force-pushed from 67db9b3 to a299d29.
Retrying #1928, this time actually checking for overflow. LLVM sees through this and eliminates the branches/selects: https://godbolt.org/z/WsYvfjPas
cc @RalfJung: this also removes uses of a few intrinsics from `avx2`.
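Logical variable shifts, such as `_mm256_srlv_epi32`, are the case where overflowing lanes really do become zero. This scalar sketch shows the "check for overflow, then mask" shape without branches. It assumes Intel's documented behaviour for those instructions, where unsigned counts above 31 produce 0. `srlv_model` is a hypothetical name, and the code is an illustration rather than stdarch's implementation.

```rust
/// Hypothetical scalar model of a logical variable right shift
/// (`_mm256_srlv_epi32`-style): counts > 31 produce 0.
/// Branch-free: build a lane mask from the overflow check, then AND it
/// with a shift whose amount is kept in range.
fn srlv_model(a: [u32; 8], count: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        let c = count[i];
        // All-ones when the count is in range, all-zeros when it overflows.
        let keep = ((c < 32) as u32).wrapping_neg();
        // wrapping_shr masks the amount to 0..=31, so the shift itself never
        // overflows; the mask discards lanes whose count was too large.
        out[i] = a[i].wrapping_shr(c) & keep;
    }
    out
}

fn main() {
    let a = [0xF000_0000, 0xF000_0000, 8, 8, u32::MAX, 1, 1, 5];
    let count = [4, 32, 3, 33, 31, u32::MAX, 0, 2];
    // Prints [251658240, 0, 1, 0, 1, 0, 1, 1] (251658240 == 0x0F00_0000)
    println!("{:?}", srlv_model(a, count));
}
```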