Releases: ROCm/rocFFT
rocFFT 1.0.34 for ROCm 7.1.0
rocFFT code for ROCm 7.1.0 did not change. The library was rebuilt for the updated ROCm 7.1.0 stack.
rocFFT 1.0.34 for ROCm 7.0.2
rocFFT code for ROCm 7.0.2 did not change. The library was rebuilt for the updated ROCm 7.0.2 stack.
rocFFT 1.0.34 for ROCm 7.0.1
rocFFT code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.
rocFFT 1.0.34 for ROCm 7.0.0
Added
- Added gfx950 support.
 
Removed
- Removed rocfft-rider legacy compatibility from clients.
- Removed support for the gfx940 and gfx941 targets from the client programs.
- Removed backward compatibility symlink for include directories.
 
Optimized
- Removed unnecessary HIP event/stream allocation and synchronization during MPI transforms.
- Implemented single-precision 1D kernels for lengths:
  - 4704
  - 5488
  - 6144
  - 6561
  - 8192
- Implemented single-kernel plans for some large 1D problem sizes, on devices with at least 160 KiB of LDS.
 
Resolved issues
- Fixed kernel faults on multi-device transforms that gather to a single device, when the input/output bricks are not
contiguous. 
rocFFT 1.0.32 for ROCm 6.4.4
rocFFT code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.
rocFFT 1.0.32 for ROCm 6.4.3
rocFFT code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.
rocFFT 1.0.32 for ROCm 6.4.2
rocFFT code for ROCm 6.4.2 did not change. The library was rebuilt for the updated ROCm 6.4.2 stack.
rocFFT 1.0.32 for ROCm 6.4.1
rocFFT code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.
rocFFT 1.0.32 for ROCm 6.4.0
Changed
- Building with the address sanitizer option sets xnack+ on relevant GPU
architectures and adds address-sanitizer support to runtime-compiled
kernels.
- The AMDGPU_TARGETS build variable should be replaced with GPU_TARGETS. AMDGPU_TARGETS is deprecated.
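For build scripts that still pass the deprecated variable, the rename can be sketched as follows. This is only an illustration, not part of rocFFT: the helper name `migrate_cmake_args` and the sample architecture list are assumptions, and it only handles the variable when it is passed as a `-D` option on the CMake command line.

```python
# Hypothetical helper: rewrite a CMake argument list so that the deprecated
# AMDGPU_TARGETS definition becomes GPU_TARGETS. Not part of rocFFT.
OLD_FLAG = "-DAMDGPU_TARGETS"
NEW_FLAG = "-DGPU_TARGETS"


def migrate_cmake_args(args):
    """Return a copy of args with -DAMDGPU_TARGETS renamed to -DGPU_TARGETS."""
    migrated = []
    for arg in args:
        # Match both "-DAMDGPU_TARGETS=..." and the typed form
        # "-DAMDGPU_TARGETS:STRING=...".
        if arg.startswith(OLD_FLAG + "=") or arg.startswith(OLD_FLAG + ":"):
            arg = NEW_FLAG + arg[len(OLD_FLAG):]
        migrated.append(arg)
    return migrated


# Example values only; the architecture list is an assumption.
legacy_args = ["cmake", "-DAMDGPU_TARGETS=gfx90a;gfx942", ".."]
migrated_args = migrate_cmake_args(legacy_args)
print(" ".join(migrated_args))
```

Arguments that already use GPU_TARGETS pass through unchanged, so the helper can be applied repeatedly without side effects.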
Removed
- Removed ahead-of-time compiled kernels for the gfx906, gfx940, and gfx941 architectures. These architectures still
function the same, but kernels for them are now compiled at runtime.
- Removed consumer GPU architectures from the precompiled kernel cache that ships with
rocFFT. rocFFT continues to ship with a cache of precompiled RTC kernels for data-center
and workstation architectures. As before, user-level caches can be enabled by setting the
environment variable ROCFFT_RTC_CACHE_PATH to a writeable file location.
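As a minimal sketch of enabling a user-level cache, the snippet below builds an environment with ROCFFT_RTC_CACHE_PATH set, for use when launching a rocFFT application. The helper name `env_with_rtc_cache`, the cache file name, and the application path in the commented-out launch line are all hypothetical; only the environment variable name comes from the notes above.

```python
import os
import tempfile


def env_with_rtc_cache(cache_file, base_env=None):
    """Return a copy of the environment with ROCFFT_RTC_CACHE_PATH set.

    Raises if the directory that would hold the cache file is not writable,
    since the variable must point to a writeable file location.
    """
    cache_dir = os.path.dirname(os.path.abspath(cache_file))
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError(f"cache directory is not writable: {cache_dir}")
    env = dict(os.environ if base_env is None else base_env)
    env["ROCFFT_RTC_CACHE_PATH"] = str(cache_file)
    return env


# Hypothetical cache file name inside the system temporary directory.
cache_file = os.path.join(tempfile.gettempdir(), "rocfft_user_kernel_cache.db")
env = env_with_rtc_cache(cache_file)

# A launcher would then pass this environment to the application, for example:
# subprocess.run(["./my_fft_app"], env=env, check=True)  # hypothetical binary
```

Setting the variable in the launching process, rather than inside the application, keeps it in place before the library is loaded.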
Optimized
- Improved MPI transform performance by using all-to-all communication for global transpose operations.
Point-to-point communications are still used when all-to-all is not possible.
- Improved the performance of unit-strided, complex interleaved, forward and inverse, length (64,64,64) FFTs.
 
Resolved issues
- Fixed incorrect results from 2-kernel 3D FFT plans that used non-default output strides. For more information, see the rocFFT GitHub issue.
- Plan descriptions can be reused with different strides for different plans. For more information, see the rocFFT GitHub issue.
- Fixed client packages to depend on hipRAND instead of rocRAND.
- Fixed potential integer overflows during large MPI transforms.
 
rocFFT 1.0.31 for ROCm 6.3.3
rocFFT code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.