⚡️ Speed up method LightGlueImageProcessor.post_process_keypoint_matching by 6%
#385
📄 6% (0.06x) speedup for `LightGlueImageProcessor.post_process_keypoint_matching` in `src/transformers/models/lightglue/image_processing_lightglue.py`

⏱️ Runtime: 2.95 milliseconds → 2.78 milliseconds (best of 48 runs)

📝 Explanation and details
The optimized code achieves a 6% speedup through several key tensor operation optimizations:
Key Optimizations:
Eliminated unnecessary `.clone()`: The original code called `.clone()` on `outputs.keypoints`, but this is redundant since the subsequent multiplication and `.to(torch.int32)` operations already create new tensors. This saves memory allocation and copying overhead.

Precomputed batch slices: Instead of extracting slices inside the loop (e.g., `outputs.mask`, `outputs.matches[:, 0]`), the optimized version precomputes these outside the loop as `mask0_all`, `mask1_all`, `matches_all`, etc. This eliminates the repeated attribute lookups and tensor-slicing operations that were happening on every iteration.

Replaced `torch.tensor` with `torch.as_tensor`: When converting `target_sizes` from a list to a tensor, `torch.as_tensor` avoids an unnecessary memory copy when the input is already tensor-like, providing a minor but consistent performance gain.

Added empty-tensor handling: The optimization safely handles cases where `matched_indices.numel() == 0` to avoid potential indexing errors with empty tensors, using `.new_empty()` to create appropriately shaped empty tensors.

Performance Impact:
The line profiler shows the biggest gains come from removing the `.clone()` operation (the line `keypoints = outputs.keypoints.clone()`).

Test Case Analysis:
The optimizations show consistent improvements across all test scenarios, driven by the eliminated `.clone()` and the reduced attribute access. This optimization is particularly valuable for computer vision pipelines processing multiple image pairs in batches, where the post-processing step is called frequently after model inference.
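As a rough illustration of these ideas (this is a simplified sketch, not the actual transformers implementation; the function signature, tensor shapes, and result-dict keys below are assumptions for the example), a post-processing loop applying the same optimizations might look like:

```python
import torch


def post_process(keypoints, mask, matches, matching_scores, target_sizes):
    """Sketch: rescale normalized keypoints and collect per-pair matches.

    Assumed shapes (hypothetical, for illustration only):
      keypoints:       (batch, 2, num_kp, 2), normalized (x, y) in [0, 1]
      mask:            (batch, 2, num_kp), validity mask
      matches:         (batch, 2, num_kp), index of match in the other image, -1 = none
      matching_scores: (batch, 2, num_kp)
      target_sizes:    list of two (height, width) pairs per batch element
    """
    # torch.as_tensor avoids a copy when the input is already tensor-like.
    sizes = torch.as_tensor(target_sizes, dtype=keypoints.dtype)  # (batch, 2, 2)
    # Flip (h, w) -> (w, h) so it scales (x, y) keypoints. No .clone() needed:
    # the multiplication below already allocates a new tensor.
    scale = sizes.flip(-1).unsqueeze(-2)  # (batch, 2, 1, 2)
    keypoints_px = (keypoints * scale).to(torch.int32)

    # Precompute the per-image slices once, outside the loop.
    mask0_all = mask[:, 0]
    matches_all = matches[:, 0]
    scores_all = matching_scores[:, 0]

    results = []
    for i in range(keypoints.shape[0]):
        valid = (mask0_all[i] > 0) & (matches_all[i] > -1)
        matched_indices = valid.nonzero(as_tuple=True)[0]
        if matched_indices.numel() == 0:
            # .new_empty keeps dtype/device consistent for the empty case.
            kp0 = keypoints_px.new_empty((0, 2))
            kp1 = keypoints_px.new_empty((0, 2))
            scores = scores_all.new_empty((0,))
        else:
            kp0 = keypoints_px[i, 0][matched_indices]
            kp1 = keypoints_px[i, 1][matches_all[i][matched_indices]]
            scores = scores_all[i][matched_indices]
        results.append(
            {"keypoints0": kp0, "keypoints1": kp1, "matching_scores": scores}
        )
    return results
```

The key pattern is that everything independent of the loop index (the rescaled keypoint tensor and the `[:, 0]` slices) is computed once up front, so each iteration only does cheap indexing.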
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-LightGlueImageProcessor.post_process_keypoint_matching-mia6ei7c` and push.