⚡️ Speed up method EnglishNormalizer.collapse_whitespace by 114%
#388
+4
−2
📄 114% (1.14x) speedup for EnglishNormalizer.collapse_whitespace in src/transformers/models/clvp/number_normalizer.py
⏱️ Runtime: 1.90 milliseconds → 886 microseconds (best of 206 runs)
📝 Explanation and details
The optimization achieves a 114% speedup by eliminating redundant regex compilation in the collapse_whitespace method.
Key Changes:
The pattern r"\s+" is now compiled once during __init__ and stored as self._whitespace_re, instead of being recompiled on every method call.
The import regex as re line was removed since only the standard re module is used.
Why This Works:
re.compile(r"\s+") was called every time collapse_whitespace was invoked, which is expensive (65,318 ns per hit vs 18,744 ns per hit in the optimized version).
Performance Impact:
The optimization shows consistent 3-6x speedups across all test cases, with particularly strong gains for:
Context Benefits:
Since EnglishNormalizer.__call__ invokes collapse_whitespace as part of a text processing pipeline, this optimization will compound performance gains for any text normalization workload, especially when processing multiple documents or operating in batch scenarios where the normalizer instance is reused.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, git checkout codeflash/optimize-EnglishNormalizer.collapse_whitespace-mia8jzgf and push.
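For readers who want to see the shape of the change described under Key Changes, here is a minimal sketch. It is not the actual diff from this PR: the class names PerCallNormalizer and PrecompiledNormalizer are hypothetical stand-ins, and only the attribute name _whitespace_re and the pattern r"\s+" come from the description above.

```python
import re


class PerCallNormalizer:
    """Hypothetical 'before' shape: re.compile runs on every call."""

    def collapse_whitespace(self, text: str) -> str:
        # The pattern object is requested again on each invocation.
        return re.sub(re.compile(r"\s+"), " ", text)


class PrecompiledNormalizer:
    """Hypothetical 'after' shape: the pattern is compiled once in __init__."""

    def __init__(self) -> None:
        # Compiled a single time and reused for the lifetime of the instance.
        self._whitespace_re = re.compile(r"\s+")

    def collapse_whitespace(self, text: str) -> str:
        return self._whitespace_re.sub(" ", text)


sample = "hello \t\n  world  "
before = PerCallNormalizer().collapse_whitespace(sample)
after = PrecompiledNormalizer().collapse_whitespace(sample)
```

Both variants should return the same string for any input, so the change is behavior-preserving. The gain only shows up when one normalizer instance is reused across many calls, which matches the batch scenario mentioned under Context Benefits.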