⚡️ Speed up function `wcwidth` by 13% #68

codeflash-ai · 2025-11-21T23:58:46Z

📄 13% (0.13x) speedup for `wcwidth` in `src/_pytest/_io/wcwidth.py`

⏱️ Runtime : 1.34 milliseconds → 1.19 milliseconds (best of 114 runs)

📝 Explanation and details

The optimization achieves a roughly 13% speedup (1.34 ms → 1.19 ms) by replacing chained range checks and inline tuple literals with precomputed module-level sets for membership testing.

Key Optimizations:

1. Set-based lookups: The chained range comparisons (`o == 0x0000 or 0x200B <= o <= 0x200F or ...`) are replaced with a single set lookup, `o in _Cf_Zp_Zl_SET`. Set membership is one hash lookup on average, whereas the chained comparisons evaluate each range in turn until one matches.
2. Precomputed category sets: Tuple membership tests such as `category in ("Me", "Mn")` and `unicodedata.east_asian_width(c) in ("F", "W")` are replaced with lookups in the precomputed sets `_COMBINING_CATEGORIES` and `_EAWIDE`, so each check is a hash lookup rather than a scan of a literal tuple.
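
The two optimizations above can be sketched as follows. This is a reconstruction from the description, not the actual diff: the function name `wcwidth_sketch`, the ASCII fast path, the cache size of 100, and the set contents beyond U+0000 and U+200B..U+200F are assumptions (the extra ranges are inferred from the zero-width cases exercised by the tests in this PR).

```python
import unicodedata
from functools import lru_cache

# Assumed contents: only U+0000 and U+200B..U+200F are spelled out in the
# description; the remaining ranges are inferred from the PR's tests.
_Cf_Zp_Zl_SET = frozenset(
    [0x0000]
    + list(range(0x200B, 0x2010))
    + list(range(0x2028, 0x202F))
    + list(range(0x2060, 0x2064))
)
_COMBINING_CATEGORIES = frozenset(("Me", "Mn"))
_EAWIDE = frozenset(("F", "W"))


@lru_cache(100)  # cache size is an assumption
def wcwidth_sketch(c: str) -> int:
    """Hypothetical set-based variant of wcwidth()."""
    o = ord(c)
    # Printable ASCII is handled before any lookup (assumed fast path).
    if 0x20 <= o < 0x7F:
        return 1
    # One hash lookup instead of several chained range comparisons.
    if o in _Cf_Zp_Zl_SET:
        return 0
    category = unicodedata.category(c)
    if category == "Cc":
        return -1
    if category in _COMBINING_CATEGORIES:
        return 0
    if unicodedata.east_asian_width(c) in _EAWIDE:
        return 2
    return 1
```

Keeping the sets at module level means they are built once at import time; the per-call cost is then a global name lookup plus a hash probe.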

Performance Impact:
The function is called in a hot path by `wcswidth()`, which iterates over every character in a string to compute its terminal width. Most generated tests improve, while a few zero-width inputs such as U+0000 and U+200B run up to about 14% slower. Typical gains:

ASCII control characters: 10-12% faster (frequently tested cases)
East Asian wide characters: 14-16% faster (CJK text processing)
Large batches: 15-19% faster when processing multiple characters
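
A micro-benchmark along these lines could be used to probe individual code points. It is a minimal sketch, not the harness Codeflash used: `by_ranges`, `by_set`, `compare`, and the shortened `ZERO_WIDTH` set are hypothetical names covering only the two ranges spelled out in the description.

```python
import timeit

# Shortened, illustrative set: U+0000 plus U+200B..U+200F only.
ZERO_WIDTH = frozenset([0x0000, *range(0x200B, 0x2010)])


def by_ranges(o: int) -> bool:
    # Chained comparisons short-circuit on the first match.
    return o == 0x0000 or 0x200B <= o <= 0x200F


def by_set(o: int) -> bool:
    # A single hash probe regardless of which member matches.
    return o in ZERO_WIDTH


def compare(o: int, number: int = 200_000) -> tuple[float, float]:
    """Return (range_seconds, set_seconds) for one code point."""
    return (
        timeit.timeit(lambda: by_ranges(o), number=number),
        timeit.timeit(lambda: by_set(o), number=number),
    )


if __name__ == "__main__":
    for cp in (0x0000, 0x200B, 0x4E2D):
        ranges_s, set_s = compare(cp)
        print(f"U+{cp:04X}: ranges={ranges_s:.4f}s set={set_s:.4f}s")
```

Inputs that match the first comparison in the chain are the ones most likely to favor the range version, which would be consistent with the slower U+0000 and U+200B timings reported in the tests.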

Best Performance Cases:
The optimization particularly benefits workloads with:

Mixed character sets requiring multiple category checks
High-frequency calls to wcswidth() on text with non-ASCII characters
Batch processing of Unicode text where the LRU cache hit rate is low
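
To judge whether a given workload falls into the low-hit-rate case, the cache statistics of an `lru_cache`-wrapped function can be inspected. The sketch below uses a hypothetical stand-in, `cached_width`, with an assumed cache size of 100; it is not the real `wcwidth`.

```python
import unicodedata
from functools import lru_cache


@lru_cache(100)  # assumed size, for illustration only
def cached_width(c: str) -> int:
    # Simplified stand-in: wide characters count as 2, everything else as 1.
    return 2 if unicodedata.east_asian_width(c) in ("F", "W") else 1


def hit_rate(text: str) -> float:
    """Fraction of calls served from the cache while scanning `text`."""
    cached_width.cache_clear()
    for ch in text:
        cached_width(ch)
    info = cached_width.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0
```

Text drawn from a small alphabet yields a high hit rate, so the function body rarely runs; text with many distinct code points misses more often and exercises the membership checks directly.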

The changes maintain identical behavior while leveraging Python's optimized set operations for faster character classification.
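
Because the input domain is small (every code point up to `sys.maxunicode`), the identical-behavior claim can in principle be checked exhaustively rather than by sampling. The helper below is a minimal sketch; `first_mismatch` and `ALL_CODE_POINTS` are hypothetical names, and the two callables would be the original and optimized implementations.

```python
import sys
from typing import Callable, Iterable, Optional

# Every code point Python can represent, including lone surrogates.
ALL_CODE_POINTS = range(sys.maxunicode + 1)


def first_mismatch(
    reference: Callable[[str], int],
    candidate: Callable[[str], int],
    code_points: Iterable[int] = ALL_CODE_POINTS,
) -> Optional[int]:
    """Return the first code point where the two functions disagree, or None."""
    for cp in code_points:
        ch = chr(cp)
        if reference(ch) != candidate(ch):
            return cp
    return None
```

If both implementations are wrapped in `functools.lru_cache`, passing their `__wrapped__` attributes avoids churning the caches during the scan.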

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 2547 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 6 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

# imports
from _pytest._io.wcwidth import wcwidth
import pytest  # used for our unit tests


# unit tests

# ------------------- BASIC TEST CASES -------------------


def test_ascii_printable_characters():
    # All printable ASCII characters (0x20 to 0x7E) should return 1
    for code in range(0x20, 0x7F):
        c = chr(code)
        codeflash_output = wcwidth(c)  # 25.6μs -> 25.4μs (0.783% faster)


def test_ascii_control_characters():
    # ASCII control characters (0x01 to 0x1F and 0x7F) should return -1;
    # U+0000 is also in this loop but returns 0
    for code in list(range(0x20)) + [0x7F]:
        c = chr(code)
        codeflash_output = wcwidth(c)  # 14.6μs -> 13.3μs (10.0% faster)


def test_basic_combining_character():
    # Combining acute accent (U+0301) should be zero width
    codeflash_output = wcwidth("\u0301")  # 1.90μs -> 1.71μs (11.1% faster)


def test_basic_non_combining_non_ascii():
    # Latin-1 Supplement letter 'é' (U+00E9) is printable, width 1
    codeflash_output = wcwidth("é")  # 2.28μs -> 2.00μs (14.4% faster)


def test_basic_fullwidth_east_asian():
    # Fullwidth 'Ａ' (U+FF21) should be width 2
    codeflash_output = wcwidth("Ａ")  # 2.28μs -> 1.96μs (15.9% faster)


def test_basic_wide_east_asian():
    # CJK ideograph '中' (U+4E2D) should be width 2
    codeflash_output = wcwidth("中")  # 2.15μs -> 1.87μs (15.3% faster)


def test_basic_zero_width_space():
    # Zero width space (U+200B) should be width 0
    codeflash_output = wcwidth("\u200b")  # 1.02μs -> 1.08μs (5.48% slower)


def test_basic_non_printable_format_char():
    # Soft hyphen (U+00AD) is Cf but generally width 1 in terminals
    codeflash_output = wcwidth("\u00ad")  # 2.15μs -> 1.88μs (14.5% faster)


def test_basic_surrogate_pair_handling():
    # Emoji '😀' (U+1F600) is a single code point in Python 3 with east_asian_width 'W', so width 2
    codeflash_output = wcwidth("😀")  # 2.23μs -> 1.98μs (13.1% faster)


# ------------------- EDGE TEST CASES -------------------


def test_null_character():
    # Null character (U+0000) should be width 0
    codeflash_output = wcwidth("\u0000")  # 791ns -> 891ns (11.2% slower)


def test_noncharacter_code_point():
    # U+FDD0 is a noncharacter, category 'Cn', should default to width 1
    codeflash_output = wcwidth("\ufdd0")  # 2.19μs -> 1.93μs (13.5% faster)


def test_private_use_area():
    # U+E000 is private use, category 'Co', should default to width 1
    codeflash_output = wcwidth("\ue000")  # 2.11μs -> 1.77μs (19.2% faster)


def test_unassigned_code_point():
    # U+0378 is unassigned, category 'Cn', should default to width 1
    codeflash_output = wcwidth("\u0378")  # 1.99μs -> 1.76μs (12.9% faster)


def test_zero_width_non_joiner():
    # U+200C is zero width non-joiner, should be width 0
    codeflash_output = wcwidth("\u200c")  # 928ns -> 1.05μs (11.9% slower)


def test_zero_width_joiner():
    # U+200D is zero width joiner, should be width 0
    codeflash_output = wcwidth("\u200d")  # 929ns -> 992ns (6.35% slower)


def test_line_separator():
    # U+2028 is a line separator, should be width 0
    codeflash_output = wcwidth("\u2028")  # 1.07μs -> 1.02μs (4.31% faster)


def test_paragraph_separator():
    # U+2029 is a paragraph separator, should be width 0
    codeflash_output = wcwidth("\u2029")  # 1.00μs -> 965ns (3.83% faster)


def test_format_control_characters():
    # U+2060 (WORD JOINER) and U+2063 (INVISIBLE SEPARATOR) should be width 0
    codeflash_output = wcwidth("\u2060")  # 1.05μs -> 1.04μs (0.383% faster)
    codeflash_output = wcwidth("\u2063")  # 635ns -> 497ns (27.8% faster)


def test_nonspacing_mark():
    # U+034F (COMBINING GRAPHEME JOINER) is Mn, should be width 0
    codeflash_output = wcwidth("\u034f")  # 1.80μs -> 1.79μs (0.951% faster)


def test_enclosing_mark():
    # U+20DD (COMBINING ENCLOSING CIRCLE) is Me, should be width 0
    codeflash_output = wcwidth("\u20dd")  # 1.70μs -> 1.65μs (3.52% faster)


def test_invalid_input_type():
    # Should raise TypeError if input is not a single character string
    with pytest.raises(TypeError):
        wcwidth(123)  # 1.39μs -> 1.35μs (3.03% faster)
    with pytest.raises(TypeError):
        wcwidth(["a"])  # 847ns -> 802ns (5.61% faster)
    with pytest.raises(TypeError):
        wcwidth("ab")  # 1.45μs -> 1.47μs (1.23% slower)


def test_high_plane_character():
    # U+1F4A9 (PILE OF POO) emoji, east_asian_width='W', should be width 2
    codeflash_output = wcwidth("\U0001f4a9")  # 2.42μs -> 2.30μs (5.26% faster)


def test_braille_pattern_blank():
    # U+2800 (BRAILLE PATTERN BLANK), category 'So', east_asian_width='N', should be width 1
    codeflash_output = wcwidth("\u2800")  # 2.19μs -> 1.93μs (13.1% faster)


def test_combining_double_breve_below():
    # U+035D (COMBINING DOUBLE BREVE), category 'Mn', should be width 0
    codeflash_output = wcwidth("\u035d")  # 1.72μs -> 1.56μs (10.3% faster)


def test_combining_enclosing_square():
    # U+20DE (COMBINING ENCLOSING SQUARE), category 'Me', should be width 0
    codeflash_output = wcwidth("\u20de")  # 1.77μs -> 1.64μs (7.88% faster)


def test_non_bmp_non_combining():
    # U+10400 (DESERET CAPITAL LETTER LONG I), category 'Lu', east_asian_width='N', should be width 1
    codeflash_output = wcwidth("\U00010400")  # 2.35μs -> 1.97μs (19.3% faster)


# ------------------- LARGE SCALE TEST CASES -------------------


def test_large_scale_random_unicode():
    # Test a mix of up to 1000 random code points from the BMP and the supplementary planes
    import random

    chars = []
    for _ in range(1000):
        # Randomly pick a code point from the BMP or a supplementary plane
        plane = random.choice([0, 1])
        if plane == 0:
            code = random.randint(0x0000, 0xFFFF)
        else:
            code = random.randint(0x10000, 0x10FFFF)
        try:
            c = chr(code)
            # Only test single code units (skip surrogates)
            if 0xD800 <= code <= 0xDFFF:
                continue
            chars.append(c)
        except ValueError:
            continue
    # All should return an integer in {-1, 0, 1, 2}
    for c in chars:
        codeflash_output = wcwidth(c)
        w = codeflash_output  # 541μs -> 477μs (13.4% faster)

# imports
import string
import unicodedata

import pytest  # used for our unit tests

# function to test
from _pytest._io.wcwidth import wcwidth


# unit tests

# --------------------------
# BASIC TEST CASES
# --------------------------


def test_ascii_printable_letters():
    # All printable ASCII letters and digits should have width 1
    for ch in string.ascii_letters + string.digits:
        codeflash_output = wcwidth(ch)  # 16.8μs -> 16.8μs (0.191% faster)


def test_ascii_printable_symbols():
    # All printable ASCII punctuation should have width 1
    for ch in string.punctuation:
        codeflash_output = wcwidth(ch)  # 8.74μs -> 8.70μs (0.402% faster)


def test_ascii_space():
    # Space character should have width 1
    codeflash_output = wcwidth(" ")  # 713ns -> 651ns (9.52% faster)


def test_ascii_control_characters():
    # Control characters (0x01-0x1F, 0x7F) should have width -1; U+0000 returns 0
    for code in list(range(0x20)) + [0x7F]:
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 14.7μs -> 13.2μs (11.8% faster)


def test_combining_acute_accent():
    # Combining acute accent U+0301 should have width 0
    codeflash_output = wcwidth("\u0301")  # 1.67μs -> 1.65μs (1.27% faster)


def test_combining_enclosing_circle():
    # Combining enclosing circle U+20DD should have width 0
    codeflash_output = wcwidth("\u20dd")  # 1.76μs -> 1.65μs (6.59% faster)


def test_fullwidth_east_asian():
    # Fullwidth A (U+FF21) should have width 2
    codeflash_output = wcwidth("\uff21")  # 2.18μs -> 2.03μs (7.53% faster)


def test_wide_east_asian():
    # CJK Ideograph (U+4E00) should have width 2
    codeflash_output = wcwidth("\u4e00")  # 2.12μs -> 1.85μs (14.7% faster)


def test_narrow_east_asian():
    # Despite the test name, katakana middle dot (U+30FB) is "W" (wide) and should have width 2
    codeflash_output = wcwidth("\u30fb")  # 1.94μs -> 1.70μs (14.6% faster)


def test_non_printable_format_characters():
    # Zero-width space (U+200B) should have width 0
    codeflash_output = wcwidth("\u200b")  # 1.02μs -> 1.11μs (8.13% slower)
    # Word joiner (U+2060) should have width 0
    codeflash_output = wcwidth("\u2060")  # 670ns -> 465ns (44.1% faster)


def test_printable_emoji():
    # Smiling face emoji (U+1F600) is "W" (wide) and should have width 2
    codeflash_output = wcwidth("\U0001f600")  # 2.29μs -> 2.01μs (13.9% faster)


def test_printable_non_ascii_narrow():
    # Latin-1 Supplement: 'é' (U+00E9) should have width 1
    codeflash_output = wcwidth("é")  # 2.09μs -> 2.00μs (4.44% faster)


# --------------------------
# EDGE TEST CASES
# --------------------------


def test_null_character():
    # NULL (U+0000) should have width 0
    codeflash_output = wcwidth("\x00")  # 783ns -> 820ns (4.51% slower)


def test_bidi_control_characters():
    # Left-to-right mark (U+200E) should have width 0
    codeflash_output = wcwidth("\u200e")  # 975ns -> 1.10μs (11.4% slower)
    # Right-to-left mark (U+200F) should have width 0
    codeflash_output = wcwidth("\u200f")  # 576ns -> 491ns (17.3% faster)


def test_line_separator():
    # Line separator (U+2028) should have width 0
    codeflash_output = wcwidth("\u2028")  # 1.02μs -> 956ns (7.01% faster)


def test_paragraph_separator():
    # Paragraph separator (U+2029) should have width 0
    codeflash_output = wcwidth("\u2029")  # 984ns -> 985ns (0.102% slower)


def test_surrogate_code_points():
    # Surrogates (U+D800 to U+DFFF) are not valid Unicode scalar values, but Python allows them in chr
    # They are category 'Cs' (surrogate), not 'Cc', 'Me', or 'Mn', and not in any zero-width block
    # Should return 1 as per the fallback
    for code in range(0xD800, 0xDFFF + 1, 256):  # Sample a few surrogates
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 6.93μs -> 5.67μs (22.2% faster)


def test_private_use_area():
    # Private Use Area (U+E000) should have width 1
    codeflash_output = wcwidth("\ue000")  # 1.98μs -> 1.68μs (17.8% faster)


def test_noncharacter_code_point():
    # U+FDD0 is a noncharacter, but not a control, so should fallback to width 1
    codeflash_output = wcwidth("\ufdd0")  # 2.08μs -> 1.83μs (13.4% faster)


def test_unassigned_code_point():
    # U+2FFFF is unassigned, but should fallback to width 1
    codeflash_output = wcwidth("\U0002ffff")  # 2.17μs -> 1.88μs (15.4% faster)


def test_invalid_input_type():
    # Should raise TypeError if input is not a single character string
    with pytest.raises(TypeError):
        wcwidth(123)  # 1.39μs -> 1.43μs (2.66% slower)
    with pytest.raises(TypeError):
        wcwidth(None)  # 855ns -> 872ns (1.95% slower)
    with pytest.raises(TypeError):
        wcwidth("ab")  # 1.43μs -> 1.33μs (8.07% faster)


def test_non_bmp_combining_character():
    # U+1D165 (MUSICAL SYMBOL COMBINING STEM) is category 'Mc', not 'Mn' or 'Me', so it falls through to width 1
    codeflash_output = wcwidth("\U0001d165")  # 2.22μs -> 1.93μs (14.8% faster)


def test_zero_width_non_joiner_and_joiner():
    # U+200C (ZWNJ) and U+200D (ZWJ) should have width 0
    codeflash_output = wcwidth("\u200c")  # 994ns -> 1.06μs (6.14% slower)
    codeflash_output = wcwidth("\u200d")  # 467ns -> 429ns (8.86% faster)


def test_lone_high_surrogate():
    # Lone high surrogate (U+D800) should return 1
    codeflash_output = wcwidth("\ud800")  # 2.14μs -> 1.86μs (14.9% faster)


def test_lone_low_surrogate():
    # Lone low surrogate (U+DC00) should return 1
    codeflash_output = wcwidth("\udc00")  # 1.92μs -> 1.67μs (14.8% faster)


def test_braille_pattern():
    # Braille pattern dots-1 (U+2801) is not wide, should have width 1
    codeflash_output = wcwidth("\u2801")  # 1.96μs -> 1.82μs (7.79% faster)


def test_noncharacter_fffe_ffff():
    # U+FFFE and U+FFFF are noncharacters, but should fallback to width 1
    codeflash_output = wcwidth("\ufffe")  # 1.97μs -> 1.76μs (12.2% faster)
    codeflash_output = wcwidth("\uffff")  # 928ns -> 766ns (21.1% faster)


# --------------------------
# LARGE SCALE TEST CASES
# --------------------------


def test_large_ascii_batch():
    # Test all ASCII characters in one go (0x00-0x7F)
    for code in range(0x80):
        ch = chr(code)
        if 0x20 <= code < 0x7F:
            expected = 1
        elif code == 0x00:
            expected = 0
        else:
            expected = -1
        codeflash_output = wcwidth(ch)  # 40.0μs -> 37.3μs (7.02% faster)


def test_large_combining_batch():
    # Test the combining marks in U+0300..U+036F (112 code points, filtered to category Mn/Me)
    count = 0
    for code in range(0x0300, 0x036F + 1):
        ch = chr(code)
        if unicodedata.category(ch) in ("Mn", "Me"):
            codeflash_output = wcwidth(ch)
            count += 1


def test_large_east_asian_wide_batch():
    # Test 100 wide CJK characters (U+4E00 to U+4E63)
    for code in range(0x4E00, 0x4E64):
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 55.7μs -> 47.2μs (17.9% faster)


def test_large_emoji_batch():
    # Test 100 emoji codepoints (U+1F600 to U+1F663)
    for code in range(0x1F600, 0x1F600 + 100):
        ch = chr(code)
        codeflash_output = wcwidth(ch)
        width = codeflash_output  # 55.4μs -> 48.0μs (15.4% faster)
        # Most emoji are wide, but not all; check against east_asian_width
        expected = 2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1


def test_large_private_use_batch():
    # Test 100 private use characters (U+E000 to U+E063)
    for code in range(0xE000, 0xE000 + 100):
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 56.2μs -> 47.1μs (19.2% faster)


def test_large_random_sample():
    # Test a random sample of 500 codepoints across the BMP
    import random

    random.seed(42)
    codes = random.sample(range(0xFFFF), 500)
    for code in codes:
        ch = chr(code)
        try:
            codeflash_output = wcwidth(ch)
            width = codeflash_output
        except Exception as e:
            # No exception is expected for a valid single code point
            raise AssertionError(f"U+{code:04X} raised {type(e)}: {e}")

from _pytest._io.wcwidth import wcwidth


def test_wcwidth():
    wcwidth("⁰")


def test_wcwidth_2():
    wcwidth("\t")


def test_wcwidth_3():
    wcwidth(" ")


def test_wcwidth_4():
    wcwidth("\x00")


def test_wcwidth_5():
    wcwidth("\u2065")


def test_wcwidth_6():
    wcwidth("\u2060")

🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth` | 2.10μs | 1.80μs | 16.3% ✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_2` | 1.42μs | 1.31μs | 7.84% ✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_3` | 751ns | 690ns | 8.84% ✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_4` | 780ns | 909ns | -14.2% ⚠️ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_5` | 2.39μs | 2.02μs | 18.0% ✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_6` | 1.07μs | 1.10μs | -2.28% ⚠️ |

To edit these changes, run `git checkout codeflash/optimize-wcwidth-mi9iu0mg` and push.

codeflash-ai bot requested a review from mashraf-222 November 21, 2025 23:58

codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Nov 21, 2025
