@codeflash-ai codeflash-ai bot commented Nov 21, 2025

📄 13% (0.13x) speedup for wcwidth in src/_pytest/_io/wcwidth.py

⏱️ Runtime: 1.34 milliseconds → 1.19 milliseconds (best of 114 runs)
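The headline percentage can be reproduced from the two runtimes. The sketch below assumes the speedup is defined as time saved divided by the optimized runtime. That formula is not stated in this PR, but it matches the per-test figures reported with the generated tests (for example, 791ns -> 891ns is reported as 11.2% slower). The function name `speedup` is illustrative only.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup: time saved divided by the optimized runtime (assumed formula)."""
    return (original - optimized) / optimized


overall = speedup(1.34, 1.19)  # headline runtimes, in milliseconds
nul_case = speedup(791, 891)   # NUL character test, in nanoseconds (a regression)

print(f"overall:  {overall:.1%}")   # 12.6%, which rounds to the 13% in the title
print(f"NUL case: {nul_case:.1%}")  # -11.2%, i.e. 11.2% slower
```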

📝 Explanation and details

The optimization achieves a roughly 13% speedup (1.34 ms to 1.19 ms) by replacing chained range comparisons and tuple membership tests with precomputed module-level sets.

Key Optimizations:

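  1. Set-based lookups: The chained range comparisons (o == 0x0000 or 0x200B <= o <= 0x200F or ...) are replaced with a single set lookup o in _Cf_Zp_Zl_SET. Both forms run in constant time because the number of ranges is fixed; the set lookup replaces several comparisons with one hash probe. This mainly helps code points that would otherwise fail every range check, while code points matched by the first comparisons gain little.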

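  2. Precomputed category sets: Tuple membership tests like category in ("Me", "Mn") and unicodedata.east_asian_width(c) in ("F", "W") are replaced with the precomputed sets _COMBINING_CATEGORIES and _EAWIDE, so each check is a hash lookup instead of a sequential tuple scan. (CPython already stores constant tuple literals with the compiled code, so neither version allocates a tuple per call.)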
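The two changes above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the set names come from the description, but the description elides part of the range expression, so the second and third ranges below are assumptions chosen to agree with the test cases in this PR (U+2028, U+2029, U+2060 and U+2063 are expected to be zero-width). The function name `wcwidth_sketch` is hypothetical, and any caching decorator is omitted.

```python
import unicodedata

# Precomputed once at import time instead of being re-evaluated as range checks.
_Cf_Zp_Zl_SET = frozenset(
    [0x0000]
    + list(range(0x200B, 0x200F + 1))  # range quoted in the PR description
    + list(range(0x2028, 0x202E + 1))  # assumed range, not shown in the PR
    + list(range(0x2060, 0x2063 + 1))  # assumed range, not shown in the PR
)
_COMBINING_CATEGORIES = frozenset({"Me", "Mn"})
_EAWIDE = frozenset({"F", "W"})


def wcwidth_sketch(c: str) -> int:
    """Return the terminal column width of one character (-1 if not printable)."""
    o = ord(c)
    # Printable ASCII fast path.
    if 0x20 <= o < 0x7F:
        return 1
    # Selected zero-width format and separator characters: one hash lookup.
    if o in _Cf_Zp_Zl_SET:
        return 0
    category = unicodedata.category(c)
    # Control characters are not printable.
    if category == "Cc":
        return -1
    # Enclosing and nonspacing combining marks occupy no column.
    if category in _COMBINING_CATEGORIES:
        return 0
    # Fullwidth and wide East Asian characters occupy two columns.
    if unicodedata.east_asian_width(c) in _EAWIDE:
        return 2
    return 1
```

Note that NUL (U+0000) is in the zero-width set and is therefore checked before the control-character branch, which is why it yields 0 rather than -1.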

Performance Impact:
The function is called on a hot path by wcswidth(), which iterates over every character of a string to compute its terminal width. Most test cases improve, although a few zero-width code points (for example U+0000 and U+200B) run roughly 5-14% slower. The main gains:

  • ASCII control characters: 10-12% faster (frequently tested cases)
  • East Asian wide characters: 14-16% faster (CJK text processing)
  • Large batches: 15-19% faster when processing multiple characters
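To show why per-character cost dominates, here is a rough sketch of a string-width loop that calls a per-character width function once per character. The PR does not show the body of `wcswidth()`, so the names `char_width` and `string_width`, the simplified classification, and the "return -1 on any non-printable character" convention are all assumptions for illustration; the real function may differ, for instance by normalizing the string first.

```python
import unicodedata


def char_width(c: str) -> int:
    """Simplified, hypothetical stand-in for wcwidth()."""
    category = unicodedata.category(c)
    if category == "Cc":
        return -1  # control characters are treated as non-printable
    if category in ("Me", "Mn"):
        return 0   # combining marks occupy no column
    if unicodedata.east_asian_width(c) in ("F", "W"):
        return 2   # fullwidth and wide characters occupy two columns
    return 1


def string_width(s: str) -> int:
    """Sum per-character widths; one char_width() call per character."""
    total = 0
    for c in s:
        w = char_width(c)
        if w < 0:
            return -1  # assumed convention: a non-printable character poisons the result
        total += w
    return total


print(string_width("abc"))           # 3
print(string_width("\u4e2d\u6587"))  # 4: two wide CJK characters
print(string_width("e\u0301"))       # 1: base letter plus combining accent
```

Because the loop body runs once per character, any saving inside the per-character function is multiplied by the length of the text being measured.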

Best Performance Cases:
The optimization particularly benefits workloads with:

  • Mixed character sets requiring multiple category checks
  • High-frequency calls to wcswidth() on text with non-ASCII characters
  • Batch processing of Unicode text where the LRU cache hit rate is low
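The point about cache hit rate can be illustrated with `functools.lru_cache` and its `cache_info()` counters. This is a sketch under assumptions: the cache size of 100 and the helper names `cached_category` and `hit_rate` are not taken from the PR. With a low hit rate the function body runs on almost every call, so cheaper membership checks matter more.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=100)  # cache size is an assumption for this sketch
def cached_category(c: str) -> str:
    """Cached per-character lookup standing in for a cached width function."""
    return unicodedata.category(c)


def hit_rate(text: str) -> float:
    """Fraction of calls served from the cache while scanning text."""
    cached_category.cache_clear()
    for c in text:
        cached_category(c)
    info = cached_category.cache_info()
    return info.hits / (info.hits + info.misses)


repetitive = "hello world " * 50                        # 600 characters, 8 distinct
diverse = "".join(chr(0x4E00 + i) for i in range(600))  # 600 distinct CJK characters

print(f"repetitive text: {hit_rate(repetitive):.1%}")  # 98.7%
print(f"diverse text:    {hit_rate(diverse):.1%}")     # 0.0%
```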

The changes maintain identical behavior while leveraging Python's optimized set operations for faster character classification.
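One way to check the "identical behavior" claim for the zero-width test is to compare the range form against an independently built set over every code point. As before, only the first two terms of the range expression appear in this PR; the last two ranges are assumptions, and the names `zero_width_by_ranges` and `ZERO_WIDTH_SET` are illustrative.

```python
import sys


def zero_width_by_ranges(o: int) -> bool:
    """Range-comparison form; the last two ranges are assumed, not quoted from the PR."""
    return (
        o == 0x0000
        or 0x200B <= o <= 0x200F
        or 0x2028 <= o <= 0x202E
        or 0x2060 <= o <= 0x2063
    )


# Built independently of the function above, so an off-by-one in a range() end
# (inclusive versus exclusive bound) would show up as a mismatch.
ZERO_WIDTH_SET = frozenset(
    [0x0000]
    + list(range(0x200B, 0x2010))
    + list(range(0x2028, 0x202F))
    + list(range(0x2060, 0x2064))
)

mismatches = [
    o
    for o in range(sys.maxunicode + 1)
    if zero_width_by_ranges(o) != (o in ZERO_WIDTH_SET)
]

print(len(ZERO_WIDTH_SET))  # 17
print(mismatches)           # []
```

An exhaustive comparison like this is cheap here because the input domain is only about 1.1 million code points.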

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 2547 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 6 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
# imports
from _pytest._io.wcwidth import wcwidth
import pytest  # used for our unit tests


# unit tests

# ------------------- BASIC TEST CASES -------------------


def test_ascii_printable_characters():
    # All printable ASCII characters (0x20 to 0x7E) should return 1
    for code in range(0x20, 0x7F):
        c = chr(code)
        codeflash_output = wcwidth(c)  # 25.6μs -> 25.4μs (0.783% faster)


def test_ascii_control_characters():
    # ASCII control characters (0x01 to 0x1F and 0x7F) should return -1; NUL (0x00) is special-cased to 0
    for code in list(range(0x20)) + [0x7F]:
        c = chr(code)
        codeflash_output = wcwidth(c)  # 14.6μs -> 13.3μs (10.0% faster)


def test_basic_combining_character():
    # Combining acute accent (U+0301) should be zero width
    codeflash_output = wcwidth("\u0301")  # 1.90μs -> 1.71μs (11.1% faster)


def test_basic_non_combining_non_ascii():
    # Latin-1 Supplement letter 'é' (U+00E9) is printable, width 1
    codeflash_output = wcwidth("é")  # 2.28μs -> 2.00μs (14.4% faster)


def test_basic_fullwidth_east_asian():
    # Fullwidth 'A' (U+FF21) should be width 2
    codeflash_output = wcwidth("\uff21")  # 2.28μs -> 1.96μs (15.9% faster)


def test_basic_wide_east_asian():
    # CJK ideograph '中' (U+4E2D) should be width 2
    codeflash_output = wcwidth("中")  # 2.15μs -> 1.87μs (15.3% faster)


def test_basic_zero_width_space():
    # Zero width space (U+200B) should be width 0
    codeflash_output = wcwidth("\u200b")  # 1.02μs -> 1.08μs (5.48% slower)


def test_basic_non_printable_format_char():
    # Soft hyphen (U+00AD) is Cf but generally width 1 in terminals
    codeflash_output = wcwidth("\u00ad")  # 2.15μs -> 1.88μs (14.5% faster)


def test_basic_surrogate_pair_handling():
    # Emoji '😀' (U+1F600) is wide in some terminals, but Unicode east_asian_width='W'
    codeflash_output = wcwidth("😀")  # 2.23μs -> 1.98μs (13.1% faster)


# ------------------- EDGE TEST CASES -------------------


def test_null_character():
    # Null character (U+0000) should be width 0
    codeflash_output = wcwidth("\u0000")  # 791ns -> 891ns (11.2% slower)


def test_noncharacter_code_point():
    # U+FDD0 is a noncharacter, category 'Cn', should default to width 1
    codeflash_output = wcwidth("\ufdd0")  # 2.19μs -> 1.93μs (13.5% faster)


def test_private_use_area():
    # U+E000 is private use, category 'Co', should default to width 1
    codeflash_output = wcwidth("\ue000")  # 2.11μs -> 1.77μs (19.2% faster)


def test_unassigned_code_point():
    # U+0378 is unassigned, category 'Cn', should default to width 1
    codeflash_output = wcwidth("\u0378")  # 1.99μs -> 1.76μs (12.9% faster)


def test_zero_width_non_joiner():
    # U+200C is zero width non-joiner, should be width 0
    codeflash_output = wcwidth("\u200c")  # 928ns -> 1.05μs (11.9% slower)


def test_zero_width_joiner():
    # U+200D is zero width joiner, should be width 0
    codeflash_output = wcwidth("\u200d")  # 929ns -> 992ns (6.35% slower)


def test_line_separator():
    # U+2028 is a line separator, should be width 0
    codeflash_output = wcwidth("\u2028")  # 1.07μs -> 1.02μs (4.31% faster)


def test_paragraph_separator():
    # U+2029 is a paragraph separator, should be width 0
    codeflash_output = wcwidth("\u2029")  # 1.00μs -> 965ns (3.83% faster)


def test_format_control_characters():
    # U+2060 (WORD JOINER) and U+2063 (INVISIBLE SEPARATOR) should be width 0
    codeflash_output = wcwidth("\u2060")  # 1.05μs -> 1.04μs (0.383% faster)
    codeflash_output = wcwidth("\u2063")  # 635ns -> 497ns (27.8% faster)


def test_nonspacing_mark():
    # U+034F (COMBINING GRAPHEME JOINER) is Mn, should be width 0
    codeflash_output = wcwidth("\u034f")  # 1.80μs -> 1.79μs (0.951% faster)


def test_enclosing_mark():
    # U+20DD (COMBINING ENCLOSING CIRCLE) is Me, should be width 0
    codeflash_output = wcwidth("\u20dd")  # 1.70μs -> 1.65μs (3.52% faster)


def test_invalid_input_type():
    # Should raise TypeError if input is not a single character string
    with pytest.raises(TypeError):
        wcwidth(123)  # 1.39μs -> 1.35μs (3.03% faster)
    with pytest.raises(TypeError):
        wcwidth(["a"])  # 847ns -> 802ns (5.61% faster)
    with pytest.raises(TypeError):
        wcwidth("ab")  # 1.45μs -> 1.47μs (1.23% slower)


def test_high_plane_character():
    # U+1F4A9 (PILE OF POO) emoji, east_asian_width='W', should be width 2
    codeflash_output = wcwidth("\U0001f4a9")  # 2.42μs -> 2.30μs (5.26% faster)


def test_braille_pattern_blank():
    # U+2800 (BRAILLE PATTERN BLANK), category 'So', east_asian_width='N', should be width 1
    codeflash_output = wcwidth("\u2800")  # 2.19μs -> 1.93μs (13.1% faster)


def test_combining_double_breve_below():
    # U+035D (COMBINING DOUBLE BREVE BELOW), category 'Mn', should be width 0
    codeflash_output = wcwidth("\u035d")  # 1.72μs -> 1.56μs (10.3% faster)


def test_combining_enclosing_square():
    # U+20DE (COMBINING ENCLOSING SQUARE), category 'Me', should be width 0
    codeflash_output = wcwidth("\u20de")  # 1.77μs -> 1.64μs (7.88% faster)


def test_non_bmp_non_combining():
    # U+10400 (DESERET CAPITAL LETTER LONG I), category 'Lu', east_asian_width='N', should be width 1
    codeflash_output = wcwidth("\U00010400")  # 2.35μs -> 1.97μs (19.3% faster)


# ------------------- LARGE SCALE TEST CASES -------------------


def test_large_scale_random_unicode():
    # Test a mix of up to 1000 random code points from the BMP and the supplementary planes
    import random

    chars = []
    for _ in range(1000):
        # Randomly pick a code point from the BMP or a supplementary plane
        plane = random.choice([0, 1])
        if plane == 0:
            code = random.randint(0x0000, 0xFFFF)
        else:
            code = random.randint(0x10000, 0x10FFFF)
        try:
            c = chr(code)
            # Only test single code units (skip surrogates)
            if 0xD800 <= code <= 0xDFFF:
                continue
            chars.append(c)
        except ValueError:
            continue
    # All should return an integer in {-1, 0, 1, 2}
    for c in chars:
        codeflash_output = wcwidth(c)
        w = codeflash_output  # 541μs -> 477μs (13.4% faster)


# imports
import string
import unicodedata

import pytest  # used for our unit tests

# function to test
from _pytest._io.wcwidth import wcwidth


# unit tests

# --------------------------
# BASIC TEST CASES
# --------------------------


def test_ascii_printable_letters():
    # All printable ASCII letters and digits should have width 1
    for ch in string.ascii_letters + string.digits:
        codeflash_output = wcwidth(ch)  # 16.8μs -> 16.8μs (0.191% faster)


def test_ascii_printable_symbols():
    # All printable ASCII punctuation should have width 1
    for ch in string.punctuation:
        codeflash_output = wcwidth(ch)  # 8.74μs -> 8.70μs (0.402% faster)


def test_ascii_space():
    # Space character should have width 1
    codeflash_output = wcwidth(" ")  # 713ns -> 651ns (9.52% faster)


def test_ascii_control_characters():
    # Control characters (0x01-0x1F, 0x7F) should have width -1; NUL (0x00) is special-cased to 0
    for code in list(range(0x20)) + [0x7F]:
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 14.7μs -> 13.2μs (11.8% faster)


def test_combining_acute_accent():
    # Combining acute accent U+0301 should have width 0
    codeflash_output = wcwidth("\u0301")  # 1.67μs -> 1.65μs (1.27% faster)


def test_combining_enclosing_circle():
    # Combining enclosing circle U+20DD should have width 0
    codeflash_output = wcwidth("\u20dd")  # 1.76μs -> 1.65μs (6.59% faster)


def test_fullwidth_east_asian():
    # Fullwidth A (U+FF21) should have width 2
    codeflash_output = wcwidth("\uff21")  # 2.18μs -> 2.03μs (7.53% faster)


def test_wide_east_asian():
    # CJK Ideograph (U+4E00) should have width 2
    codeflash_output = wcwidth("\u4e00")  # 2.12μs -> 1.85μs (14.7% faster)


def test_wide_katakana_middle_dot():
    # Katakana middle dot (U+30FB) is "W" (wide) and should have width 2
    codeflash_output = wcwidth("\u30fb")  # 1.94μs -> 1.70μs (14.6% faster)


def test_non_printable_format_characters():
    # Zero-width space (U+200B) should have width 0
    codeflash_output = wcwidth("\u200b")  # 1.02μs -> 1.11μs (8.13% slower)
    # Word joiner (U+2060) should have width 0
    codeflash_output = wcwidth("\u2060")  # 670ns -> 465ns (44.1% faster)


def test_printable_emoji():
    # Smiling face emoji (U+1F600) is "W" (wide) and should have width 2
    codeflash_output = wcwidth("\U0001f600")  # 2.29μs -> 2.01μs (13.9% faster)


def test_printable_non_ascii_narrow():
    # Latin-1 Supplement: 'é' (U+00E9) should have width 1
    codeflash_output = wcwidth("é")  # 2.09μs -> 2.00μs (4.44% faster)


# --------------------------
# EDGE TEST CASES
# --------------------------


def test_null_character():
    # NULL (U+0000) should have width 0
    codeflash_output = wcwidth("\x00")  # 783ns -> 820ns (4.51% slower)


def test_bidi_control_characters():
    # Left-to-right mark (U+200E) should have width 0
    codeflash_output = wcwidth("\u200e")  # 975ns -> 1.10μs (11.4% slower)
    # Right-to-left mark (U+200F) should have width 0
    codeflash_output = wcwidth("\u200f")  # 576ns -> 491ns (17.3% faster)


def test_line_separator():
    # Line separator (U+2028) should have width 0
    codeflash_output = wcwidth("\u2028")  # 1.02μs -> 956ns (7.01% faster)


def test_paragraph_separator():
    # Paragraph separator (U+2029) should have width 0
    codeflash_output = wcwidth("\u2029")  # 984ns -> 985ns (0.102% slower)


def test_surrogate_code_points():
    # Surrogates (U+D800 to U+DFFF) are not valid Unicode scalar values, but Python allows them in chr
    # They are category 'Cs' (surrogate), not 'Cc', 'Me', or 'Mn', and not in any zero-width block
    # Should return 1 as per the fallback
    for code in range(0xD800, 0xDFFF + 1, 256):  # Sample a few surrogates
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 6.93μs -> 5.67μs (22.2% faster)


def test_private_use_area():
    # Private Use Area (U+E000) should have width 1
    codeflash_output = wcwidth("\ue000")  # 1.98μs -> 1.68μs (17.8% faster)


def test_noncharacter_code_point():
    # U+FDD0 is a noncharacter, but not a control, so should fallback to width 1
    codeflash_output = wcwidth("\ufdd0")  # 2.08μs -> 1.83μs (13.4% faster)


def test_unassigned_code_point():
    # U+2FFFF is unassigned, but should fallback to width 1
    codeflash_output = wcwidth("\U0002ffff")  # 2.17μs -> 1.88μs (15.4% faster)


def test_invalid_input_type():
    # Should raise TypeError if input is not a single character string
    with pytest.raises(TypeError):
        wcwidth(123)  # 1.39μs -> 1.43μs (2.66% slower)
    with pytest.raises(TypeError):
        wcwidth(None)  # 855ns -> 872ns (1.95% slower)
    with pytest.raises(TypeError):
        wcwidth("ab")  # 1.43μs -> 1.33μs (8.07% faster)


def test_non_bmp_combining_character():
    # U+1D165 (MUSICAL SYMBOL COMBINING STEM) is a spacing combining mark (category 'Mc'), not 'Mn' or 'Me', so it should fall through to width 1
    codeflash_output = wcwidth("\U0001d165")  # 2.22μs -> 1.93μs (14.8% faster)


def test_zero_width_non_joiner_and_joiner():
    # U+200C (ZWNJ) and U+200D (ZWJ) should have width 0
    codeflash_output = wcwidth("\u200c")  # 994ns -> 1.06μs (6.14% slower)
    codeflash_output = wcwidth("\u200d")  # 467ns -> 429ns (8.86% faster)


def test_lone_high_surrogate():
    # Lone high surrogate (U+D800) should return 1
    codeflash_output = wcwidth("\ud800")  # 2.14μs -> 1.86μs (14.9% faster)


def test_lone_low_surrogate():
    # Lone low surrogate (U+DC00) should return 1
    codeflash_output = wcwidth("\udc00")  # 1.92μs -> 1.67μs (14.8% faster)


def test_braille_pattern():
    # Braille pattern dots-1 (U+2801) is not wide, should have width 1
    codeflash_output = wcwidth("\u2801")  # 1.96μs -> 1.82μs (7.79% faster)


def test_noncharacter_fffe_ffff():
    # U+FFFE and U+FFFF are noncharacters, but should fallback to width 1
    codeflash_output = wcwidth("\ufffe")  # 1.97μs -> 1.76μs (12.2% faster)
    codeflash_output = wcwidth("\uffff")  # 928ns -> 766ns (21.1% faster)


# --------------------------
# LARGE SCALE TEST CASES
# --------------------------


def test_large_ascii_batch():
    # Test all ASCII characters in one go (0x00-0x7F)
    for code in range(0x80):
        ch = chr(code)
        if 0x20 <= code < 0x7F:
            expected = 1
        elif code == 0x00:
            expected = 0
        else:
            expected = -1
        codeflash_output = wcwidth(ch)  # 40.0μs -> 37.3μs (7.02% faster)


def test_large_combining_batch():
    # Test the combining marks in U+0300 to U+036F (112 code points, category Mn/Me)
    count = 0
    for code in range(0x0300, 0x036F + 1):
        ch = chr(code)
        if unicodedata.category(ch) in ("Mn", "Me"):
            codeflash_output = wcwidth(ch)
            count += 1


def test_large_east_asian_wide_batch():
    # Test 100 wide CJK characters (U+4E00 to U+4E63)
    for code in range(0x4E00, 0x4E64):
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 55.7μs -> 47.2μs (17.9% faster)


def test_large_emoji_batch():
    # Test 100 emoji codepoints (U+1F600 to U+1F663)
    for code in range(0x1F600, 0x1F600 + 100):
        ch = chr(code)
        codeflash_output = wcwidth(ch)
        width = codeflash_output  # 55.4μs -> 48.0μs (15.4% faster)
        # Most emoji are wide, but not all; check against east_asian_width
        expected = 2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1


def test_large_private_use_batch():
    # Test 100 private use characters (U+E000 to U+E063)
    for code in range(0xE000, 0xE000 + 100):
        ch = chr(code)
        codeflash_output = wcwidth(ch)  # 56.2μs -> 47.1μs (19.2% faster)


def test_large_random_sample():
    # Test a random sample of 500 codepoints across the BMP
    import random

    random.seed(42)
    codes = random.sample(range(0xFFFF), 500)
    for code in codes:
        ch = chr(code)
        try:
            codeflash_output = wcwidth(ch)
            width = codeflash_output
        except Exception as e:
            # No exception is expected for a valid single code point; report any as a failure
            raise AssertionError(f"U+{code:04X} raised {type(e)}: {e}")
from _pytest._io.wcwidth import wcwidth


def test_wcwidth():
    wcwidth("⁰")


def test_wcwidth_2():
    wcwidth("\t")


def test_wcwidth_3():
    wcwidth(" ")


def test_wcwidth_4():
    wcwidth("\x00")


def test_wcwidth_5():
    wcwidth("\u2065")


def test_wcwidth_6():
    wcwidth("\u2060")
🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth` | 2.10μs | 1.80μs | 16.3%✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_2` | 1.42μs | 1.31μs | 7.84%✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_3` | 751ns | 690ns | 8.84%✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_4` | 780ns | 909ns | -14.2%⚠️ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_5` | 2.39μs | 2.02μs | 18.0%✅ |
| `codeflash_concolic__lsdxkww/tmpa3lxvdfa/test_concolic_coverage.py::test_wcwidth_6` | 1.07μs | 1.10μs | -2.28%⚠️ |

To edit these changes, `git checkout codeflash/optimize-wcwidth-mi9iu0mg` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 21, 2025 23:58
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 21, 2025