
codeflash-ai bot commented Nov 22, 2025

📄 17% (1.17x) speedup for _NodeReporter._prepare_content in src/_pytest/junitxml.py

⏱️ Runtime: 50.6 microseconds → 43.3 microseconds (best of 98 runs)

📝 Explanation and details

The optimization replaces a "\n".join() call with direct string concatenation using the + operator.

Key Change:

  • Original: return "\n".join([header.center(80, "-"), content, ""])
  • Optimized: return header.center(80, "-") + "\n" + content + "\n"
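A minimal standalone sketch of the two variants, written as free functions so they can be compared outside pytest (the names `prepare_content_join` and `prepare_content_concat` are hypothetical; in pytest the logic lives in the `_NodeReporter._prepare_content` method). Joining three items with "\n" puts a newline between each pair, and the trailing empty string is what produces the final newline, so both forms return the same string:

```python
def prepare_content_join(content: str, header: str) -> str:
    # Original form: build a 3-element list, then join it with newlines.
    # The trailing "" makes join() emit a final "\n".
    return "\n".join([header.center(80, "-"), content, ""])


def prepare_content_concat(content: str, header: str) -> str:
    # Optimized form: concatenate the same pieces directly, no temporary list.
    return header.center(80, "-") + "\n" + content + "\n"


# Both variants agree on ordinary inputs...
for content, header in [("", ""), ("body", "HEADER"), ("a\nb\n", "H" * 100)]:
    assert prepare_content_join(content, header) == prepare_content_concat(content, header)

# ...and both reject non-str content with TypeError (only the message differs).
for impl in (prepare_content_join, prepare_content_concat):
    try:
        impl(None, "Header")  # type: ignore[arg-type]
    except TypeError as exc:
        print(f"{impl.__name__} raised TypeError: {exc}")
    else:
        raise AssertionError(f"{impl.__name__} accepted None")
```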

Why This is Faster:
The original code creates a temporary list [header.center(80, "-"), content, ""] and then calls str.join() on it. This involves:

  1. List allocation and population (3 elements)
  2. Method call overhead for join()
  3. Internal iteration through the list elements

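The optimized version skips the intermediate list and the join() call and concatenates the pieces directly. For a small, fixed number of short strings, that bookkeeping costs more than the extra work chained + does: each + generally produces a new intermediate string, so the header and content can be copied more than once. That trade-off shifts as the strings grow, which is consistent with the slight slowdowns on the largest inputs noted under Test Case Performance.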
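The claim can be checked locally with a small `timeit` micro-benchmark. This is a sketch, not the harness Codeflash used: the `HEADER` and `CONTENT` values are arbitrary illustrative inputs, and absolute numbers will vary with the CPython version and machine.

```python
import timeit

# Hypothetical inputs chosen for illustration; not the data Codeflash used.
HEADER = " Captured Out "
CONTENT = "some captured output\n" * 3


def with_join() -> str:
    # Original approach: temporary 3-element list, then str.join.
    return "\n".join([HEADER.center(80, "-"), CONTENT, ""])


def with_concat() -> str:
    # Optimized approach: chained + with no intermediate list.
    return HEADER.center(80, "-") + "\n" + CONTENT + "\n"


def best_ns_per_call(func, number: int = 50_000, repeat: int = 5) -> float:
    # timeit.repeat returns total seconds for `number` calls, once per repeat;
    # the minimum is the least noisy estimate. Convert to nanoseconds per call.
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e9


assert with_join() == with_concat()  # identical output, so the timing comparison is fair

join_ns = best_ns_per_call(with_join)
concat_ns = best_ns_per_call(with_concat)
print(f"join:    {join_ns:.1f} ns/call")
print(f"concat:  {concat_ns:.1f} ns/call")
print(f"speedup: {join_ns / concat_ns:.2f}x")
```

Taking the minimum over several repeats, rather than the mean, follows the usual `timeit` practice of discarding runs disturbed by other activity on the machine.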

Performance Impact:
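Overall runtime fell from 50.6 μs to 43.3 μs, a 17% speedup (about 14% less time). The gains are not uniform: the largest relative improvements (35-54%) come from the error-path tests that pass non-string content, and cases with empty, whitespace-only, or newline-only strings improve by roughly 10-28%. The line profiler shows the per-hit time falling from 1033.7 ns to 916.2 ns, about 11% less time per call.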
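The percentages in this report follow two different conventions. The headline 17% and the per-test "X% faster" comments in the generated tests match (old - new) / new, the relative speedup, while the line profiler's 11% matches (old - new) / old, the share of time removed. A quick check with the reported figures (the helper names are illustrative):

```python
def speedup_pct(old: float, new: float) -> float:
    # Relative speedup, (old - new) / new: the convention behind "X% faster".
    return (old - new) / new * 100


def time_saved_pct(old: float, new: float) -> float:
    # Fraction of the original time that was removed, (old - new) / old.
    return (old - new) / old * 100


# Overall runtime in microseconds, from the PR header.
print(f"overall: {speedup_pct(50.6, 43.3):.1f}% faster, "
      f"{time_saved_pct(50.6, 43.3):.1f}% less time")
# Line-profiler per-hit time in nanoseconds.
print(f"per hit: {speedup_pct(1033.7, 916.2):.1f}% faster, "
      f"{time_saved_pct(1033.7, 916.2):.1f}% less time")
# One generated test (852 ns -> 814 ns), to confirm the per-test convention.
print(f"test_basic_content_and_header: {speedup_pct(852, 814):.2f}% faster")
```

Under these definitions the overall change is a 16.9% speedup or 14.4% less time, and the per-hit change is a 12.8% speedup or 11.4% less time.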

Test Case Performance:

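  • Largest improvements on the error path: non-string content (None, int, bytes, list) raises TypeError 35-54% faster
  • Empty, whitespace-only, or newline-only content/headers: roughly 10-28% faster
  • Typical short content and headers: roughly 0-17% faster
  • Large inputs (about 500-1,000 characters or lines): mixed, from about 8% faster to about 10% slower, plausibly because chained + copies the large operands into intermediate strings while join() sizes the result once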
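One plausible reading of the large-input slowdowns, offered as a simplified model rather than a measurement: assume every `+` allocates a fresh string and copies both operands (CPython can sometimes extend a temporary in place, and the details vary by version), while `str.join` measures its pieces first and copies each character once. Counting copied characters under that assumption shows the chained form moving the header three times and the content twice. The functions below are hypothetical helpers for this model only.

```python
def copies_concat(header_len: int, content_len: int) -> int:
    # Assumed model: each + allocates a new string and copies both operands.
    step1 = header_len + 1           # header + "\n"
    step2 = step1 + content_len      # (...) + content
    step3 = step2 + 1                # (...) + "\n"
    return step1 + step2 + step3


def copies_join(header_len: int, content_len: int) -> int:
    # str.join sizes the result once and copies every piece and separator once.
    return header_len + 1 + content_len + 1


# header_len is the length after center(), so it is at least 80.
for header_len, content_len in [(80, 15), (80, 1000), (999, 999)]:
    print(header_len, content_len,
          copies_concat(header_len, content_len),
          copies_join(header_len, content_len))
```

For short strings the extra copying is a few hundred characters, which is cheaper than building the list; for the 999-character header and content it is roughly 2.5 times the characters join copies, which fits the direction of the measured slowdowns.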

The gain is largest for the common case this _prepare_content method appears to serve in pytest's JUnit XML reporting: prefixing a moderate-sized block of captured output with a header centered in an 80-column line of dashes.
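To make the output format concrete, here is the shape of the returned string, using the optimized expression as a standalone function (`prepare_content` is a hypothetical name, and " Captured Out " is an illustrative header, not necessarily one pytest passes):

```python
def prepare_content(content: str, header: str) -> str:
    # Standalone copy of the optimized expression, for illustration only.
    return header.center(80, "-") + "\n" + content + "\n"


section = prepare_content("hello\nworld", " Captured Out ")
lines = section.split("\n")

print(lines[0])                # the 80-column dashed header line
assert len(lines[0]) == 80     # short headers are padded out to 80 columns
assert lines[1:3] == ["hello", "world"]
assert section.endswith("\n")  # consecutive sections can be concatenated directly

# Headers of 80+ characters are returned unchanged by str.center (no truncation).
long_header = "H" * 100
assert prepare_content("x", long_header).split("\n")[0] == long_header
```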

Correctness verification report:

Test status:
  • ⚙️ Existing Unit Tests: 🔘 None Found
  • 🌀 Generated Regression Tests: 88 Passed
  • ⏪ Replay Tests: 🔘 None Found
  • 🔎 Concolic Coverage Tests: 🔘 None Found
  • 📊 Tests Coverage: 100.0%
🌀 Generated Regression Tests and Runtime
```python
import os
from typing import Dict
from typing import List
from typing import Optional
from typing import Tuple
from typing import Union

import pytest

from _pytest.junitxml import _NodeReporter


# Simplified copy of pytest's LogXML, used only to construct a _NodeReporter.
class LogXML:
    def __init__(
        self,
        logfile,
        prefix: Optional[str],
        suite_name: str = "pytest",
        logging: str = "no",
        report_duration: str = "total",
        family="xunit1",
        log_passing_tests: bool = True,
    ) -> None:
        logfile = os.path.expanduser(os.path.expandvars(logfile))
        self.logfile = os.path.normpath(os.path.abspath(logfile))
        self.prefix = prefix
        self.suite_name = suite_name
        self.logging = logging
        self.log_passing_tests = log_passing_tests
        self.report_duration = report_duration
        self.family = family
        self.stats: Dict[str, int] = dict.fromkeys(
            ["error", "passed", "failure", "skipped"], 0
        )
        self.node_reporters: Dict[Tuple[Union[str, object], object], _NodeReporter] = {}
        self.node_reporters_ordered: List[_NodeReporter] = []
        self.global_properties: List[Tuple[str, str]] = []
        self.open_reports: List[object] = []
        self.cnt_double_fail_tests = 0
        if self.family == "legacy":
            self.family = "xunit1"

    def add_stats(self, key: str) -> None:
        if key in self.stats:
            self.stats[key] += 1


# unit tests


@pytest.fixture
def node_reporter():
    # Provide a minimal LogXML for _NodeReporter
    logxml = LogXML(logfile="tmp.xml", prefix=None)
    return _NodeReporter("dummyid", logxml)


# 1. Basic Test Cases


def test_basic_content_and_header(node_reporter):
    # Basic scenario: short content and header
    content = "This is a test."
    header = "HEADER"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 852ns -> 814ns (4.67% faster)
    # The header should be centered to 80 chars with '-'
    expected_header = "HEADER".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_empty_content(node_reporter):
    # Content is empty, header is normal
    content = ""
    header = "MyHeader"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 848ns -> 673ns (26.0% faster)
    expected_header = "MyHeader".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_empty_header(node_reporter):
    # Header is empty, content is normal
    content = "Some content"
    header = ""
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 845ns -> 677ns (24.8% faster)
    expected_header = "".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_empty_header_and_content(node_reporter):
    # Both header and content are empty
    content = ""
    header = ""
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 797ns -> 632ns (26.1% faster)
    expected_header = "".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_multiline_content(node_reporter):
    # Content has multiple lines
    content = "Line1\nLine2\nLine3"
    header = "Multi"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 816ns -> 765ns (6.67% faster)
    expected_header = "Multi".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_header_exactly_80_chars(node_reporter):
    # Header is exactly 80 chars
    header = "H" * 80
    content = "abc"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 749ns -> 687ns (9.02% faster)
    # If header is 80 chars, no padding should be added
    expected_header = header
    assert result == expected_header + "\n" + content + "\n"


# 2. Edge Test Cases


def test_header_longer_than_80_chars(node_reporter):
    # Header is longer than 80 chars
    header = "A" * 100
    content = "long header"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 751ns -> 752ns (0.133% slower)
    # center() doesn't truncate, so header remains as is
    expected_header = header
    assert result == expected_header + "\n" + content + "\n"


def test_content_with_special_characters(node_reporter):
    # Content contains special characters
    content = "Line1\tLine2\nLine3\r\nUnicode: \u2603"
    header = "Special"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 1.21μs -> 1.10μs (9.74% faster)
    expected_header = "Special".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_header_with_special_characters(node_reporter):
    # Header contains special characters
    header = "Hëädér-\u2603"
    content = "abc"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 1.30μs -> 1.15μs (12.8% faster)
    expected_header = header.center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_content_is_whitespace(node_reporter):
    # Content is all whitespace
    content = "   \t  "
    header = "Whitespace"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 861ns -> 697ns (23.5% faster)
    expected_header = "Whitespace".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_header_is_whitespace(node_reporter):
    # Header is all whitespace
    header = "    "
    content = "abc"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 840ns -> 659ns (27.5% faster)
    expected_header = header.center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_content_with_newline_at_end(node_reporter):
    # Content ends with a newline
    content = "abc\n"
    header = "NL"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 845ns -> 722ns (17.0% faster)
    expected_header = "NL".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_content_with_newline_at_start(node_reporter):
    # Content starts with a newline
    content = "\nabc"
    header = "NL"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 779ns -> 700ns (11.3% faster)
    expected_header = "NL".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_content_with_only_newlines(node_reporter):
    # Content is only newlines
    content = "\n\n\n"
    header = "NL"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 842ns -> 660ns (27.6% faster)
    expected_header = "NL".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_content_is_none_raises(node_reporter):
    # Passing None as content should raise TypeError
    header = "Header"
    with pytest.raises(TypeError):
        node_reporter._prepare_content(None, header)  # 2.31μs -> 1.67μs (38.4% faster)


def test_content_is_not_str_raises(node_reporter):
    # Passing a non-string content should raise TypeError
    header = "Header"
    with pytest.raises(TypeError):
        node_reporter._prepare_content(123, header)  # 2.75μs -> 1.82μs (51.3% faster)


def test_long_content(node_reporter):
    # Content is a very long string (1000 characters)
    content = "a" * 1000
    header = "LongContent"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 1.24μs -> 1.15μs (7.83% faster)
    expected_header = "LongContent".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_long_multiline_content(node_reporter):
    # Content is 500 lines, each line is "lineX"
    lines = [f"line{i}" for i in range(500)]
    content = "\n".join(lines)
    header = "ManyLines"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 1.03μs -> 1.05μs (1.24% slower)
    expected_header = "ManyLines".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_long_header(node_reporter):
    # Header is 999 characters, content is short
    header = "H" * 999
    content = "short"
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 968ns -> 970ns (0.206% slower)
    # Header should not be truncated
    expected_header = header
    assert result == expected_header + "\n" + content + "\n"


def test_large_content_and_header(node_reporter):
    # Both header and content are large
    header = "H" * 800
    content = "C" * 800
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 1.01μs -> 1.10μs (8.12% slower)
    expected_header = header
    assert result == expected_header + "\n" + content + "\n"


def test_large_content_multiline_and_large_header(node_reporter):
    # Header is 499 chars, content is 1000 lines
    header = "HEADER" * 83 + "H"
    lines = ["line" + str(i) for i in range(1000)]
    content = "\n".join(lines)
    codeflash_output = node_reporter._prepare_content(content, header)
    result = codeflash_output  # 1.06μs -> 1.18μs (9.77% slower)
    expected_header = header
    assert result == expected_header + "\n" + content + "\n"


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
```python
import os
from typing import Dict
from typing import List
from typing import Optional
from typing import Tuple
from typing import Union

import pytest

from _pytest.junitxml import _NodeReporter


# Simplified copy of pytest's LogXML, used only to construct a _NodeReporter.
class LogXML:
    def __init__(
        self,
        logfile,
        prefix: Optional[str],
        suite_name: str = "pytest",
        logging: str = "no",
        report_duration: str = "total",
        family="xunit1",
        log_passing_tests: bool = True,
    ) -> None:
        logfile = os.path.expanduser(os.path.expandvars(logfile))
        self.logfile = os.path.normpath(os.path.abspath(logfile))
        self.prefix = prefix
        self.suite_name = suite_name
        self.logging = logging
        self.log_passing_tests = log_passing_tests
        self.report_duration = report_duration
        self.family = family
        self.stats: Dict[str, int] = dict.fromkeys(
            ["error", "passed", "failure", "skipped"], 0
        )
        self.node_reporters: Dict[Tuple[Union[str, object], object], _NodeReporter] = {}
        self.node_reporters_ordered: List[_NodeReporter] = []
        self.global_properties: List[Tuple[str, str]] = []

        # List of reports that failed on call but teardown is pending.
        self.open_reports: List[object] = []
        self.cnt_double_fail_tests = 0

        # Replaces convenience family with real family.
        if self.family == "legacy":
            self.family = "xunit1"

    def add_stats(self, key: str) -> None:
        if key in self.stats:
            self.stats[key] += 1


# unit tests


@pytest.fixture
def reporter():
    """Fixture to provide a _NodeReporter instance."""
    logxml = LogXML(logfile="/tmp/test.xml", prefix=None)
    return _NodeReporter("nodeid", logxml)


# -------------------
# Basic Test Cases
# -------------------


def test_basic_content_and_header(reporter):
    """Test normal content and header."""
    content = "This is the body."
    header = "HEADER"
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 859ns -> 814ns (5.53% faster)
    # Header should be centered in 80 chars, padded with '-'
    expected_header = "HEADER".center(80, "-")
    assert result == expected_header + "\n" + content + "\n"


def test_empty_content(reporter):
    """Test with empty content, non-empty header."""
    content = ""
    header = "TestHeader"
    expected_header = "TestHeader".center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 750ns -> 680ns (10.3% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_empty_header(reporter):
    """Test with empty header, non-empty content."""
    content = "Some content"
    header = ""
    expected_header = "".center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 742ns -> 651ns (14.0% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_empty_header_and_content(reporter):
    """Test with both header and content empty."""
    content = ""
    header = ""
    expected_header = "".center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 773ns -> 682ns (13.3% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_header_exactly_80_chars(reporter):
    """Test header that is exactly 80 characters, should not pad."""
    header = "H" * 80
    content = "body"
    # If header is 80 chars, center() returns header as is
    expected_header = header
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 767ns -> 675ns (13.6% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_header_longer_than_80_chars(reporter):
    """Test header longer than 80 characters, should not truncate or pad."""
    header = "LONGHEADER" * 10  # 100 chars
    content = "abc"
    expected_header = header  # center() returns as is if longer
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 740ns -> 674ns (9.79% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_content_with_newlines(reporter):
    """Test content containing newlines is preserved."""
    content = "line1\nline2\nline3"
    header = "MyHeader"
    expected_header = "MyHeader".center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 719ns -> 670ns (7.31% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_content_with_special_characters(reporter):
    """Test content with special characters is preserved."""
    content = "Ω≈ç√∫˜µ≤≥÷"
    header = "Unicode"
    expected_header = "Unicode".center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 1.15μs -> 1.15μs (0.262% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_header_with_special_characters(reporter):
    """Test header with special characters is centered correctly."""
    content = "body"
    header = "Ω≈ç√∫˜µ≤≥÷"
    expected_header = header.center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 994ns -> 925ns (7.46% faster)
    assert result == expected_header + "\n" + content + "\n"


# -------------------
# Edge Test Cases
# -------------------


def test_content_is_whitespace_only(reporter):
    """Test content that is only whitespace is preserved."""
    content = "   \t  "
    header = "Whitespace"
    expected_header = "Whitespace".center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 743ns -> 658ns (12.9% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_header_is_whitespace_only(reporter):
    """Test header that is only whitespace is centered correctly."""
    content = "body"
    header = "   "
    expected_header = header.center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 793ns -> 677ns (17.1% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_content_and_header_are_whitespace_only(reporter):
    """Test both header and content are whitespace only."""
    content = "   "
    header = "   "
    expected_header = header.center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 802ns -> 638ns (25.7% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_content_is_none_raises_typeerror(reporter):
    """Test that passing None as content raises TypeError."""
    header = "header"
    with pytest.raises(TypeError):
        reporter._prepare_content(None, header)  # 2.27μs -> 1.68μs (35.4% faster)


def test_content_is_integer_raises_typeerror(reporter):
    """Test that passing non-string as content raises TypeError."""
    header = "header"
    with pytest.raises(TypeError):
        reporter._prepare_content(123, header)  # 2.67μs -> 1.81μs (47.1% faster)


def test_content_is_bytes_raises_typeerror(reporter):
    """Test that passing bytes as content raises TypeError."""
    header = "header"
    with pytest.raises(TypeError):
        reporter._prepare_content(b"bytes", header)  # 2.74μs -> 1.78μs (53.6% faster)


def test_header_is_bytes_raises_typeerror(reporter):
    """Test that passing bytes as header raises TypeError."""
    content = "content"
    with pytest.raises(TypeError):
        reporter._prepare_content(content, b"bytes")  # 1.98μs -> 1.95μs (1.59% faster)


def test_content_is_list_raises_typeerror(reporter):
    """Test that passing a list as content raises TypeError."""
    header = "header"
    with pytest.raises(TypeError):
        reporter._prepare_content(["a", "b"], header)  # 2.22μs -> 1.56μs (42.0% faster)


def test_large_content(reporter):
    """Test with very large content (999 characters)."""
    content = "A" * 999
    header = "BigContent"
    expected_header = header.center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 1.10μs -> 1.05μs (4.96% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_large_header(reporter):
    """Test with very large header (999 characters)."""
    header = "H" * 999
    content = "short"
    expected_header = header  # center returns as is if longer than width
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 1.03μs -> 965ns (6.84% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_large_content_and_header(reporter):
    """Test with both header and content very large."""
    content = "X" * 999
    header = "Y" * 999
    expected_header = header
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 1.01μs -> 1.05μs (4.08% slower)
    assert result == expected_header + "\n" + content + "\n"


def test_content_with_many_newlines(reporter):
    """Test content with many (up to 999) newlines."""
    content = "\n".join([f"line{i}" for i in range(999)])
    header = "ManyLines"
    expected_header = header.center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 1.02μs -> 1.01μs (0.593% faster)
    lines = result.splitlines()
    assert lines[0] == expected_header
    assert len(lines) == 1000  # header line plus 999 content lines


def test_content_and_header_all_unicode(reporter):
    """Test with large unicode content and header."""
    content = "漢字" * 400  # 800 chars, unicode
    header = "标题" * 40  # 80 chars, unicode
    expected_header = header.center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 1.33μs -> 1.26μs (4.98% faster)
    assert result == expected_header + "\n" + content + "\n"


def test_content_and_header_with_escapes(reporter):
    """Test with content and header containing escape sequences."""
    content = "line1\nline2\tline3\\line4"
    header = "\theader\n"
    expected_header = header.center(80, "-")
    codeflash_output = reporter._prepare_content(content, header)
    result = codeflash_output  # 747ns -> 686ns (8.89% faster)
    assert result == expected_header + "\n" + content + "\n"


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-_NodeReporter._prepare_content-mi9w6j0u` and push.


codeflash-ai bot requested a review from mashraf-222 on November 22, 2025 06:12.
codeflash-ai bot added the labels ⚡️ codeflash ("Optimization PR opened by Codeflash AI") and 🎯 Quality: High ("Optimization Quality according to Codeflash") on Nov 22, 2025.