
codeflash-ai bot commented Nov 21, 2025

📄 140% speedup (2.40x faster) for `_check_initialpaths_for_relpath` in `src/_pytest/nodes.py`

⏱️ Runtime: 23.6 milliseconds → 9.84 milliseconds (best of 165 runs)

📝 Explanation and details

The optimization replaces the expensive `commonpath` helper call with the standard-library `pathlib` method `Path.is_relative_to()` in the function's path-checking loop.

**Key change**: Instead of `if commonpath(path, initial_path) == initial_path:`, the optimized version uses `if path.is_relative_to(initial_path):`. This removes the need to:

1. Convert both `Path` objects to strings via `str()`
2. Call `os.path.commonpath()`, which splits both strings into components and computes their shared prefix
3. Create a new `Path` object from the result
4. Compare that new `Path` with the original `initial_path`
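The before/after loops can be sketched side by side as follows. This is a reconstruction for illustration, assuming the function body is otherwise unchanged and that pytest's `commonpath` helper wraps `os.path.commonpath()` and returns `None` on `ValueError`; `DummySession`, `check_relpath_before`, and `check_relpath_after` are hypothetical names, not pytest's API.

```python
from __future__ import annotations

import os
from pathlib import Path


class DummySession:
    """Hypothetical stand-in for pytest's Session; only _initialpaths is read."""

    def __init__(self, initialpaths: list[Path]) -> None:
        self._initialpaths = initialpaths


def commonpath(path1: Path, path2: Path) -> Path | None:
    """Assumed equivalent of pytest's commonpath helper (None if no common part)."""
    try:
        return Path(os.path.commonpath((str(path1), str(path2))))
    except ValueError:  # e.g. mixing absolute and relative paths
        return None


def check_relpath_before(session: DummySession, path: Path) -> str | None:
    for initial_path in session._initialpaths:
        # Two str() calls, os.path.commonpath, a new Path, then an equality check.
        if commonpath(path, initial_path) == initial_path:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


def check_relpath_after(session: DummySession, path: Path) -> str | None:
    for initial_path in session._initialpaths:
        # One containment check on the existing Path objects (Python 3.9+).
        if path.is_relative_to(initial_path):
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


if __name__ == "__main__":
    session = DummySession([Path("/foo"), Path("/foo/bar")])
    # On POSIX both versions return "bar/baz.txt": the first matching entry wins.
    print(check_relpath_before(session, Path("/foo/bar/baz.txt")))
    print(check_relpath_after(session, Path("/foo/bar/baz.txt")))
```

Only the `if` condition differs; the relative-path computation and the `"."` to `""` mapping are shared by both versions.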

**Why this is faster**: `Path.is_relative_to()` (available since Python 3.9) checks containment on the already-parsed path objects, so it skips the string round trip through `os.path.commonpath()` and the intermediate `Path` object. According to the profiler, time spent in the loop drops from 98 ms to 48 ms (roughly a 50% reduction), while the loop's share of total runtime stays about the same (98.5% before, 97.1% after).
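A minimal `timeit` sketch for comparing the two predicates in isolation, assuming a non-matching pair (the common case when the loop scans many initial paths). The function names are hypothetical, and the measured ratio will vary with the Python version, because `pathlib`'s internals have changed across releases.

```python
import os
import timeit
from pathlib import Path

# A non-matching pair, like most iterations when scanning many initial paths.
path = Path("/base499/file.txt")
initial_path = Path("/base0")


def old_check() -> bool:
    # String round trip: str() twice, os.path.commonpath, a new Path, then ==.
    try:
        common = Path(os.path.commonpath((str(path), str(initial_path))))
    except ValueError:
        return False
    return common == initial_path


def new_check() -> bool:
    # Containment test on the existing Path objects.
    return path.is_relative_to(initial_path)


def bench(number: int = 20_000) -> tuple[float, float]:
    # Best of 5 repeats, mirroring the "best of N runs" style of the report.
    t_old = min(timeit.repeat(old_check, number=number, repeat=5))
    t_new = min(timeit.repeat(new_check, number=number, repeat=5))
    return t_old, t_new


if __name__ == "__main__":
    t_old, t_new = bench()
    print(f"commonpath check:     {t_old:.4f}s")
    print(f"is_relative_to check: {t_new:.4f}s")
```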

**Performance impact based on function references**: `_check_initialpaths_for_relpath` is called during node initialization when `nodeid` construction fails with `ValueError` (that is, when the path cannot be expressed relative to the root directory). This happens during test collection, so the function sits on pytest's startup path, but only in that fallback case. The roughly 140% speedup can therefore reduce collection overhead for projects that hit this fallback often, particularly when the session has many initial paths.
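To make the fallback concrete, here is a simplified sketch of the control flow described above. It assumes the node ID is first computed with `Path.relative_to()` against a root directory and that the initial-path scan runs only when that raises `ValueError`; `make_nodeid_sketch`, `rootpath`, and `check_initialpaths_for_relpath` are hypothetical names, not pytest's actual code.

```python
from __future__ import annotations

from pathlib import Path


def check_initialpaths_for_relpath(initialpaths: list[Path], path: Path) -> str | None:
    # Same loop as the optimized function, taking the list directly.
    for initial_path in initialpaths:
        if path.is_relative_to(initial_path):
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


def make_nodeid_sketch(path: Path, rootpath: Path, initialpaths: list[Path]) -> str | None:
    """Hypothetical helper: prefer a rootpath-relative ID, else fall back."""
    try:
        # Common case: the collected path lives under the root directory.
        return str(path.relative_to(rootpath))
    except ValueError:
        # Fallback: the path is outside the root, so scan the initial paths.
        return check_initialpaths_for_relpath(initialpaths, path)
```

Under this reading, the optimized loop only runs on the `except` branch, which is why its savings matter most for sessions that collect many paths outside the root directory.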

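**Test case effectiveness**: Most generated scenarios show 50-185% speedups, with the largest gains in the large-scale cases (hundreds of initial paths, up to 999) where the per-iteration savings compound. Two cases barely change: mixing absolute and relative paths (about 7% faster) and an empty `_initialpaths` list (within measurement noise, since the loop body never runs). Edge cases such as relative paths, symlinks, and Unicode filenames show gains similar to the basic cases because the check no longer round-trips through `os.path.commonpath()`; on a match, `path.relative_to()` and `str()` are still called once.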

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 54 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
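A complementary way to check that the two conditions are interchangeable is a small differential test over the edge cases the generated tests exercise. This sketch uses `PurePosixPath` and `posixpath.commonpath` so the result does not depend on the host OS; `old_match`, `new_match`, and `CASES` are hypothetical names, and `old_match` assumes the original helper treats a `ValueError` as "no common path".

```python
import posixpath
from pathlib import PurePosixPath


def old_match(path: PurePosixPath, initial_path: PurePosixPath) -> bool:
    # Mirrors commonpath(path, initial_path) == initial_path.
    try:
        common = PurePosixPath(posixpath.commonpath((str(path), str(initial_path))))
    except ValueError:  # mixed absolute and relative paths
        return False
    return common == initial_path


def new_match(path: PurePosixPath, initial_path: PurePosixPath) -> bool:
    return path.is_relative_to(initial_path)


# (path, initial_path) pairs taken from the generated test scenarios.
CASES = [
    ("/foo/bar", "/foo/bar"),
    ("/foo/bar/baz.txt", "/foo/bar"),
    ("/other/path.txt", "/foo/bar"),
    ("foo/bar/baz.txt", "foo/bar"),
    ("foo/bar/baz.txt", "/foo/bar"),
    ("/foo/bar", "/foo/bar/baz"),
    ("/foo/barista/baz.txt", "/foo/bar"),
    (".", "."),
    ("foo.txt", "."),
    ("bar/baz.txt", "foo/../bar"),
    ("/foo/bar.txt", "/"),
    ("baz.txt", ""),
    ("/FOO/BAR/baz.txt", "/foo/bar"),
]


def disagreements() -> list[tuple[str, str]]:
    # Any pair where the old and new conditions give different answers.
    return [
        (p, i)
        for p, i in CASES
        if old_match(PurePosixPath(p), PurePosixPath(i))
        != new_match(PurePosixPath(p), PurePosixPath(i))
    ]


if __name__ == "__main__":
    print(disagreements() or "old and new checks agree on every case")
```

Both checks are purely lexical, so neither follows symlinks nor collapses `..` segments; agreement on these cases says nothing about resolved paths.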
🌀 Generated Regression Tests and Runtime
from pathlib import Path

# imports
from _pytest.nodes import _check_initialpaths_for_relpath


class DummySession:
    """Minimal session mock with _initialpaths attribute."""

    def __init__(self, initialpaths):
        self._initialpaths = initialpaths


# unit tests

# ----------- Basic Test Cases -----------


def test_basic_exact_match_returns_empty_string():
    """If the path is exactly one of the initial paths, return empty string."""
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 23.4μs -> 11.6μs (101% faster)


def test_basic_subpath_returns_relative():
    """If the path is a subpath of one of the initial paths, return correct relative path."""
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 23.4μs -> 11.6μs (101% faster)


def test_basic_multiple_initialpaths_first_match():
    """If multiple initial paths, return relative to the first matching one."""
    session = DummySession([Path("/foo"), Path("/foo/bar")])
    path = Path("/foo/bar/baz.txt")
    # /foo matches first, so relpath from /foo is bar/baz.txt
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 21.5μs -> 10.5μs (104% faster)


def test_basic_multiple_initialpaths_second_match():
    """If multiple initial paths, return relative to the first matching one."""
    session = DummySession([Path("/foo/bar"), Path("/foo")])
    path = Path("/foo/bar/baz.txt")
    # /foo/bar matches first, so relpath from /foo/bar is baz.txt
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 22.9μs -> 11.0μs (109% faster)


def test_basic_no_match_returns_none():
    """If the path is not a subpath of any initial path, return None."""
    session = DummySession([Path("/foo/bar")])
    path = Path("/other/path.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 16.2μs -> 10.8μs (50.0% faster)


# ----------- Edge Test Cases -----------


def test_edge_relative_paths():
    """Relative paths: should work if both are relative."""
    session = DummySession([Path("foo/bar")])
    path = Path("foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 21.2μs -> 9.56μs (122% faster)


def test_edge_mixed_absolute_and_relative():
    """Mixed absolute and relative: should not match, returns None."""
    session = DummySession([Path("/foo/bar")])
    path = Path("foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 11.0μs -> 10.3μs (6.77% faster)


def test_edge_empty_initialpaths():
    """Empty initialpaths: always returns None."""
    session = DummySession([])
    path = Path("/foo/bar")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 415ns -> 422ns (1.66% slower)


def test_edge_initialpaths_with_trailing_slash():
    """Initial path with trailing slash should match correctly."""
    session = DummySession([Path("/foo/bar/")])
    path = Path("/foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 24.3μs -> 12.3μs (98.6% faster)


def test_edge_path_is_parent_of_initialpath():
    """Path is parent of initial path: should not match."""
    session = DummySession([Path("/foo/bar/baz")])
    path = Path("/foo/bar")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 17.0μs -> 11.0μs (54.5% faster)


def test_edge_path_is_not_subpath_but_has_common_prefix():
    """Path has common prefix but isn't a subpath."""
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/barista/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 16.9μs -> 10.4μs (62.8% faster)


def test_edge_dot_path():
    """Path is '.' and initial path is '.'."""
    session = DummySession([Path(".")])
    path = Path(".")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 18.4μs -> 7.67μs (140% faster)


def test_edge_dot_subpath():
    """Path is subpath of '.', e.g. ./foo.txt."""
    session = DummySession([Path(".")])
    path = Path("foo.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 18.1μs -> 7.43μs (144% faster)


def test_edge_initialpaths_overlap():
    """Initial paths overlap, should match first one."""
    session = DummySession([Path("/foo"), Path("/foo/bar")])
    path = Path("/foo/bar/baz.txt")
    # /foo matches first, so relpath from /foo is bar/baz.txt
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 22.3μs -> 10.8μs (106% faster)


def test_edge_initialpaths_with_dotdot():
    """Initial path containing '..' is not normalized by pathlib, so no match (None)."""
    session = DummySession([Path("foo/../bar")])
    path = Path("bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 15.0μs -> 9.25μs (62.4% faster)


def test_edge_initialpath_is_root():
    """Initial path is root, path is subpath."""
    session = DummySession([Path("/")])
    path = Path("/foo/bar.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 20.7μs -> 10.1μs (104% faster)


def test_edge_path_is_symlink_to_initialpath(tmp_path):
    """Path reached through a symlink to the initial path: no match (None),
    because the check is lexical and does not resolve symlinks."""
    # This test requires filesystem support for symlinks.
    initial = tmp_path / "dir"
    initial.mkdir()
    target = tmp_path / "dir2"
    target.symlink_to(initial, target_is_directory=True)
    session = DummySession([initial])
    # Lexically, dir2/file.txt is not under dir/, even though it resolves there
    file = target / "file.txt"
    file.write_text("data")
    codeflash_output = _check_initialpaths_for_relpath(
        session, file
    )  # 18.9μs -> 11.2μs (69.2% faster)


def test_edge_case_sensitive():
    """Case sensitivity: on Unix, should be case sensitive."""
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/BAR/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 17.8μs -> 11.3μs (57.3% faster)


# ----------- Large Scale Test Cases -----------


def test_large_scale_many_initialpaths_first_matches():
    """Large number of initial paths, first one matches."""
    initialpaths = [Path(f"/base{i}") for i in range(500)]
    session = DummySession(initialpaths)
    path = Path("/base0/file.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 24.8μs -> 12.3μs (102% faster)


def test_large_scale_many_initialpaths_last_matches():
    """Large number of initial paths, last one matches."""
    initialpaths = [Path(f"/base{i}") for i in range(500)]
    session = DummySession(initialpaths)
    path = Path("/base499/file.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 2.87ms -> 1.15ms (149% faster)


def test_large_scale_no_matches():
    """Large number of initial paths, none match."""
    initialpaths = [Path(f"/base{i}") for i in range(500)]
    session = DummySession(initialpaths)
    path = Path("/other/file.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 2.89ms -> 1.14ms (154% faster)


def test_large_scale_long_path():
    """Very long path under initial path."""
    initial = Path("/a")
    session = DummySession([initial])
    # Create a long path with 1000 components
    long_path = initial.joinpath(*[f"dir{i}" for i in range(1000)])
    codeflash_output = _check_initialpaths_for_relpath(
        session, long_path
    )  # 91.6μs -> 33.6μs (172% faster)


def test_large_scale_many_initialpaths_with_overlap():
    """Many initial paths sharing a name prefix; only /foo/bar100 matches, not /foo/bar1 or /foo/bar10."""
    initialpaths = [Path(f"/foo/bar{i}") for i in range(500)]
    session = DummySession(initialpaths)
    path = Path("/foo/bar100/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 617μs -> 269μs (129% faster)


def test_large_scale_all_initialpaths_are_subpaths():
    """All initial paths share the /foo/bar prefix; the path is under the first one."""
    initialpaths = [Path(f"/foo/bar/baz{i}") for i in range(500)]
    session = DummySession(initialpaths)
    path = Path("/foo/bar/baz0/qux.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 25.0μs -> 12.6μs (98.1% faster)


def test_large_scale_path_is_exact_initialpath():
    """Path is exactly one of many initial paths."""
    initialpaths = [Path(f"/foo/bar{i}") for i in range(500)]
    session = DummySession(initialpaths)
    path = Path("/foo/bar123")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 742μs -> 328μs (126% faster)


def test_large_scale_relative_paths():
    """Large scale with relative paths."""
    initialpaths = [Path(f"foo/bar{i}") for i in range(500)]
    session = DummySession(initialpaths)
    path = Path("foo/bar499/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(
        session, path
    )  # 2.81ms -> 985μs (185% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from pathlib import Path

# imports
from _pytest.nodes import _check_initialpaths_for_relpath


# Helper class to mimic pytest Session with _initialpaths
class DummySession:
    def __init__(self, initialpaths):
        self._initialpaths = initialpaths


# ------------------ UNIT TESTS ------------------

# 1. BASIC TEST CASES


def test_exact_match_returns_empty_string():
    # Path is exactly the initial path
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 21.5μs -> 11.2μs (92.0% faster)


def test_subpath_returns_relative():
    # Path is a subpath of initial path
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 22.6μs -> 11.1μs (103% faster)


def test_multiple_initialpaths_first_match():
    # Multiple initial paths; should match the first one that fits
    session = DummySession([Path("/foo/bar"), Path("/foo/bar/baz")])
    path = Path("/foo/bar/baz/qux.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 22.2μs -> 11.1μs (99.8% faster)


def test_multiple_initialpaths_second_match():
    # Both initial paths contain the path; the first entry in the list wins
    session = DummySession([Path("/foo/bar/baz"), Path("/foo/bar")])
    path = Path("/foo/bar/baz/qux.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 23.4μs -> 11.5μs (103% faster)


def test_no_match_returns_none():
    # Path does not match any initial path
    session = DummySession([Path("/foo/bar")])
    path = Path("/other/path.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 16.0μs -> 10.6μs (51.9% faster)


# 2. EDGE TEST CASES


def test_relative_paths():
    # All paths are relative
    session = DummySession([Path("foo/bar")])
    path = Path("foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 20.5μs -> 9.24μs (122% faster)


def test_mixed_absolute_and_relative():
    # One absolute, one relative path (should not match)
    session = DummySession([Path("/foo/bar")])
    path = Path("foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 11.0μs -> 10.2μs (7.45% faster)


def test_path_is_parent_of_initialpath():
    # Path is parent of initial path (should not match)
    session = DummySession([Path("/foo/bar/baz")])
    path = Path("/foo/bar")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 17.2μs -> 10.6μs (62.3% faster)


def test_initialpaths_empty():
    # No initial paths at all
    session = DummySession([])
    path = Path("/foo/bar")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 423ns -> 446ns (5.16% slower)


def test_dot_path():
    # Path is '.' (current directory)
    session = DummySession([Path(".")])
    path = Path(".")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 17.9μs -> 7.61μs (136% faster)


def test_dot_subpath():
    # Path is a subpath of '.'
    session = DummySession([Path(".")])
    path = Path("./baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 18.4μs -> 7.35μs (150% faster)


def test_symlink_paths(tmp_path):
    # Symlinked initial path and real path
    real_dir = tmp_path / "real"
    real_dir.mkdir()
    symlink_dir = tmp_path / "link"
    symlink_dir.symlink_to(real_dir, target_is_directory=True)
    session = DummySession([symlink_dir])
    file_path = real_dir / "file.txt"
    file_path.write_text("hello")
    # Symlink and real path resolve to the same location, but the check is
    # lexical and does not follow symlinks, so no match (None) is expected
    codeflash_output = _check_initialpaths_for_relpath(session, file_path)
    result = codeflash_output  # 19.2μs -> 11.3μs (69.9% faster)


def test_case_sensitivity():
    # On POSIX, pathlib compares paths case-sensitively, so paths differing only in case should not match
    session = DummySession([Path("/foo/bar")])
    path = Path("/FOO/BAR/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 17.3μs -> 11.0μs (56.5% faster)


def test_path_with_trailing_slash():
    # Path with trailing slash
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar/")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 24.3μs -> 11.6μs (109% faster)


def test_initialpath_with_trailing_slash():
    # Initial path with trailing slash
    session = DummySession([Path("/foo/bar/")])
    path = Path("/foo/bar/baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 24.0μs -> 11.4μs (109% faster)


def test_path_is_initialpath_dotdot():
    # '..' resolves to a sibling of the initial path (should not match)
    session = DummySession([Path("/foo/bar/baz")])
    path = Path("/foo/bar/baz/../qux.txt").resolve()
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 14.7μs -> 9.05μs (61.8% faster)


# 3. LARGE SCALE TEST CASES


def test_many_initialpaths_first_match():
    # Many initial paths; should match the first one that fits
    initialpaths = [Path(f"/base/{i}") for i in range(100)]
    session = DummySession(initialpaths)
    path = Path("/base/42/subdir/file.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 282μs -> 123μs (128% faster)


def test_many_initialpaths_no_match():
    # Many initial paths; no match
    initialpaths = [Path(f"/base/{i}") for i in range(100)]
    session = DummySession(initialpaths)
    path = Path("/other/location/file.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 574μs -> 261μs (120% faster)


def test_large_path_depth():
    # Path with large depth
    initial = Path("/foo")
    session = DummySession([initial])
    subdirs = "/".join([f"dir{i}" for i in range(100)])
    path = Path(f"/foo/{subdirs}/file.txt")
    expected = f"{subdirs}/file.txt"
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 32.0μs -> 13.4μs (139% faster)


def test_large_number_of_paths_performance():
    # Large number of initial paths, only one matches
    initialpaths = [Path(f"/foo/{i}") for i in range(999)]
    session = DummySession(initialpaths)
    path = Path("/foo/998/file.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 6.03ms -> 2.54ms (137% faster)


def test_large_number_of_paths_none_match_performance():
    # Large number of initial paths, none match
    initialpaths = [Path(f"/foo/{i}") for i in range(999)]
    session = DummySession(initialpaths)
    path = Path("/bar/0/file.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 5.84ms -> 2.57ms (127% faster)


# 4. ADDITIONAL EDGE CASES


def test_initialpaths_with_dot_and_absolute():
    # Initial paths include both '.' and an absolute path
    session = DummySession([Path("."), Path("/foo/bar")])
    path = Path("./baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 18.3μs -> 7.73μs (136% faster)


def test_initialpaths_with_empty_string():
    # Initial path is empty string (should behave as '.')
    session = DummySession([Path("")])
    path = Path("baz.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 17.1μs -> 7.19μs (138% faster)


def test_initialpaths_with_root_path():
    # Initial path is root
    session = DummySession([Path("/")])
    path = Path("/foo/bar.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 20.8μs -> 10.0μs (108% faster)


def test_path_with_spaces():
    # Path with spaces
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar/a file.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 24.0μs -> 11.4μs (110% faster)


def test_path_with_unicode():
    # Path with unicode characters
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar/файл.txt")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 24.3μs -> 11.2μs (117% faster)


def test_path_with_dot_in_name():
    # Path with dot in file name
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar/.hiddenfile")
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 23.2μs -> 11.2μs (107% faster)


def test_path_is_initialpath_double_dot():
    # '..' resolves to a sibling of the initial path (should not match)
    session = DummySession([Path("/foo/bar")])
    path = Path("/foo/bar/../baz.txt").resolve()
    codeflash_output = _check_initialpaths_for_relpath(session, path)
    result = codeflash_output  # 14.2μs -> 8.62μs (64.7% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
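The generated tests above store results in `codeflash_output` without asserting on them. A sketch of how a few of the edge-case expectations could be pinned down explicitly, using a hypothetical local copy of the optimized loop (`check_relpath`, not imported from pytest) and `tempfile` instead of the `tmp_path` fixture so it runs standalone:

```python
import tempfile
from pathlib import Path


def check_relpath(initialpaths, path):
    """Hypothetical local copy of the optimized loop (sketch, not pytest's code)."""
    for initial_path in initialpaths:
        if path.is_relative_to(initial_path):
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


def test_trailing_slash_is_normalized():
    # Path("/foo/bar/") equals Path("/foo/bar"), so this is an exact match.
    assert check_relpath([Path("/foo/bar")], Path("/foo/bar/")) == ""


def test_first_matching_initial_path_wins():
    paths = [Path("/foo/bar/baz"), Path("/foo/bar")]
    assert check_relpath(paths, Path("/foo/bar/baz/qux.txt")) == "qux.txt"


def test_dotdot_is_not_collapsed():
    # pathlib keeps ".." segments, so "foo/../bar" is not treated as "bar".
    assert check_relpath([Path("foo/../bar")], Path("bar/baz.txt")) is None


def test_symlinks_are_not_followed():
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = Path(tmp) / "real"
        real_dir.mkdir()
        link_dir = Path(tmp) / "link"
        link_dir.symlink_to(real_dir, target_is_directory=True)
        # Purely lexical check: real/file.txt is not "under" link/.
        assert check_relpath([link_dir], real_dir / "file.txt") is None
```

The expected values assume POSIX-style paths; on Windows the separators in returned strings would differ.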

To edit these changes, run `git checkout codeflash/optimize-_check_initialpaths_for_relpath-mi9il9fw` and push.


codeflash-ai bot requested a review from mashraf-222 November 21, 2025 23:52
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Nov 21, 2025