@codeflash-ai codeflash-ai bot commented Nov 22, 2025

📄 119% (1.19x) speedup for create_rename_keys in src/transformers/models/deprecated/deta/convert_deta_swin_to_pytorch.py

⏱️ Runtime : 8.31 milliseconds → 3.79 milliseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 119% speedup by eliminating redundant list operations and reducing attribute lookups in nested loops.

Key optimizations applied:

  1. Batch operations with extend() instead of individual append() calls: The original code made thousands of individual list.append() calls, each incurring Python-level call overhead, bounds checking, and occasional resizing. The optimized version groups related keys into lists and uses extend() to add them in batches; the total work is still linear in the number of keys, but the interpreter overhead drops from one method call per key to one extend() call per group.
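A minimal sketch of this refactor (the `src.*`/`tgt.*` key names below are placeholders, not the actual DETA parameter names):

```python
def build_keys_append(n):
    # Original pattern: one Python-level append() call per key.
    keys = []
    for j in range(n):
        keys.append((f"src.{j}.weight", f"tgt.{j}.weight"))
        keys.append((f"src.{j}.bias", f"tgt.{j}.bias"))
    return keys


def build_keys_extend(n):
    # Optimized pattern: group related keys and add them with one extend() call.
    keys = []
    for j in range(n):
        keys.extend(
            [
                (f"src.{j}.weight", f"tgt.{j}.weight"),
                (f"src.{j}.bias", f"tgt.{j}.bias"),
            ]
        )
    return keys
```

Both variants produce identical output; only the number of Python-level calls per loop iteration differs.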

  2. Cached attribute lookups: The original code repeatedly accessed config.backbone_config.depths[i] and config.encoder_layers/decoder_layers within loops. The optimized version caches these values (depths = config.backbone_config.depths, depth_i = depths[i], etc.) to eliminate redundant attribute lookups.
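Sketched with the same helper shapes the generated tests use (`count_blocks` is a hypothetical helper for illustration, not code from the converter):

```python
class BackboneConfig:
    def __init__(self, depths):
        self.depths = depths


class Config:
    def __init__(self, backbone_config):
        self.backbone_config = backbone_config


def count_blocks(config):
    # Hoist the attribute chain out of the loop once; inside the loop,
    # index the local list instead of re-traversing
    # config.backbone_config.depths[i] on every iteration.
    depths = config.backbone_config.depths
    total = 0
    for i in range(len(depths)):
        depth_i = depths[i]
        total += depth_i
    return total
```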

  3. String prefix caching: In tight loops that generate many f-strings with the same prefixes, the optimized code pre-calculates common string prefixes (src_prefix, tgt_prefix) and reuses them, reducing string formatting overhead.
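For example (the source prefix matches the one the tests check; the target prefix and suffixes are illustrative, and the real function maps many more keys per layer):

```python
def make_layer_keys(i):
    # Compute the shared prefixes once per layer, then build each key by
    # concatenation instead of formatting a full f-string per key.
    src_prefix = f"transformer.encoder.layers.{i}."
    tgt_prefix = f"model.encoder.layers.{i}."
    suffixes = [
        "self_attn.out_proj.weight",
        "self_attn.out_proj.bias",
        "linear1.weight",
        "linear1.bias",
    ]
    return [(src_prefix + s, tgt_prefix + s) for s in suffixes]
```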

Why this leads to speedup:

  • List append() operations have overhead for bounds checking, potential resizing, and individual memory allocations
  • Python attribute lookups (obj.attr.subattr) traverse the object hierarchy each time
  • String formatting with f-strings has computational cost that multiplies in loops
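These effects can be observed in isolation with a quick micro-benchmark (a sketch, not the project's benchmark harness; timings vary by machine and Python version, and the gap widens with larger batches):

```python
import timeit


def with_append(n=1000):
    # Two append() calls per iteration.
    keys = []
    for j in range(n):
        keys.append(j)
        keys.append(j + 1)
    return keys


def with_extend(n=1000):
    # One extend() call per iteration, adding the same two elements.
    keys = []
    for j in range(n):
        keys.extend((j, j + 1))
    return keys


if __name__ == "__main__":
    t_append = timeit.timeit(with_append, number=200)
    t_extend = timeit.timeit(with_extend, number=200)
    print(f"append: {t_append:.4f}s  extend: {t_extend:.4f}s")
```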

Performance characteristics from test results:

  • Small configs (1-2 stages): 30-60% speedup due to reduced overhead
  • Medium configs (3-10 stages): 50-85% speedup as batch operations become more beneficial
  • Large configs (100+ blocks/layers): 95-150% speedup where the optimization impact is most pronounced

The optimization is particularly effective for this function because it processes hundreds to thousands of key mappings in nested loops, making the reduction in per-operation overhead highly impactful.

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests          🔘 None Found
🌀 Generated Regression Tests   ✅ 28 Passed
⏪ Replay Tests                 🔘 None Found
🔎 Concolic Coverage Tests      🔘 None Found
📊 Tests Coverage               100.0%
🌀 Generated Regression Tests and Runtime
from transformers.models.deprecated.deta.convert_deta_swin_to_pytorch import create_rename_keys


# Helper config classes for testing
class BackboneConfig:
    def __init__(self, depths):
        self.depths = depths


class Config:
    def __init__(self, backbone_config, encoder_layers, decoder_layers):
        self.backbone_config = backbone_config
        self.encoder_layers = encoder_layers
        self.decoder_layers = decoder_layers


# ---------------- BASIC TEST CASES ----------------


def test_basic_single_layer_single_block():
    # Test with 1 stage, 1 block, 1 encoder layer, 1 decoder layer
    config = Config(BackboneConfig([1]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 11.7μs -> 8.78μs (33.7% faster)


def test_basic_multiple_stages_blocks():
    # Test with 2 stages, 2 blocks each, 2 encoder layers, 2 decoder layers
    config = Config(BackboneConfig([2, 2]), encoder_layers=2, decoder_layers=2)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 24.4μs -> 15.2μs (60.5% faster)
    # Check all blocks for both stages
    for i in range(2):
        for j in range(2):
            pass
    # Check downsample keys for i=0,1
    for i in range(2):
        pass
    # Check encoder/decoder keys for both layers
    for i in range(2):
        pass


def test_basic_no_downsample_for_last_stage():
    # Test with 4 stages, last stage should not have downsample keys
    config = Config(BackboneConfig([1, 1, 1, 1]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 19.2μs -> 12.2μs (57.9% faster)
    # Downsample keys only for i=0,1,2
    for i in range(3):
        pass


# ---------------- EDGE TEST CASES ----------------


def test_edge_zero_stages():
    # Test with zero stages
    config = Config(BackboneConfig([]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 7.47μs -> 5.34μs (40.0% faster)
    # No block keys should be present
    for k in keys:
        pass


def test_edge_zero_blocks_in_stage():
    # Test with one stage, zero blocks
    config = Config(BackboneConfig([0]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 8.29μs -> 6.41μs (29.4% faster)
    # No block keys for stage 0
    for k in keys:
        pass


def test_edge_zero_encoder_layers():
    # Test with zero encoder layers
    config = Config(BackboneConfig([1]), encoder_layers=0, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 9.09μs -> 6.51μs (39.6% faster)
    # No encoder keys
    for k in keys:
        pass


def test_edge_zero_decoder_layers():
    # Test with zero decoder layers
    config = Config(BackboneConfig([1]), encoder_layers=1, decoder_layers=0)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 8.39μs -> 6.08μs (38.0% faster)
    # No decoder keys
    for k in keys:
        pass


def test_edge_zero_everything():
    # Test with zero stages, zero blocks, zero encoder/decoder layers
    config = Config(BackboneConfig([]), encoder_layers=0, decoder_layers=0)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 1.55μs -> 1.48μs (4.52% faster)
    # Only stem and norm keys should exist
    expected_keys = {
        "backbone.0.body.patch_embed.proj.weight",
        "backbone.0.body.patch_embed.proj.bias",
        "backbone.0.body.patch_embed.norm.weight",
        "backbone.0.body.patch_embed.norm.bias",
        "backbone.0.body.norm1.weight",
        "backbone.0.body.norm1.bias",
        "backbone.0.body.norm2.weight",
        "backbone.0.body.norm2.bias",
        "backbone.0.body.norm3.weight",
        "backbone.0.body.norm3.bias",
    }
    actual_keys = set(k[0] for k in keys)
    assert actual_keys == expected_keys


def test_edge_nonstandard_depths():
    # Test with nonstandard depths (e.g. 3, 0, 2)
    config = Config(BackboneConfig([3, 0, 2]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 23.7μs -> 14.9μs (59.7% faster)
    # Stage 0: 3 blocks
    for j in range(3):
        pass
    # Stage 2: 2 blocks
    for j in range(2):
        pass


# ---------------- LARGE SCALE TEST CASES ----------------


def test_large_scale_max_stages_blocks():
    # Test with 4 stages, 10 blocks each, 10 encoder layers, 10 decoder layers
    config = Config(BackboneConfig([10, 10, 10, 10]), encoder_layers=10, decoder_layers=10)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 143μs -> 64.2μs (124% faster)
    # Check total number of block keys
    expected_block_keys = 4 * 10 * 12  # 4 stages, 10 blocks, 12 keys per block
    actual_block_keys = sum(k[0].startswith("backbone.0.body.layers.") and ".blocks." in k[0] for k in keys)
    assert actual_block_keys == expected_block_keys
    # Downsample keys: only for first 3 stages
    for i in range(3):
        pass
    # Encoder/decoder keys
    for i in range(10):
        pass
    # Check total keys count
    # Calculate expected total keys
    stem_keys = 4
    norm_keys = 6
    block_keys = expected_block_keys
    downsample_keys = 3 * 3  # 3 stages, 3 keys each
    encoder_keys = 10 * 16
    decoder_keys = 10 * 18
    expected_total_keys = stem_keys + norm_keys + block_keys + downsample_keys + encoder_keys + decoder_keys
    assert len(keys) == expected_total_keys


def test_large_scale_one_stage_many_blocks():
    # Test with 1 stage, 100 blocks, 1 encoder layer, 1 decoder layer
    config = Config(BackboneConfig([100]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 225μs -> 89.1μs (153% faster)
    # Check all block keys for stage 0
    for j in range(100):
        pass
    # Check total block keys
    block_keys = sum(k[0].startswith("backbone.0.body.layers.0.blocks.") for k in keys)
    assert block_keys == 100 * 12  # 100 blocks, 12 keys per block


def test_large_scale_many_encoder_decoder_layers():
    # Test with 1 stage, 1 block, 100 encoder layers, 100 decoder layers
    config = Config(BackboneConfig([1]), encoder_layers=100, decoder_layers=100)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 452μs -> 228μs (98.0% faster)
    # Check total encoder/decoder keys
    encoder_keys = sum(k[0].startswith("transformer.encoder.layers.") for k in keys)
    decoder_keys = sum(k[0].startswith("transformer.decoder.layers.") for k in keys)
    assert encoder_keys == 100 * 16  # 16 keys per encoder layer
    assert decoder_keys == 100 * 18  # 18 keys per decoder layer


def test_large_scale_mixed():
    # Test with 3 stages (2,3,4 blocks), 5 encoder layers, 6 decoder layers
    config = Config(BackboneConfig([2, 3, 4]), encoder_layers=5, decoder_layers=6)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 54.0μs -> 29.3μs (84.3% faster)
    # Check block keys for each stage/block
    for i, blocks in enumerate([2, 3, 4]):
        for j in range(blocks):
            pass
    # Downsample keys for i=0,1,2 (since len(depths)=3)
    for i in range(3):
        pass
    # Encoder/decoder keys for all layers
    for i in range(5):
        pass
    for i in range(6):
        pass


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest

from transformers.models.deprecated.deta.convert_deta_swin_to_pytorch import create_rename_keys


# Helper classes for config objects
class BackboneConfig:
    def __init__(self, depths):
        self.depths = depths


class Config:
    def __init__(self, backbone_config, encoder_layers, decoder_layers):
        self.backbone_config = backbone_config
        self.encoder_layers = encoder_layers
        self.decoder_layers = decoder_layers


# ---------------------
# Basic Test Cases
# ---------------------


def test_basic_minimal_config():
    # Minimal config: 1 stage, 1 block, 1 encoder layer, 1 decoder layer
    config = Config(backbone_config=BackboneConfig(depths=[1]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 11.5μs -> 8.35μs (38.1% faster)


def test_basic_multiple_stages_blocks():
    # 2 stages, first with 2 blocks, second with 1 block, 2 encoder layers, 1 decoder layer
    config = Config(backbone_config=BackboneConfig(depths=[2, 1]), encoder_layers=2, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 19.7μs -> 12.8μs (53.5% faster)


def test_basic_downsample_keys_only_for_first_three_stages():
    # 4 stages, but downsample keys only for first 3
    config = Config(backbone_config=BackboneConfig(depths=[1, 1, 1, 1]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 20.1μs -> 12.6μs (59.6% faster)
    # Downsample keys for stages 0,1,2
    for i in range(3):
        pass


# ---------------------
# Edge Test Cases
# ---------------------


def test_edge_zero_stages():
    # No stages, but encoder/decoder layers present
    config = Config(backbone_config=BackboneConfig(depths=[]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 7.66μs -> 5.35μs (43.1% faster)


def test_edge_zero_blocks_in_stages():
    # Stages exist, but all with zero blocks
    config = Config(backbone_config=BackboneConfig(depths=[0, 0, 0]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 9.78μs -> 8.05μs (21.5% faster)
    # Should have downsample keys for stages 0,1,2, but no block keys
    for i in range(3):
        pass


def test_edge_zero_encoder_decoder_layers():
    # Stages exist, but no encoder/decoder layers
    config = Config(backbone_config=BackboneConfig(depths=[1, 2]), encoder_layers=0, decoder_layers=0)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 11.7μs -> 7.59μs (54.0% faster)


def test_edge_large_stage_index_no_downsample():
    # Stage index >= 3 should not have downsample keys
    config = Config(backbone_config=BackboneConfig(depths=[1, 1, 1, 1, 1]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 23.3μs -> 14.4μs (61.7% faster)
    # Downsample keys for stages 0,1,2 only
    for i in range(3):
        pass
    for i in range(3, 5):
        pass


def test_edge_non_integer_depths():
    # Depths with non-integer values should raise TypeError
    config = Config(backbone_config=BackboneConfig(depths=["a", None, 1.5]), encoder_layers=1, decoder_layers=1)
    with pytest.raises(TypeError):
        create_rename_keys(config)  # 1.94μs -> 1.88μs (3.03% faster)


def test_edge_negative_depths():
    # Negative depths should result in no block keys for those stages
    config = Config(backbone_config=BackboneConfig(depths=[1, -1, 2]), encoder_layers=1, decoder_layers=1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 18.2μs -> 12.8μs (42.1% faster)


def test_edge_non_integer_encoder_decoder_layers():
    # Non-integer encoder/decoder layers should raise TypeError
    config = Config(backbone_config=BackboneConfig(depths=[1, 1]), encoder_layers="2", decoder_layers=None)
    with pytest.raises(TypeError):
        create_rename_keys(config)  # 9.71μs -> 6.84μs (42.0% faster)


def test_edge_negative_encoder_decoder_layers():
    # Negative encoder/decoder layers should result in no keys for those layers
    config = Config(backbone_config=BackboneConfig(depths=[1]), encoder_layers=-2, decoder_layers=-1)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 6.03μs -> 4.78μs (26.1% faster)


def test_edge_empty_config():
    # Empty config should raise AttributeError
    class EmptyConfig:
        pass

    with pytest.raises(AttributeError):
        create_rename_keys(EmptyConfig())  # 1.47μs -> 1.33μs (9.82% faster)


# ---------------------
# Large Scale Test Cases
# ---------------------


def test_large_scale_many_stages_blocks_layers():
    # 10 stages, each with 5 blocks, 20 encoder layers, 20 decoder layers
    depths = [5] * 10
    config = Config(backbone_config=BackboneConfig(depths=depths), encoder_layers=20, decoder_layers=20)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 214μs -> 98.4μs (118% faster)
    # Total block keys: 10 stages * 5 blocks * 12 keys per block
    block_keys_count = sum(1 for k in keys if ".blocks." in k[0])
    assert block_keys_count == 10 * 5 * 12
    # Downsample keys: only for first 3 stages, each with 3 keys
    downsample_keys_count = sum(1 for k in keys if ".downsample." in k[0])
    assert downsample_keys_count == 3 * 3
    # Encoder keys: 20 layers * 16 keys per layer
    encoder_keys_count = sum(1 for k in keys if "transformer.encoder.layers." in k[0])
    assert encoder_keys_count == 20 * 16
    # Decoder keys: 20 layers * 18 keys per layer
    decoder_keys_count = sum(1 for k in keys if "transformer.decoder.layers." in k[0])
    assert decoder_keys_count == 20 * 18
    # Total keys count sanity check
    expected_total = (
        4  # stem
        + block_keys_count
        + downsample_keys_count
        + 6  # norm keys
        + encoder_keys_count
        + decoder_keys_count
    )
    assert len(keys) == expected_total


def test_large_scale_max_elements():
    # Maximum allowed elements: 1000 blocks, 1000 encoder layers, 1000 decoder layers
    depths = [1000]
    config = Config(backbone_config=BackboneConfig(depths=depths), encoder_layers=1000, decoder_layers=1000)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 6.96ms -> 3.11ms (124% faster)
    # Block keys: 1 stage * 1000 blocks * 12 keys per block
    block_keys_count = sum(1 for k in keys if ".blocks." in k[0])
    assert block_keys_count == 1000 * 12
    # Downsample keys: only for stage 0 (3 keys)
    downsample_keys_count = sum(1 for k in keys if ".downsample." in k[0])
    assert downsample_keys_count == 3
    # Encoder keys: 1000 * 16
    encoder_keys_count = sum(1 for k in keys if "transformer.encoder.layers." in k[0])
    assert encoder_keys_count == 1000 * 16
    # Decoder keys: 1000 * 18
    decoder_keys_count = sum(1 for k in keys if "transformer.decoder.layers." in k[0])
    assert decoder_keys_count == 1000 * 18
    # Total keys count sanity check
    expected_total = (
        4  # stem
        + block_keys_count
        + downsample_keys_count
        + 6  # norm keys
        + encoder_keys_count
        + decoder_keys_count
    )
    assert len(keys) == expected_total


def test_large_scale_zero_everything():
    # All zero: no stages, no encoder, no decoder layers
    config = Config(backbone_config=BackboneConfig(depths=[]), encoder_layers=0, decoder_layers=0)
    codeflash_output = create_rename_keys(config)
    keys = codeflash_output  # 2.33μs -> 2.28μs (2.42% faster)
    # Only stem and norm keys
    expected_keys = [
        (
            "backbone.0.body.patch_embed.proj.weight",
            "model.backbone.model.embeddings.patch_embeddings.projection.weight",
        ),
        ("backbone.0.body.patch_embed.proj.bias", "model.backbone.model.embeddings.patch_embeddings.projection.bias"),
        ("backbone.0.body.patch_embed.norm.weight", "model.backbone.model.embeddings.norm.weight"),
        ("backbone.0.body.patch_embed.norm.bias", "model.backbone.model.embeddings.norm.bias"),
        ("backbone.0.body.norm1.weight", "model.backbone.model.hidden_states_norms.stage2.weight"),
        ("backbone.0.body.norm1.bias", "model.backbone.model.hidden_states_norms.stage2.bias"),
        ("backbone.0.body.norm2.weight", "model.backbone.model.hidden_states_norms.stage3.weight"),
        ("backbone.0.body.norm2.bias", "model.backbone.model.hidden_states_norms.stage3.bias"),
        ("backbone.0.body.norm3.weight", "model.backbone.model.hidden_states_norms.stage4.weight"),
        ("backbone.0.body.norm3.bias", "model.backbone.model.hidden_states_norms.stage4.bias"),
    ]
    assert set(keys) == set(expected_keys)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-create_rename_keys-miaeme4z and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 22, 2025 14:48
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 22, 2025