@codeflash-ai codeflash-ai bot commented Nov 22, 2025

📄 117% (1.17x) speedup for get_optimal_tiled_canvas in src/transformers/models/got_ocr2/image_processing_got_ocr2.py

⏱️ Runtime: 129 milliseconds → 59.5 milliseconds (best of 76 runs)

📝 Explanation and details

The optimization achieves a 117% speedup by fundamentally changing the algorithm in get_all_supported_aspect_ratios from a brute-force nested loop approach to a more efficient factorization-based method.

Key Optimization - Algorithm Change:

  • Original: Nested loops testing all width×height combinations (O(max_tiles²) complexity)
  • Optimized: Iterates through tile counts and finds their factor pairs (O(max_tiles × √max_tiles) complexity)
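The two enumeration strategies can be sketched side by side. This is a minimal illustration, not the code from the PR: the function names `ratios_nested` and `ratios_factorized` are hypothetical, and the ordering (sorted by tile count, then by width) is an assumption made so that both versions return identical lists.

```python
import math


def ratios_nested(min_tiles: int, max_tiles: int) -> list[tuple[int, int]]:
    """Brute force: test every (width, height) pair, then sort by tile count."""
    grids = [
        (w, h)
        for w in range(1, max_tiles + 1)
        for h in range(1, max_tiles + 1)
        if min_tiles <= w * h <= max_tiles
    ]
    # sorted() is stable, so grids with equal tile counts stay in width order.
    return sorted(grids, key=lambda g: g[0] * g[1])


def ratios_factorized(min_tiles: int, max_tiles: int) -> list[tuple[int, int]]:
    """For each tile count n, emit its divisor pairs (w, n // w) in width order."""
    grids = []
    for n in range(max(min_tiles, 1), max_tiles + 1):
        # Trial division only needs to go up to sqrt(n).
        small = [d for d in range(1, math.isqrt(n) + 1) if n % d == 0]
        # Mirror the small divisors to get the large ones, skipping a repeated sqrt.
        large = [n // d for d in reversed(small) if d * d != n]
        grids.extend((w, n // w) for w in small + large)
    return grids
```

Keeping the output order identical matters if the caller breaks ties by position in the list, so an equivalence check should compare lists rather than sets.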

Specific Changes:

  1. Factorization approach: Instead of checking width * height <= max_image_tiles for all combinations, the optimized version iterates through each valid tile count and finds its divisor pairs using modulo operations
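  2. Reduced computational complexity: For large max_image_tiles, this cuts the number of candidate checks sharply. With max_image_tiles = 1000, the nested loops evaluate 1,000,000 (width, height) pairs, whereas trial division up to √n for each tile count n needs roughly 20,000 modulo checks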
  3. Minor micro-optimizations in get_optimal_tiled_canvas:
    • Pre-computes twice_target_patch_area to avoid repeated multiplication
    • Uses tuple unpacking w, h = grid for cleaner variable access
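The micro-optimizations in item 3 might look roughly like the loop below. This is a sketch under stated assumptions, not the PR's code: `pick_grid` and `twice_image_area` are hypothetical names, sizes are assumed to be `(height, width)`, and the tie-break rule (prefer the later grid while the image area exceeds half the tiled area) is inferred from the test comments rather than confirmed by the source.

```python
def pick_grid(
    image_size: tuple[int, int],
    tile_size: tuple[int, int],
    grids: list[tuple[int, int]],
) -> tuple[int, int]:
    """Pick the grid whose width/height ratio is closest to the image's."""
    height, width = image_size
    tile_h, tile_w = tile_size
    aspect = width / height

    # Loop-invariant values are computed once, outside the loop.
    twice_image_area = 2 * width * height
    tile_area = tile_h * tile_w

    best, best_diff = (1, 1), float("inf")
    for w, h in grids:  # tuple unpacking instead of grid[0] / grid[1]
        diff = abs(aspect - w / h)
        if diff < best_diff:
            best, best_diff = (w, h), diff
        elif diff == best_diff and twice_image_area > tile_area * w * h:
            # Assumed tie-break: a later grid wins while the image is large enough.
            best = (w, h)
    return best
```

Hoisting the products saves a few multiplications per candidate grid, which is a small gain next to the change in how the candidates are enumerated.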

Performance Impact:
The optimization is particularly effective for large tile ranges, as evidenced by test results showing 2356% speedup for test_min_tiles_equals_max_tiles_large and 125% speedup for large-scale performance tests. The factorization approach scales much better than the quadratic nested loop.
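To make the scaling claim concrete, the candidate checks each approach performs can be counted directly. The sketch below assumes the optimized version uses trial division up to √n for every tile count n, which is consistent with the stated complexity but not confirmed by the source; both helper names are hypothetical.

```python
import math


def nested_checks(max_tiles: int) -> int:
    """Number of (width, height) pairs the nested loops evaluate."""
    return max_tiles * max_tiles


def factor_checks(min_tiles: int, max_tiles: int) -> int:
    """Number of modulo checks when each tile count n is trial-divided up to sqrt(n)."""
    return sum(math.isqrt(n) for n in range(max(min_tiles, 1), max_tiles + 1))


if __name__ == "__main__":
    for lo, hi in [(1, 100), (1, 1000), (100, 100)]:
        print(f"min={lo} max={hi}: nested={nested_checks(hi)} factorized={factor_checks(lo, hi)}")
```

Under this assumption the min=max=100 case drops from 10,000 pair checks to 10 modulo checks, which fits with `test_min_tiles_equals_max_tiles_large` showing by far the largest speedup.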

Hot Path Context:
Based on function_references, get_optimal_tiled_canvas is called from crop_image_to_patches and get_number_of_image_patches - image processing functions that likely run once per image in batch operations. The optimization may therefore improve throughput for OCR workloads, mainly when max_image_tiles is large; for small tile ranges the generated tests show gains of only a few percent.

Test Case Benefits:
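The largest gains appear in tests with many candidate grids: 72-125% in tests such as test_large_image_and_tile_grid (max_image_tiles = 100) and test_large_scale_performance (max_image_tiles = 999). Several tests with max_image_tiles = 1000 improve by only 7-9%, and the basic and edge cases are roughly unchanged. All generated regression tests pass.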

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 48 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
# function to test

# imports
import pytest

from transformers.models.got_ocr2.image_processing_got_ocr2 import get_optimal_tiled_canvas


# unit tests

# ---------------------- Basic Test Cases ----------------------


def test_square_image_square_tile_basic():
    # 100x100 image, 50x50 tile, 1-4 tiles allowed
    # Should favor (2,2) grid (closest to aspect ratio 1)
    codeflash_output = get_optimal_tiled_canvas((100, 100), (50, 50), 1, 4)
    result = codeflash_output  # 4.51μs -> 4.41μs (2.29% faster)


def test_rectangular_image_basic():
    # 200x100 image (aspect ratio 2), 50x50 tile, 1-4 tiles allowed
    # Should favor (2,1) grid (aspect ratio 2)
    codeflash_output = get_optimal_tiled_canvas((100, 200), (50, 50), 1, 4)
    result = codeflash_output  # 3.53μs -> 3.36μs (5.00% faster)


def test_rectangular_image_vertical_basic():
    # 100x200 image (aspect ratio 0.5), 50x50 tile, 1-4 tiles allowed
    # Should favor (1,2) grid (aspect ratio 0.5)
    codeflash_output = get_optimal_tiled_canvas((200, 100), (50, 50), 1, 4)
    result = codeflash_output  # 3.37μs -> 3.25μs (3.85% faster)


def test_min_tile_is_max_tile():
    # Only one possible grid (min=max=1)
    codeflash_output = get_optimal_tiled_canvas((100, 100), (50, 50), 1, 1)
    result = codeflash_output  # 2.55μs -> 2.38μs (7.27% faster)


def test_exact_tile_fit():
    # 100x200 image, 50x100 tile, min=2, max=2
    # Only (1,2) and (2,1) are possible, aspect ratio 2 favors (2,1)
    codeflash_output = get_optimal_tiled_canvas((100, 200), (50, 100), 2, 2)
    result = codeflash_output  # 6.19μs -> 6.05μs (2.23% faster)


# ---------------------- Edge Test Cases ----------------------


def test_zero_size_image():
    # Zero-height image: the aspect ratio (width / height) is undefined, so ZeroDivisionError is expected
    with pytest.raises(ZeroDivisionError):
        get_optimal_tiled_canvas((0, 100), (50, 50), 1, 4)  # 1.70μs -> 1.56μs (8.98% faster)


def test_zero_size_tile():
    # Zero dimension tile should not crash, but area comparison will fail
    # Should still select best aspect ratio grid, but area logic will always be false
    codeflash_output = get_optimal_tiled_canvas((100, 100), (0, 50), 1, 4)
    result = codeflash_output  # 4.28μs -> 3.98μs (7.55% faster)


def test_min_tiles_greater_than_max_tiles():
    # No possible grids, should return default (1,1)
    codeflash_output = get_optimal_tiled_canvas((100, 100), (50, 50), 5, 4)
    result = codeflash_output  # 5.50μs -> 3.99μs (37.7% faster)


def test_non_integer_aspect_ratio():
    # 150x100 image, aspect ratio 1.5, tiles 1-4
    # (3,2) would match exactly but needs 6 tiles; with max=4 the closest grids, (1,1), (2,1) and (2,2), are each 0.5 away
    codeflash_output = get_optimal_tiled_canvas((100, 150), (50, 50), 1, 4)
    result = codeflash_output  # 4.05μs -> 4.06μs (0.222% slower)


def test_tie_breaker_prefers_more_tiles():
    # 100x100 image, tile 50x50, min=1, max=4
    # Both (2,2) and (1,1) have aspect ratio 1, but (2,2) uses more tiles
    codeflash_output = get_optimal_tiled_canvas((100, 100), (50, 50), 1, 4)
    result = codeflash_output  # 3.78μs -> 3.48μs (8.64% faster)


def test_tie_breaker_area_constraint():
    # Large image, small tile, tie-breaker should NOT favor more tiles if area constraint exceeded
    # area = 10000, tile area = 100; grid (4,4) covers 16 * 100 = 1600, far below 2 * 10000, so this input does not actually exceed the area constraint
    codeflash_output = get_optimal_tiled_canvas((100, 100), (10, 10), 1, 16)
    result = codeflash_output  # 25.0μs -> 20.7μs (20.8% faster)


def test_one_dimension_is_one():
    # 1x100 image, tile 1x50, min=1, max=2
    # (1,1), (1,2) and (2,1) are possible; aspect ratio 100 is closest to the widest grid, (2,1)
    codeflash_output = get_optimal_tiled_canvas((1, 100), (1, 50), 1, 2)
    result = codeflash_output  # 6.60μs -> 6.48μs (1.81% faster)


def test_large_aspect_ratio():
    # 100x1000 image, tile 50x50, min=1, max=20
    # aspect ratio 10, best grid is (10,1)
    codeflash_output = get_optimal_tiled_canvas((100, 1000), (50, 50), 1, 20)
    result = codeflash_output  # 33.6μs -> 24.9μs (35.0% faster)


def test_small_aspect_ratio():
    # 1000x100 image, tile 50x50, min=1, max=20
    # aspect ratio 0.1, best grid is (1,10)
    codeflash_output = get_optimal_tiled_canvas((1000, 100), (50, 50), 1, 20)
    result = codeflash_output  # 8.15μs -> 7.44μs (9.61% faster)


# ---------------------- Large Scale Test Cases ----------------------


def test_large_image_large_tile_range():
    # 500x800 image, 50x50 tile, min=1, max=1000
    # aspect ratio 1.6, best grid should be closest to 1.6, e.g. (40,25) = 1.6
    codeflash_output = get_optimal_tiled_canvas((500, 800), (50, 50), 1, 1000)
    result = codeflash_output  # 473μs -> 434μs (9.02% faster)


def test_large_image_non_square_tile():
    # 600x300 image, 30x60 tile, min=1, max=500
    # aspect ratio 0.5, best grid is (10,20) = 0.5
    codeflash_output = get_optimal_tiled_canvas((600, 300), (30, 60), 1, 500)
    result = codeflash_output  # 7.66ms -> 3.55ms (116% faster)


def test_max_tiles_limit():
    # 1000x1000 image, 10x10 tile, min=1, max=1000
    # aspect ratio 1, so square grids match exactly; the largest within the limit is (31,31) (31*31=961 <= 1000)
    codeflash_output = get_optimal_tiled_canvas((1000, 1000), (10, 10), 1, 1000)
    result = codeflash_output  # 472μs -> 440μs (7.23% faster)


def test_large_image_small_tile():
    # 600x800 image (passed as (height, width) = (800, 600)), 6x8 tile, min=10, max=1000
    # aspect ratio 600/800 = 0.75; (24,32) = 0.75 matches exactly
    codeflash_output = get_optimal_tiled_canvas((800, 600), (8, 6), 10, 1000)
    result = codeflash_output  # 30.6ms -> 13.9ms (120% faster)


def test_large_tile_small_image():
    # 100x100 image, 100x100 tile, min=1, max=1
    codeflash_output = get_optimal_tiled_canvas((100, 100), (100, 100), 1, 1)
    result = codeflash_output  # 2.90μs -> 2.98μs (2.98% slower)


def test_large_scale_performance():
    # Test that function returns quickly and correctly for large max_image_tiles
    codeflash_output = get_optimal_tiled_canvas((999, 999), (33, 33), 1, 999)
    result = codeflash_output  # 30.8ms -> 13.7ms (125% faster)


# ---------------------- Special/Corner Cases ----------------------


def test_min_tiles_zero():
    # min=0 adds no grids beyond min=1 (every grid has at least one tile), so this behaves like the 1-4 case
    codeflash_output = get_optimal_tiled_canvas((100, 100), (50, 50), 0, 4)
    result = codeflash_output  # 10.8μs -> 10.4μs (4.21% faster)


def test_max_tiles_zero():
    # Zero max tiles means no possible grid, should return (1,1)
    codeflash_output = get_optimal_tiled_canvas((100, 100), (50, 50), 1, 0)
    result = codeflash_output  # 4.14μs -> 3.94μs (5.16% faster)


def test_min_tiles_equals_max_tiles_large():
    # Only one possible grid, e.g. min=max=100
    codeflash_output = get_optimal_tiled_canvas((1000, 1000), (10, 10), 100, 100)
    result = codeflash_output  # 269μs -> 11.0μs (2356% faster)


# ---------------------- Input Validation (for mutation robustness) ----------------------


@pytest.mark.parametrize(
    "orig_size,tile_size,min_tiles,max_tiles",
    [
        ((100, 100), (50, 50), 1, 4),
        ((200, 100), (50, 50), 1, 4),
        ((100, 200), (50, 50), 1, 4),
        ((1000, 1000), (10, 10), 1, 1000),
    ],
)
def test_mutation_resistance(orig_size, tile_size, min_tiles, max_tiles):
    # Changing the logic should break at least one of these
    codeflash_output = get_optimal_tiled_canvas(orig_size, tile_size, min_tiles, max_tiles)
    result = codeflash_output  # 491μs -> 450μs (9.18% faster)


# ---------------------- Defensive Programming for ValueError ----------------------


def _raise_on_negative_args(*args):
    if any(isinstance(x, int) and x < 0 for x in args):
        raise ValueError("Negative values not allowed.")


get_optimal_tiled_canvas_orig = get_optimal_tiled_canvas


def get_optimal_tiled_canvas_validated(
    original_image_size: tuple[int, int],
    target_tile_size: tuple[int, int],
    min_image_tiles: int,
    max_image_tiles: int,
) -> tuple[int, int]:
    _raise_on_negative_args(*original_image_size, *target_tile_size, min_image_tiles, max_image_tiles)
    return get_optimal_tiled_canvas_orig(original_image_size, target_tile_size, min_image_tiles, max_image_tiles)


# Use the validated function for input validation tests
def test_validated_negative_dimensions():
    with pytest.raises(ValueError):
        get_optimal_tiled_canvas_validated((-100, 100), (50, 50), 1, 4)


def test_validated_negative_tile_size():
    with pytest.raises(ValueError):
        get_optimal_tiled_canvas_validated((100, 100), (-50, 50), 1, 4)


def test_validated_negative_min_max_tiles():
    with pytest.raises(ValueError):
        get_optimal_tiled_canvas_validated((100, 100), (50, 50), -1, 4)
    with pytest.raises(ValueError):
        get_optimal_tiled_canvas_validated((100, 100), (50, 50), 1, -4)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
# imports
import pytest

from transformers.models.got_ocr2.image_processing_got_ocr2 import get_optimal_tiled_canvas


# unit tests

# BASIC TEST CASES


def test_square_image_square_tile_exact_fit():
    # 2x2 image, 1x1 tile, min/max tiles = 1/4, expect (2,2) grid (perfect fit)
    codeflash_output = get_optimal_tiled_canvas((2, 2), (1, 1), 1, 4)
    result = codeflash_output  # 4.42μs -> 4.39μs (0.707% faster)


def test_rectangular_image_wider_than_tall():
    # 4x2 image, 1x1 tile, min/max tiles = 1/8, expect (4,2) grid (best aspect match)
    codeflash_output = get_optimal_tiled_canvas((2, 4), (1, 1), 1, 8)
    result = codeflash_output  # 44.9μs -> 40.4μs (11.1% faster)


def test_rectangular_image_taller_than_wide():
    # 2x4 image, 1x1 tile, min/max tiles = 1/8, expect (2,4) grid (best aspect match)
    codeflash_output = get_optimal_tiled_canvas((4, 2), (1, 1), 1, 8)
    result = codeflash_output  # 4.77μs -> 4.63μs (2.94% faster)


def test_tile_size_larger_than_image():
    # 2x2 image, 4x4 tile, min/max tiles = 1/1, only possible grid is (1,1)
    codeflash_output = get_optimal_tiled_canvas((2, 2), (4, 4), 1, 1)
    result = codeflash_output  # 2.44μs -> 2.33μs (4.80% faster)


def test_multiple_possible_grids_choose_best_aspect():
    # (3, 6) is (height, width), so the image aspect is 6/3 = 2; 1x1 tile, min/max tiles = 1/12; grids with aspect 2, such as (2,1) and (4,2), match exactly
    codeflash_output = get_optimal_tiled_canvas((3, 6), (1, 1), 1, 12)
    result = codeflash_output  # 110μs -> 92.2μs (19.9% faster)


def test_min_tiles_equals_max_tiles():
    # 3x3 image, 1x1 tile, min/max tiles = 9/9, only possible grid is (3,3)
    codeflash_output = get_optimal_tiled_canvas((3, 3), (1, 1), 9, 9)
    result = codeflash_output  # 112μs -> 79.6μs (41.2% faster)


def test_aspect_ratio_tiebreaker_more_tiles():
    # 2x2 image, 1x1 tile, min/max tiles = 1/4
    # (1,2) is 0.5 and (2,1) is 1.0 away from image aspect 1.0, while (1,1) and (2,2) match exactly
    codeflash_output = get_optimal_tiled_canvas((2, 2), (1, 1), 1, 4)
    result = codeflash_output  # 4.13μs -> 3.95μs (4.63% faster)


def test_aspect_ratio_tiebreaker_favor_more_tiles_until_area_limit():
    # 2x2 image, 1x1 tile, min/max tiles = 1/4
    # (1,2) is 0.5 and (2,1) is 1.0 away from image aspect 1.0, while (1,1) and (2,2) match exactly
    codeflash_output = get_optimal_tiled_canvas((2, 2), (1, 1), 1, 4)
    result = codeflash_output  # 3.71μs -> 3.54μs (4.66% faster)


# EDGE TEST CASES


def test_zero_image_size_raises():
    # Should raise ZeroDivisionError due to division by zero in aspect ratio calculation
    with pytest.raises(ZeroDivisionError):
        get_optimal_tiled_canvas((0, 0), (1, 1), 1, 4)  # 1.99μs -> 1.96μs (1.84% faster)


def test_non_integer_image_and_tile_sizes():
    # Should work with floats, as division is allowed
    codeflash_output = get_optimal_tiled_canvas((2.5, 2.5), (1.0, 1.0), 1, 4)
    result = codeflash_output  # 4.97μs -> 4.81μs (3.22% faster)


def test_large_aspect_ratio_image():
    # Very wide image, expect grid to be as wide as possible within max tiles
    codeflash_output = get_optimal_tiled_canvas((1, 100), (1, 1), 1, 10)
    result = codeflash_output  # 15.5μs -> 13.8μs (12.4% faster)


def test_large_aspect_ratio_tile():
    # Tile is much wider than tall, but image is square
    codeflash_output = get_optimal_tiled_canvas((10, 10), (1, 10), 1, 10)
    result = codeflash_output  # 5.57μs -> 4.97μs (11.9% faster)


# LARGE SCALE TEST CASES


def test_large_image_and_tile_grid():
    # 100x100 image, 10x10 tile, min/max tiles = 1/100
    codeflash_output = get_optimal_tiled_canvas((100, 100), (10, 10), 1, 100)
    result = codeflash_output  # 371μs -> 215μs (72.0% faster)


def test_large_number_of_possible_grids():
    # 30x40 image, 1x1 tile, min/max tiles = 1/900
    codeflash_output = get_optimal_tiled_canvas((30, 40), (1, 1), 1, 900)
    result = codeflash_output  # 24.5ms -> 11.2ms (119% faster)


def test_large_grid_with_aspect_ratio_tiebreak():
    # 50x50 image, 1x1 tile, min/max tiles = 1/1000
    codeflash_output = get_optimal_tiled_canvas((50, 50), (1, 1), 1, 1000)
    result = codeflash_output  # 479μs -> 440μs (8.98% faster)


def test_large_grid_area_limit_tiebreak():
    # 10x10 image, 1x1 tile, min/max tiles = 1/200
    # Aspect ratio 1.0, so grids like (10,10), (5,5), etc. are possible
    # Should select (10,10)
    codeflash_output = get_optimal_tiled_canvas((10, 10), (1, 1), 1, 200)
    result = codeflash_output  # 1.25ms -> 676μs (85.0% faster)


def test_large_rectangular_image():
    # 100x50 image, 1x1 tile, min/max tiles = 1/1000
    codeflash_output = get_optimal_tiled_canvas((100, 50), (1, 1), 1, 1000)
    result = codeflash_output  # 480μs -> 442μs (8.55% faster)


def test_performance_large_possible_grids():
    # This test is to ensure function does not hang on large input
    codeflash_output = get_optimal_tiled_canvas((999, 999), (1, 1), 1, 999)
    result = codeflash_output  # 31.0ms -> 13.7ms (125% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
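
The comparison described in the comment above can be approximated with a small harness. This is only a sketch of the general idea, not Codeflash's actual mechanism; `run_captured` and `same_behavior` are hypothetical helpers that treat a raised exception type as part of the observable behavior.

```python
from typing import Any, Callable


def run_captured(func: Callable[..., Any], *args: Any) -> tuple[str, Any]:
    """Run func and record either its return value or the type of exception raised."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("raised", type(exc))


def same_behavior(original: Callable[..., Any], optimized: Callable[..., Any], *args: Any) -> bool:
    """True when both callables return equal values or raise the same exception type."""
    return run_captured(original, *args) == run_captured(optimized, *args)
```

Comparing exception types as well as return values covers cases such as the zero-size image tests, where the expected behavior is a `ZeroDivisionError` rather than a result.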

To edit these changes git checkout codeflash/optimize-get_optimal_tiled_canvas-miafdgve and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 22, 2025 15:09
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Nov 22, 2025