Conversation

@changminbark
Contributor

PR Template

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind feature

What this PR does / why we need it:
This PR introduces a way to produce constant load at a fixed concurrency level per stage. This is needed to understand how the system performs under constant load. It is achieved by capping the max concurrency of the workers in every stage at the desired concurrency level.
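A minimal sketch of the capping idea, not the actual implementation: an asyncio semaphore sized to the stage's concurrency level bounds how many requests are ever in flight at once (`run_stage` and `send_request` are hypothetical names).

```python
import asyncio

async def run_stage(concurrency_level: int, num_requests: int) -> int:
    """Issue num_requests requests, never exceeding concurrency_level in flight."""
    sem = asyncio.Semaphore(concurrency_level)
    in_flight = 0
    peak = 0

    async def send_request(i: int) -> None:
        nonlocal in_flight, peak
        async with sem:  # blocks once concurrency_level workers are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # stand-in for the actual HTTP request
            in_flight -= 1

    await asyncio.gather(*(send_request(i) for i in range(num_requests)))
    return peak

peak = asyncio.run(run_stage(concurrency_level=6, num_requests=50))
```

Here `peak` can never exceed the configured concurrency level, which is the property that yields constant load per stage.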

Which issue(s) this PR fixes:

Fixes #252

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

The load generator can now generate constant load at a specific concurrency level in each stage (the workers' max concurrency is capped to hold each stage at the desired level). Graphs of the metrics against the concurrency level are also generated.
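The graphs named below could be produced along these lines; this is a hedged matplotlib sketch with made-up sample numbers, not the tool's actual plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative sample points only (concurrency level -> latency / throughput).
concurrency = [2, 6]
latency_s = [0.12, 0.31]
throughput_rps = [16.0, 19.4]

fig, ax = plt.subplots()
ax.plot(concurrency, latency_s, marker="o")
ax.set_xlabel("concurrency level")
ax.set_ylabel("latency (s)")
ax.set_title("latency_vs_concurrency")
fig.savefig("latency_vs_concurrency.png")

fig2, ax2 = plt.subplots()
ax2.plot(concurrency, throughput_rps, marker="o")
ax2.set_xlabel("concurrency level")
ax2.set_ylabel("throughput (req/s)")
ax2.set_title("throughput_vs_concurrency")
fig2.savefig("throughput_vs_concurrency.png")
```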

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Testing

Testing was done using the config.yml file shown below together with the necessary services (vLLM serving HuggingFaceTB/SmolLM2-135M-Instruct and a local Prometheus instance).

Functional test output:

config.yaml

stage_0_lifecycle_metrics.json
stage_1_lifecycle_metrics.json
summary_lifecycle_metrics.json
summary_prometheus_metrics.json

Graphs: latency_vs_concurrency, throughput_vs_concurrency, throughput_vs_latency

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 30, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 30, 2025
@changminbark
Contributor Author

/assign @achandrasekar


@jjk-g jjk-g left a comment


Thank you for adding this!

@changminbark
Contributor Author

Latest Test:

Validation test for loadgen config:

Misconfigured YAML

load:
  type: constant
  stages:
  - rate: 50.0
    duration: 1
    num_requests: 50
    concurrency_level: 6
  - rate: 25.0
    duration: 1
    num_requests: 25
    concurrency_level: 2
api: 
  type: completion
  streaming: true
server:
  type: vllm
  model_name: HuggingFaceTB/SmolLM2-135M-Instruct
  base_url: http://0.0.0.0:8000
  ignore_eos: true
tokenizer:
  pretrained_model_name_or_path: HuggingFaceTB/SmolLM2-135M-Instruct
data:
  type: shareGPT
metrics:
  type: prometheus
  prometheus:
    url: http://localhost:9090
    scrape_interval: 15
report:
  request_lifecycle:
    summary: true
    per_stage: true
    per_request: false
  prometheus:
    summary: true
    per_stage: false
python3 inference_perf/main.py -c config.yml
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2025-10-30 14:48:15,299 - inference_perf.config - INFO - Using configuration from: config.yml
Traceback (most recent call last):
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 332, in <module>
    main_cli()
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 118, in main_cli
    config = read_config(args.config_file)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/config.py", line 298, in read_config
    converted_stages.append(StandardLoadStage(**stage))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/venv/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for StandardLoadStage
num_requests
  Input should be None [type=none_required, input_value=50, input_type=int]
    For further information visit https://errors.pydantic.dev/2.11/v/none_required
concurrency_level
  Input should be None [type=none_required, input_value=6, input_type=int]
    For further information visit https://errors.pydantic.dev/2.11/v/none_required
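The `none_required` errors above arise because, under pydantic v2, a field annotated as `None` rejects any non-`None` input. A minimal sketch reproducing the behavior (`StageSketch` is a hypothetical stand-in, not the project's actual `StandardLoadStage` model):

```python
from pydantic import BaseModel, ValidationError

# Hypothetical stand-in: under a constant-load config these fields are
# expected to be absent, so passing integer values triggers none_required.
class StageSketch(BaseModel):
    rate: float
    duration: int
    num_requests: None = None       # annotated as None -> any int is rejected
    concurrency_level: None = None

try:
    StageSketch(rate=50.0, duration=1, num_requests=50, concurrency_level=6)
except ValidationError as e:
    errors = {err["loc"][0]: err["type"] for err in e.errors()}
    print(errors)
```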

Functional test (running inference)

config.yaml

stage_0_lifecycle_metrics.json
stage_1_lifecycle_metrics.json
summary_lifecycle_metrics.json
summary_prometheus_metrics.json

Graphs: latency_vs_concurrency, throughput_vs_concurrency, throughput_vs_latency

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: changminbark
Once this PR has been reviewed and has the lgtm label, please ask for approval from achandrasekar. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Successfully merging this pull request may close these issues:

Option to Generate Load Based on Concurrency
