Loadgen concurrent load type #263
Conversation
/assign @achandrasekar
Thank you for adding this!
Latest Test: Validation test for loadgen config (misconfigured YAML)

load:
  type: constant
  stages:
  - rate: 50.0
    duration: 1
    num_requests: 50
    concurrency_level: 6
  - rate: 25.0
    duration: 1
    num_requests: 25
    concurrency_level: 2
api: 
  type: completion
  streaming: true
server:
  type: vllm
  model_name: HuggingFaceTB/SmolLM2-135M-Instruct
  base_url: http://0.0.0.0:8000
  ignore_eos: true
tokenizer:
  pretrained_model_name_or_path: HuggingFaceTB/SmolLM2-135M-Instruct
data:
  type: shareGPT
metrics:
  type: prometheus
  prometheus:
    url: http://localhost:9090
    scrape_interval: 15
report:
  request_lifecycle:
    summary: true
    per_stage: true
    per_request: false
  prometheus:
    summary: true
    per_stage: false

python3 inference_perf/main.py -c config.yml
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2025-10-30 14:48:15,299 - inference_perf.config - INFO - Using configuration from: config.yml
Traceback (most recent call last):
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 332, in <module>
    main_cli()
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/main.py", line 118, in main_cli
    config = read_config(args.config_file)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/inference_perf/config.py", line 298, in read_config
    converted_stages.append(StandardLoadStage(**stage))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chang-min/Desktop/OpenSource/k8s/inference-perf/venv/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for StandardLoadStage
num_requests
  Input should be None [type=none_required, input_value=50, input_type=int]
    For further information visit https://errors.pydantic.dev/2.11/v/none_required
concurrency_level
  Input should be None [type=none_required, input_value=6, input_type=int]
    For further information visit https://errors.pydantic.dev/2.11/v/none_required

Functional test (running inference): stage_0_lifecycle_metrics.json
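As a side note on the validation test above: the pydantic errors are the intended guard for `type: constant`, which requires `num_requests` and `concurrency_level` to be left unset. A corrected sketch of those stages (inferred only from the error messages, not from this PR's code) would simply drop the two fields:

# Hypothetical correction for the misconfigured stages above: for type: constant,
# StandardLoadStage rejects num_requests and concurrency_level, so they are removed.
load:
  type: constant
  stages:
  - rate: 50.0
    duration: 1
  - rate: 25.0
    duration: 1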
PR Template
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces a way to produce constant load at a fixed concurrency level per stage. This is needed to understand how the system performs under constant load. It is achieved by capping the maximum concurrency of the workers in every stage so that the desired concurrency level is maintained.
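For illustration, a minimal sketch of what such a per-stage concurrency configuration could look like. The `type: concurrent` name and the exact stage fields are assumptions inferred from the PR title and from the `num_requests`/`concurrency_level` fields exercised in the validation test above, not taken verbatim from the code:

# Hypothetical sketch (type name and field layout assumed): each stage issues a
# fixed number of requests while capping in-flight requests at concurrency_level,
# instead of driving a fixed request rate as the constant load type does.
load:
  type: concurrent
  stages:
  - num_requests: 50       # total requests for this stage
    concurrency_level: 6   # maximum requests in flight at once
  - num_requests: 25
    concurrency_level: 2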
Which issue(s) this PR fixes:
Fixes #252
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Testing
Testing was done using the config.yml file shown below along with the necessary services (vLLM serving HuggingFaceTB/SmolLM2-135M-Instruct and a local Prometheus instance).
Functional test output:
- config.yaml
- stage_0_lifecycle_metrics.json
- stage_1_lifecycle_metrics.json
- summary_lifecycle_metrics.json
- summary_prometheus_metrics.json