[Question]: When using vLLM to run Qwen3-32B, the model's performance has severely degraded. #10879

@husl616

Description

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

I'm using vLLM to run the Qwen3-32B model, but its output quality has severely degraded. During conversations, the model fails to correctly return markers like [ID:XX] in its responses, and its output is significantly worse than Qwen3-8B running under Ollama. Are there any specific parameters I need to configure when running Qwen3 with vLLM?
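Not a confirmed fix, but one frequent cause of degraded Qwen3 output is the sampling configuration rather than vLLM itself: the Qwen3 model card recommends temperature=0.6, top_p=0.95, top_k=20, min_p=0 for thinking mode (temperature=0.7, top_p=0.8 for non-thinking mode) and explicitly advises against greedy decoding, which can cause repetition and quality loss. Below is a minimal sketch using vLLM's offline Python API with those model-card values; the prompt and token budget are placeholders.

```python
from vllm import LLM, SamplingParams

# Sketch under assumptions: the sampling values below follow the Qwen3
# model card's recommendation for thinking mode; verify them against the
# current card for your model revision.
llm = LLM(model="Qwen/Qwen3-32B")

sampling = SamplingParams(
    temperature=0.6,  # the card warns explicitly against greedy decoding
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    max_tokens=2048,  # placeholder budget; size to your use case
)

# Placeholder prompt; replace with your actual conversation.
messages = [{"role": "user", "content": "Summarize the vLLM project in one sentence."}]
outputs = llm.chat(messages, sampling_params=sampling)
print(outputs[0].outputs[0].text)
```

The same values can be passed per request when serving with `vllm serve Qwen/Qwen3-32B`, since vLLM's OpenAI-compatible endpoint accepts `temperature`, `top_p`, and `top_k` in the request body. Separately, because Qwen3 emits `<think>...</think>` blocks by default, it is worth checking whether those blocks are leaking into the downstream application and breaking its [ID:XX] markers; recent vLLM releases include a Qwen3 reasoning parser (`--reasoning-parser qwen3`, availability depends on your version) to strip them server-side.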
