[Question]: When using vLLM to run Qwen3-32B, the model's performance has severely degraded. #10879

@husl616

Description

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

I'm using vLLM to run the Qwen3-32B model, but its output quality has severely degraded. During conversations, the model fails to correctly return markers like [ID:XX] in its responses, and its output is significantly worse than Qwen3-8B running under Ollama. Are there any specific parameters I need to configure when running Qwen3 with vLLM?
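Not a confirmed fix, but one frequent cause of degraded Qwen3 output is the sampling configuration rather than vLLM itself: the Qwen3 model card recommends temperature=0.6, top_p=0.95, top_k=20, min_p=0 for thinking mode (temperature=0.7, top_p=0.8 for non-thinking mode) and explicitly advises against greedy decoding, which can cause repetition and quality loss. Below is a minimal sketch using vLLM's offline Python API with those model-card values; the prompt and token budget are placeholders.

```python
from vllm import LLM, SamplingParams

# Sketch under assumptions: the sampling values below follow the Qwen3
# model card's recommendation for thinking mode; verify them against the
# current card for your model revision.
llm = LLM(model="Qwen/Qwen3-32B")

sampling = SamplingParams(
    temperature=0.6,  # the card warns explicitly against greedy decoding
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    max_tokens=2048,  # placeholder budget; size to your use case
)

# Placeholder prompt; replace with your actual conversation.
messages = [{"role": "user", "content": "Summarize the vLLM project in one sentence."}]
outputs = llm.chat(messages, sampling_params=sampling)
print(outputs[0].outputs[0].text)
```

The same values can be passed per request when serving with `vllm serve Qwen/Qwen3-32B`, since vLLM's OpenAI-compatible endpoint accepts `temperature`, `top_p`, and `top_k` in the request body. Separately, because Qwen3 emits `<think>...</think>` blocks by default, it is worth checking whether those blocks are leaking into the downstream application and breaking its [ID:XX] markers; recent vLLM releases include a Qwen3 reasoning parser (`--reasoning-parser qwen3`, availability depends on your version) to strip them server-side.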
