Describe your problem
I'm running the Qwen3-32B model with vLLM, but its output quality has degraded severely. In conversations the model fails to return tokens like [ID:XX] correctly in its responses, and the overall output quality is noticeably worse than running Qwen3-8B with Ollama. Are there any specific parameters I need to set when serving Qwen3 with vLLM?
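
For reference, below is a minimal sketch of how I'm querying the vLLM OpenAI-compatible server. The endpoint URL, model name, and sampling values (temperature, top_p, top_k) are placeholders for my actual setup, not values I know to be correct for Qwen3; I'm asking whether these or other parameters need to be changed.

```python
# Minimal sketch of my request path to the vLLM server.
# Endpoint, model name, and sampling values are placeholders (assumptions),
# not confirmed-correct settings for Qwen3.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM OpenAI-compatible endpoint (assumed default port)
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "..."}],
    temperature=0.6,            # values I have been trying; unsure if they are appropriate
    top_p=0.95,
    extra_body={"top_k": 20},   # vLLM accepts additional sampling params via extra_body
)
print(response.choices[0].message.content)
```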