From 1689bf039699d6968abb6bef2f3bea7f071a7f2f Mon Sep 17 00:00:00 2001
From: Hrithik Sagar <43140053+hrithiksagar@users.noreply.github.com>
Date: Tue, 7 Oct 2025 16:01:15 +0530
Subject: [PATCH] Update README.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix: Resolved critical bugs in vLLM online and offline inference.

Online Inference: Updated the installation instructions in the requirements and
README. The previous setup referenced an outdated prebuilt vLLM version; the new
stable release changed the installation method, which is now documented correctly.

Offline Inference: Fixed a breaking change in llm.generate() caused by a
deprecation in vLLM ≥ 0.10.2. Replaced the old input handling with the new
TokensPrompt interface (from vllm.inputs import TokensPrompt) to ensure
compatibility with the latest vLLM API.
---
 README.md | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0104cec4..3ad2a308 100644
--- a/README.md
+++ b/README.md
@@ -89,6 +89,21 @@ uv pip install --pre vllm==0.10.1+gptoss \
 vllm serve openai/gpt-oss-20b
 ```
 
+If the above installation does not work, the following steps set up online inference:
+```
+sudo add-apt-repository ppa:deadsnakes/ppa -y
+sudo apt update
+sudo apt install python3.12 python3.12-venv python3.12-dev -y
+python3.12 --version
+python3.12 -m venv .oss
+source .oss/bin/activate
+pip install -U uv
+uv pip install vllm==0.10.2 --torch-backend=auto
+# uv pip install openai-harmony  # Optional for online serving, but required for offline serving
+# Main command to start the online inference server
+vllm serve openai/gpt-oss-20b --async-scheduling
+```
+
 [Learn more about how to use gpt-oss with vLLM.](https://cookbook.openai.com/articles/gpt-oss/run-vllm)
 
 Offline Serve Code:
@@ -150,7 +165,7 @@ sampling = SamplingParams(
 )
 
 outputs = llm.generate(
-    prompt_token_ids=[prefill_ids],  # batch of size 1
+    [TokensPrompt(prompt_token_ids=prefill_ids)],
     sampling_params=sampling,
 )
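
For reference, a minimal sketch of what the offline-inference call looks like once this patch is applied, assuming vLLM ≥ 0.10.2. The model name follows the README example; `prefill_ids` and the sampling settings are illustrative placeholders, not the Harmony-rendered values the README actually builds with openai-harmony.

```python
# Sketch of the TokensPrompt-based call described in the commit message.
# Assumes vLLM >= 0.10.2; prefill_ids is a placeholder for the pre-tokenized
# prompt (the README constructs it with openai-harmony).
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

llm = LLM(model="openai/gpt-oss-20b")

prefill_ids = [200006, 17360, 200008]  # placeholder token IDs
sampling = SamplingParams(max_tokens=128, temperature=1.0)

# Old API (removed in newer vLLM): llm.generate(prompt_token_ids=[prefill_ids], ...)
# New API: wrap the pre-tokenized input in a TokensPrompt (batch of size 1).
outputs = llm.generate(
    [TokensPrompt(prompt_token_ids=prefill_ids)],
    sampling_params=sampling,
)

print(outputs[0].outputs[0].token_ids)
```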