Fix: Add retry logic for HTTP 400 errors in OpenAI Compatible API #9189
base: main
Conversation
… in OpenAI Compatible API

- Implement automatic retry with progressive conversation history truncation when HTTP 400 errors occur
- Add retry logic to both streaming and non-streaming methods
- Truncate older messages while keeping at least the 10 most recent for context
- Add comprehensive test coverage for retry scenarios
- Fixes issue #9188, where the Qwen3-Coder-30B-A3B model was prone to HTTP 400 errors after multiple conversation rounds
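For orientation before the review comments below, here is a minimal sketch of the retry-with-truncation flow the PR describes. The names (`ChatMessage`, `sendRequest`, `requestWithRetry`) are illustrative stand-ins, not the PR's actual API; the ratio formula, the 3-retry cap, and the 10-message floor mirror the code excerpts quoted in the review.

```typescript
// Illustrative sketch only - names are hypothetical, not the PR's actual API.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

const MAX_RETRIES = 3
const MIN_KEPT_MESSAGES = 10

async function requestWithRetry(
	messages: ChatMessage[],
	sendRequest: (messages: ChatMessage[]) => Promise<string>,
	retryCount = 0,
): Promise<string> {
	let current = messages
	// On retries, drop a growing share of the oldest messages,
	// always keeping at least the MIN_KEPT_MESSAGES most recent.
	if (retryCount > 0 && current.length > MIN_KEPT_MESSAGES) {
		const ratio = Math.min(0.5 + retryCount * 0.1, 0.8)
		const drop = Math.min(Math.floor(current.length * ratio), current.length - MIN_KEPT_MESSAGES)
		current = current.slice(drop)
	}
	try {
		return await sendRequest(current)
	} catch (error) {
		if ((error as { status?: number })?.status === 400 && retryCount < MAX_RETRIES) {
			// Pass the truncated list forward so each retry builds on the last -
			// exactly the point raised in the first review comment below.
			return requestWithRetry(current, sendRequest, retryCount + 1)
		}
		throw error
	}
}
```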
Review complete. Found 2 issues that should be addressed:
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.
```typescript
console.warn(
	`[${this.providerName}] Received HTTP 400 error, retrying with truncated conversation history (attempt ${retryCount + 1}/3)`,
)
return this.createStream(systemPrompt, messages, metadata, requestOptions, retryCount + 1)
```
The retry logic passes the original messages parameter instead of the truncated convertedMessages, causing the truncation logic to re-execute on the same original messages each retry. This means previous truncations are lost and the same truncation ratio is applied to the full original message list each time, rather than progressively truncating. This could lead to inefficient retries where the message count doesn't decrease as expected across multiple attempts.
Suggested change:
```diff
- return this.createStream(systemPrompt, messages, metadata, requestOptions, retryCount + 1)
+ return this.createStream(systemPrompt, convertedMessages, metadata, requestOptions, retryCount + 1)
```
```typescript
// If this is a retry and we have many messages, try truncating older conversation history
// Keep at least the last 10 messages to maintain context
if (retryCount > 0 && convertedMessages.length > 10) {
	const truncationRatio = Math.min(0.5 + retryCount * 0.1, 0.8) // Truncate 50%, 60%, 70%, up to 80%
```
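For concreteness: with `retryCount` = 1, 2, 3 this formula yields ratios of 0.6, 0.7, and 0.8. Since the branch only runs when `retryCount > 0`, the 50% case named in the inline comment never applies in practice.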
The truncation ratio is computed inline using magic numbers (0.5, 0.1, 0.8). Consider extracting these values as named constants for improved readability and easier tuning in the future.
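A minimal sketch of that suggestion, with hypothetical constant names (the values are taken from the excerpt above):

```typescript
// Hypothetical constant names; values come from the quoted code.
const BASE_TRUNCATION_RATIO = 0.5 // baseline share of history to drop
const TRUNCATION_RATIO_STEP = 0.1 // additional share dropped per retry
const MAX_TRUNCATION_RATIO = 0.8 // upper bound on how much history may be dropped

function getTruncationRatio(retryCount: number): number {
	return Math.min(BASE_TRUNCATION_RATIO + retryCount * TRUNCATION_RATIO_STEP, MAX_TRUNCATION_RATIO)
}
```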
```typescript
// Retry with truncated prompt if we haven't exceeded max retries
if (is400Error && retryCount < 3 && prompt.length > 1000) {
	const truncationRatio = Math.min(0.5 + retryCount * 0.1, 0.8)
	const truncatedPrompt = prompt.substring(0, Math.floor(prompt.length * (1 - truncationRatio)))
```
The truncation keeps the beginning of the prompt and discards the end, which loses the most recent and relevant context. For prompts containing conversation history or instructions at the end, this approach removes critical information needed for the model to respond appropriately. This is inconsistent with the createStream method which keeps the most recent messages. Consider keeping the end of the prompt instead to preserve recent context.
Suggested change:
```diff
- const truncatedPrompt = prompt.substring(0, Math.floor(prompt.length * (1 - truncationRatio)))
+ const truncatedPrompt = prompt.substring(Math.floor(prompt.length * truncationRatio))
```
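As a worked example: with a 2,000-character prompt and a truncation ratio of 0.6, the original code keeps characters 0-799 (the oldest text), while the suggested version keeps characters 1200-1999 (the most recent text).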
Description
This PR attempts to address Issue #9188, where the Qwen3-Coder-30B-A3B model used via the OpenAI Compatible API was experiencing HTTP 400 errors, particularly after multiple rounds of conversation.
Problem
Users were experiencing HTTP 400 errors when using models such as Qwen3-Coder-30B-A3B through the OpenAI Compatible API, particularly after several rounds of conversation had accumulated a long message history.
Solution
Implemented automatic retry logic with progressive conversation history truncation:

- On an HTTP 400 error, the request is retried up to 3 times
- Each retry truncates a larger share of the older conversation history, always keeping at least the 10 most recent messages for context
- The same logic applies to both streaming (`createStream()`) and non-streaming (`completePrompt()`) calls
Changes
- Updated `BaseOpenAiCompatibleProvider` to add retry logic in the `createStream()` method
- Added retry logic to the `completePrompt()` method for non-streaming calls

Testing
- Added `base-openai-compatible-provider-retry.spec.ts` with 7 test cases (an illustrative sketch of one such test follows)
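The spec file itself isn't reproduced on this page. As an illustration only, one of its retry scenarios might look like the vitest-style sketch below, written against the hypothetical `requestWithRetry` helper sketched earlier on this page (assumed to be in scope); it is not the PR's actual test code.

```typescript
import { describe, expect, it, vi } from "vitest"
// Assumes the requestWithRetry sketch from earlier on this page is in scope.

describe("HTTP 400 retry logic", () => {
	it("retries with a truncated history after a 400 error", async () => {
		const messages = Array.from({ length: 20 }, (_, i) => ({
			role: "user" as const,
			content: `message ${i}`,
		}))
		// Fail once with a 400-shaped error, then succeed.
		const sendRequest = vi
			.fn()
			.mockRejectedValueOnce(Object.assign(new Error("Bad Request"), { status: 400 }))
			.mockResolvedValueOnce("ok")

		await expect(requestWithRetry(messages, sendRequest)).resolves.toBe("ok")
		expect(sendRequest).toHaveBeenCalledTimes(2)
		// The retry should have been issued with fewer messages than the original call.
		expect(sendRequest.mock.calls[1][0].length).toBeLessThan(messages.length)
	})
})
```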
Impact
This fix should help users experiencing HTTP 400 errors with OpenAI Compatible APIs, especially when using models with limited context windows or when conversation histories grow long.
Feedback and guidance are welcome!
Important
Adds retry logic with progressive truncation for HTTP 400 errors in OpenAI Compatible API, with tests for streaming and non-streaming calls.
- Adds retry logic for HTTP 400 errors to `createStream()` and `completePrompt()` in `base-openai-compatible-provider.ts`.
- Adds `base-openai-compatible-provider-retry.spec.ts` with 7 test cases.
- Changes are contained in `base-openai-compatible-provider.ts`.

This description was created automatically for 669b812.