Conversation

@roomote roomote bot commented Nov 12, 2025

Description

This PR attempts to address Issue #9188, where the Qwen3-Coder-30B-A3B model used via the OpenAI Compatible API was returning HTTP 400 errors, particularly after multiple rounds of conversation.

Problem

Users were experiencing HTTP 400 errors when:

  • Reopening conversations with the Qwen3-Coder-30B-A3B model
  • Having multiple rounds of conversation
  • The conversation history became too long for the model's context window

Solution

Implemented automatic retry logic with progressive conversation-history truncation (a sketch follows the list):

  • Detects HTTP 400 errors and automatically retries the request
  • Progressively truncates older conversation messages while keeping at least the 10 most recent messages for context
  • Supports up to 3 retry attempts with increasing truncation ratios (50%, 60%, 70%, up to 80%)
  • Applied to both streaming and non-streaming API calls
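
For illustration, here is a minimal TypeScript sketch of the truncation step, using the same formula the PR applies; the helper name truncateForRetry and the ChatMessage shape are illustrative assumptions, not the PR's actual code:

type ChatMessage = { role: "user" | "assistant"; content: string }

const MIN_MESSAGES = 10

// Drop older messages on each retry, always keeping at least the
// MIN_MESSAGES most recent ones so the model retains context.
export function truncateForRetry(messages: ChatMessage[], retryCount: number): ChatMessage[] {
  if (retryCount === 0 || messages.length <= MIN_MESSAGES) {
    return messages
  }
  // Same formula as the PR: the ratio grows by 10% per retry, capped at 80%.
  const truncationRatio = Math.min(0.5 + retryCount * 0.1, 0.8)
  const keepCount = Math.max(MIN_MESSAGES, Math.ceil(messages.length * (1 - truncationRatio)))
  return messages.slice(messages.length - keepCount)
}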

Changes

  • Modified BaseOpenAiCompatibleProvider to add retry logic in createStream() method
  • Added retry logic to completePrompt() method for non-streaming calls
  • Added comprehensive test coverage for all retry scenarios

Testing

  • Added new test suite base-openai-compatible-provider-retry.spec.ts with 7 test cases
  • All existing tests continue to pass
  • Tests verify (one scenario is sketched below):
    • Retry with truncation on HTTP 400 errors
    • Progressive truncation on multiple retries
    • No retry on non-400 errors
    • Minimum message preservation (10 messages)
    • Proper handling of short conversations
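
For illustration, one of these scenarios could look roughly like the following vitest case, written against the hypothetical truncateForRetry helper sketched in the Solution section (the import path is an assumption):

import { describe, it, expect } from "vitest"
import { truncateForRetry } from "./truncate-for-retry"

describe("progressive truncation", () => {
  const messages = Array.from({ length: 40 }, (_, i) => ({
    role: i % 2 === 0 ? ("user" as const) : ("assistant" as const),
    content: `message ${i}`,
  }))

  it("keeps at least the 10 most recent messages on every retry", () => {
    for (let retryCount = 1; retryCount <= 3; retryCount++) {
      const truncated = truncateForRetry(messages, retryCount)
      expect(truncated.length).toBeGreaterThanOrEqual(10)
      // The newest message must always survive truncation.
      expect(truncated[truncated.length - 1]).toEqual(messages[messages.length - 1])
    }
  })
})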

Impact

This fix should help users who hit HTTP 400 errors with OpenAI Compatible APIs, especially when using models with limited context windows or when conversation histories grow long.

Feedback and guidance are welcome!


Important

Adds retry logic with progressive truncation for HTTP 400 errors in the OpenAI Compatible API, with tests for streaming and non-streaming calls.

  • Behavior:
    • Adds retry logic for HTTP 400 errors in createStream() and completePrompt() in base-openai-compatible-provider.ts.
    • Progressively truncates the conversation history (streaming) or the prompt (non-streaming) on retries, keeping at least the 10 most recent messages or shortening the prompt.
    • Supports up to 3 retries with truncation ratios of 50%, 60%, 70%, up to 80%.
    • Applies to both streaming and non-streaming API calls.
  • Testing:
    • New test suite base-openai-compatible-provider-retry.spec.ts with 7 test cases.
    • Tests cover retry logic, progressive truncation, non-retry on non-400 errors, and minimum message preservation.
  • Misc:
    • Logs warnings on retries and truncations in base-openai-compatible-provider.ts.

This description was created by Ellipsis for 669b812.

… in OpenAI Compatible API

- Implement automatic retry with progressive conversation history truncation when HTTP 400 errors occur
- Add retry logic to both streaming and non-streaming methods
- Truncate older messages while keeping at least 10 most recent for context
- Add comprehensive test coverage for retry scenarios
- Fixes issue #9188 where the Qwen3-Coder-30B-A3B model was prone to HTTP 400 errors after multiple conversation rounds
@roomote roomote bot requested a review from mrubens as a code owner November 12, 2025 05:58
@roomote roomote bot requested review from cte and jr as code owners November 12, 2025 05:58
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Nov 12, 2025
roomote bot commented Nov 12, 2025

Review complete. Found 2 issues that should be addressed:

  • Fix recursive call to pass truncated messages instead of original messages
  • Fix prompt truncation to keep end of prompt instead of beginning

console.warn(
`[${this.providerName}] Received HTTP 400 error, retrying with truncated conversation history (attempt ${retryCount + 1}/3)`,
)
return this.createStream(systemPrompt, messages, metadata, requestOptions, retryCount + 1)

The retry logic passes the original messages parameter instead of the truncated convertedMessages, causing the truncation logic to re-execute on the same original messages each retry. This means previous truncations are lost and the same truncation ratio is applied to the full original message list each time, rather than progressively truncating. This could lead to inefficient retries where the message count doesn't decrease as expected across multiple attempts.

Suggested change:
- return this.createStream(systemPrompt, messages, metadata, requestOptions, retryCount + 1)
+ return this.createStream(systemPrompt, convertedMessages, metadata, requestOptions, retryCount + 1)

// If this is a retry and we have many messages, try truncating older conversation history
// Keep at least the last 10 messages to maintain context
if (retryCount > 0 && convertedMessages.length > 10) {
const truncationRatio = Math.min(0.5 + retryCount * 0.1, 0.8) // Truncate 50%, 60%, 70%, up to 80%

The truncation ratio is computed inline using magic numbers (0.5, 0.1, 0.8). Consider extracting these values as named constants for improved readability and easier tuning in the future.
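
A sketch of that refactor (the constant names are suggestions, not from the PR; retryCount comes from the enclosing method):

// Hypothetical named constants replacing the inline magic numbers.
const BASE_TRUNCATION_RATIO = 0.5 // starting point of the ratio formula
const TRUNCATION_RATIO_STEP = 0.1 // added per retry attempt
const MAX_TRUNCATION_RATIO = 0.8 // never drop more than 80% of the history

const truncationRatio = Math.min(
  BASE_TRUNCATION_RATIO + retryCount * TRUNCATION_RATIO_STEP,
  MAX_TRUNCATION_RATIO,
)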

// Retry with truncated prompt if we haven't exceeded max retries
if (is400Error && retryCount < 3 && prompt.length > 1000) {
const truncationRatio = Math.min(0.5 + retryCount * 0.1, 0.8)
const truncatedPrompt = prompt.substring(0, Math.floor(prompt.length * (1 - truncationRatio)))

The truncation keeps the beginning of the prompt and discards the end, which loses the most recent and relevant context. For prompts containing conversation history or instructions at the end, this approach removes critical information needed for the model to respond appropriately. This is inconsistent with the createStream method which keeps the most recent messages. Consider keeping the end of the prompt instead to preserve recent context.

Suggested change:
- const truncatedPrompt = prompt.substring(0, Math.floor(prompt.length * (1 - truncationRatio)))
+ const truncatedPrompt = prompt.substring(Math.floor(prompt.length * truncationRatio))
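
For example, with a 10,000-character prompt and a truncation ratio of 0.5, the suggested prompt.substring(5000) keeps the most recent 5,000 characters, whereas the original expression kept the first 5,000 and discarded the rest.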

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Nov 12, 2025