Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -174,17 +174,21 @@ void LRUEvictionPolicy::claimBlock(BlockPtr block, std::optional<executor::Reten
{
mFreeQueues[cacheLevel][getPriorityIdx(block->getPriority())].erase(*mFreeBlockIterators[id]);
mNumFreeBlocksPerLevel[cacheLevel] -= 1;
mFreeBlockIterators[id] = std::nullopt;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary for the bug? Looks like a non-functional change to me.

}

mFreeBlockIterators[id] = std::nullopt;

if (priority.has_value())
{
block->setPriority(*priority);
}

// It is safe to call erase even if block is not found in mExpiringBlockHeap.
mExpiringBlockHeap.erase(block);
block->setDurationMs(durationMs);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. Is this change necessary for the bug? Looks like a non-functional change to me.

if (durationMs.has_value())
{
block->setDurationMs(durationMs);
}
Comment on lines +188 to +191
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Duration must be reset when no value is provided

With this change a caller can no longer clear a block’s duration by passing std::nullopt; the old value survives the claim. The second claimBlock call in the reuse paths now leaves whatever retention duration the block had previously, so blocks can continue to expire unexpectedly under the wrong policy. Please restore the ability to reset the optional.

-    if (durationMs.has_value())
-    {
-        block->setDurationMs(durationMs);
-    }
+    block->setDurationMs(durationMs);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (durationMs.has_value())
{
block->setDurationMs(durationMs);
}
block->setDurationMs(durationMs);
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp around lines 188 to 191,
the current logic only sets block->setDurationMs(durationMs) when
durationMs.has_value(), which prevents callers from clearing a previously set
duration; change the branch so that when durationMs has a value you set it on
the block, and when it does not you explicitly clear/reset the block’s duration
(e.g., call the block’s reset/clear-duration API or set its optional duration to
std::nullopt) so an absence of duration truly removes the prior retention.

}

std::chrono::steady_clock::time_point::duration LRUEvictionPolicy::getTime() const
Expand Down
16 changes: 16 additions & 0 deletions cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1225,6 +1225,13 @@ SizeType32 WindowBlockManager::loadOrAllocateBlocks(std::vector<BlockKey> const&
: std::make_tuple(false, 0, nullptr);
if (matchingBlock != nullptr)
{
// Remove matchingBlock from free queue so it does not get overwritten accidentally.
// If block is not in free queue, calling claimBlock is a noop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Unclear comment in my opinion.
            // matchingBlock is claimed and removed out of the free queue to
            // avoid itself being offloaded to the secondary block pool.

  1. Better to move this peice of code under the partial match case to make logics clear. Moving it will save us from explaining ourselves again as you did under the 2 else case spots below.
            if (partialMatch)
            {
                if (matchingBlock->hasRefs() || !matchingBlock->isLeaf())
                {

if (!matchingBlock->hasRefs())
{
mEvictionPolicy->claimBlock(matchingBlock);
}

KVCacheBlock::IdType matchingBlockId = matchingBlock->getBlockId();

numMatchedTokens += numMatched > 0 ? numMatched : blockItr->uniqueTokens.size();
Expand Down Expand Up @@ -1252,13 +1259,20 @@ SizeType32 WindowBlockManager::loadOrAllocateBlocks(std::vector<BlockKey> const&
*blockItr, blockItr->uniqueTokens.size() == static_cast<size_t>(mTokensPerBlock));
}
matchingBlock->setHash();
if (!matchingBlock->hasRefs())
{
// Return matchingBlock to free queue since we have copied it and won't be using matchingBlock itself
mEvictionPolicy->releaseBlock(matchingBlock);
}
TLLM_LOG_DEBUG("%s::loadOrAllocateBlocks - Copied partially filled block %d", mLogPrefix.c_str(),
matchingBlockId);
}
else
{
// Leaf block that nobody is using. Make block private and reuse
freeLeafBlock(matchingBlock);
// It is safe to call claimBlock a second time.
// This call merely sets priority and duration.
mEvictionPolicy->claimBlock(
matchingBlock, perBlockRetentions[bi].retentionPriority, perBlockRetentions[bi].durationMs);
TLLM_LOG_DEBUG("%s::loadOrAllocateBlocks - Reused partially filled block %d", mLogPrefix.c_str(),
Expand All @@ -1269,6 +1283,8 @@ SizeType32 WindowBlockManager::loadOrAllocateBlocks(std::vector<BlockKey> const&
else
{
// Recover block and reuse
// It is safe to call claimBlock a second time.
// This call merely sets priority and duration.
mEvictionPolicy->claimBlock(
matchingBlock, perBlockRetentions[bi].retentionPriority, perBlockRetentions[bi].durationMs);
TLLM_LOG_DEBUG("%s::loadOrAllocateBlocks - Matched full block %d", mLogPrefix.c_str(), matchingBlockId);
Expand Down
Loading