- 
                Notifications
    
You must be signed in to change notification settings  - Fork 693
 
Prevent segment appender from writing into already dispatched head buffer #28249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Prevent segment appender from writing into already dispatched head buffer #28249
Conversation
8396f85    to
    d3756c9      
    Compare
  
    | 
           /dt  | 
    
d3756c9    to
    180b333      
    Compare
  
    | 
           /dt  | 
    
          
Retry command for Build#75218please wait until all jobs are finished before running the slash command  | 
    
          
CI test resultstest results on build#75218
 test results on build#75299
 test results on build#75472
  | 
    
98d8d0d    to
    8d9bf0c      
    Compare
  
    | 
           /dt  | 
    
    
      
        1 similar comment
      
    
  
    | 
           /dt  | 
    
Added method that copies the reminder that spans from the last page aligned address to the end of the buffer. The method copies data and sets the chunk positions to make sure the number of not flushed bytes is preserved. Signed-off-by: Michał Maślanka <michal@redpanda.com>
This commit introduces a logic that prevents the head chunk from being appended if it is dispatched i.e. its content is being DMA written. Previously the head chunk buffer might have been passed to the `file::dma_write_call` but appends could still write to that chunk. This caused a problem which manifested on some filesystems and some hardware RAID controllers. The problem manifested as corruption in the last page of a file on disk. This PR prevents the head being written to disk from being concurrently appended. The mechanism used here is copying the reminder which wasn't flushed and spans beyond the last page aligned address to a new chunk. The reminder is copied only if the current head write was dispatched. After copying the chunk writer is requested to recycle the chunk and return it to the cache. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Signed-off-by: Michał Maślanka <michal@redpanda.com>
Sometimes the benchmark can take too long to execute. Increasing timeout to eliminate intermittent failures. Signed-off-by: Michał Maślanka <michal@redpanda.com>
c6c178a    to
    b10ea9b      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a mechanism to prevent the segment appender from writing to the head chunk buffer while it's being DMA written to disk. This addresses data corruption issues that could occur on certain filesystems and hardware RAID controllers when concurrent appends modified data that was already dispatched for writing.
Key changes:
- Prevents concurrent modification of dispatched head chunks by copying unwritten data to a new chunk
 - Adds tracking for bytes copied in chunk reminders to detect when the prevention mechanism is active
 - Introduces comprehensive tests to verify the fix works correctly under various concurrent scenarios
 
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description | 
|---|---|
src/v/storage/segment_appender.cc | 
Core logic to detect dispatched writes and copy chunk reminders to prevent concurrent modification | 
src/v/storage/segment_appender.h | 
API changes including new stats tracking and chunk dispatch detection methods | 
src/v/storage/segment_appender_chunk.h | 
New copy_reminder_from method to safely copy unwritten data between chunks | 
src/v/storage/tests/segment_appender_test.cc | 
Comprehensive test coverage for concurrent flush scenarios and chunk copying behavior | 
src/v/storage/tests/BUILD | 
Test timeout configuration update | 
| * pending_aligned_begin() i.e. it is the part that overflows last full page | ||
| * boundary. | ||
| * | ||
| * This method preservers the position and flushed position relative to the | 
    
      
    
      Copilot
AI
    
    
    
      Nov 3, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'preservers' to 'preserves'.
| * This method preservers the position and flushed position relative to the | |
| * This method preserves the position and flushed position relative to the | 
| * | ||
| * This method preservers the position and flushed position relative to the | ||
| * pending_aligned_begin() | ||
| * IMPORTANT: this method will reset the chunk conent before copying the | 
    
      
    
      Copilot
AI
    
    
    
      Nov 3, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'conent' to 'content'.
| * IMPORTANT: this method will reset the chunk conent before copying the | |
| * IMPORTANT: this method will reset the chunk content before copying the | 
| // true if the write extends to the end of the chunk or current write is | ||
| // the last one using current chunk i.e., this is the last write that | ||
| // will use the current chunk before it is recycled | 
    
      
    
      Copilot
AI
    
    
    
      Nov 3, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should be updated to reflect the new field name. The field was renamed from 'full' to 'last_write_to_current_chunk' but the comment still references the old meaning.
| // true if the write extends to the end of the chunk or current write is | |
| // the last one using current chunk i.e., this is the last write that | |
| // will use the current chunk before it is recycled | |
| // true if this is the last write that will use the current chunk before | |
| // it is recycled. This does not necessarily mean the write extends to | |
| // the end of the chunk. | 
| 
               | 
          ||
| chunk_1.flush(); | ||
| auto chunk_3 = make_chunk(16_KiB); | ||
| // only last 2 KiB should be | 
    
      
    
      Copilot
AI
    
    
    
      Nov 3, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 323 is incomplete. It should specify what the 'only last 2 KiB should be' refers to (e.g., 'only last 2 KiB should be copied').
| // only last 2 KiB should be | |
| // only last 2 KiB should be copied to chunk_3 from chunk_1 after flush | 
          
Retry command for Build#75472please wait until all jobs are finished before running the slash command  | 
    
This commit introduces a logic that prevents the head chunk from being
appended if it is dispatched i.e. its content is being DMA written.
Previously the head chunk buffer might have been passed to the
file::dma_write_callbut appends could still write to that chunk. Thiscaused a problem which manifested on some filesystems and some hardware
RAID controllers. The problem manifested as corruption in the last page
of a file on disk.
This PR prevents the head being written to disk from being concurrently
appended. The mechanism used here is copying the reminder which wasn't
flushed and spans beyond the last page aligned address to a new chunk.
The reminder is copied only if the current head write was dispatched.
After copying the chunk writer is requested to recycle the chunk and
return it to the cache.
Backports Required
Release Notes
Improvements