@BugenZhao
Member

@BugenZhao BugenZhao commented Oct 27, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Extend StreamChunkCompactor to accept an OutputKind, which decides whether chunks are produced in

  • Retract format, which converts an Update record into a pair of U- and U+ rows
  • or Upsert format, which keeps only the new row of an Update record and converts it to an Insert row

Choose the OutputKind based on whether the SinkType is Upsert or Retract (the latter was introduced in the previous PR: #23593).
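
As a rough illustration of the two formats, here is a minimal, self-contained sketch. It is not the code in this PR: the `Op`, `Record`, `OutputKind`, and `SinkType` types below are simplified stand-ins, and `output_kind_for` and `emit` are hypothetical helper names chosen for this example.

```rust
/// Row-level operations in a chunk (simplified stand-in).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Insert,
    Delete,
    UpdateDelete, // U-
    UpdateInsert, // U+
}

/// A logical change to one row (simplified stand-in).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Record<R> {
    Insert { new_row: R },
    Delete { old_row: R },
    Update { old_row: R, new_row: R },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputKind {
    Retract,
    Upsert,
}

/// Only the two sink types discussed here are modeled (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SinkType {
    Upsert,
    Retract,
}

/// Hypothetical helper: pick the output format from the sink type.
fn output_kind_for(sink_type: SinkType) -> OutputKind {
    match sink_type {
        SinkType::Upsert => OutputKind::Upsert,
        SinkType::Retract => OutputKind::Retract,
    }
}

/// Hypothetical helper: turn one record into output rows.
fn emit<R>(record: Record<R>, kind: OutputKind) -> Vec<(Op, R)> {
    match (record, kind) {
        (Record::Insert { new_row }, _) => vec![(Op::Insert, new_row)],
        (Record::Delete { old_row }, _) => vec![(Op::Delete, old_row)],
        // Retract format: an Update becomes a U- / U+ pair.
        (Record::Update { old_row, new_row }, OutputKind::Retract) => {
            vec![(Op::UpdateDelete, old_row), (Op::UpdateInsert, new_row)]
        }
        // Upsert format: only the new row survives, as a plain Insert.
        (Record::Update { new_row, .. }, OutputKind::Upsert) => {
            vec![(Op::Insert, new_row)]
        }
    }
}
```

In this sketch, an Update yields two rows under `OutputKind::Retract` and a single Insert row under `OutputKind::Upsert`, while Insert and Delete records are unaffected by the choice.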


Previously...

This led to unnecessary Delete operations on the external system, along with extra I/O overhead and a temporarily inconsistent state.

Producing chunks directly in upsert format during compaction minimizes Delete operations as much as possible. This approach also better aligns with the semantics of SinkType::Upsert. If the sink explicitly requires the old value for updates, typically DEBEZIUM, it should be marked SinkType::Retract instead.
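
To make the effect on Delete operations concrete, here is a minimal sketch of per-key compaction into upsert format. It is an illustration under simplifying assumptions, not the compactor in this PR: `Change`, `UpsertOp`, `Net`, and `compact_to_upsert` are hypothetical names, keys are plain `u32` values, and output is ordered by key rather than by arrival order.

```rust
use std::collections::BTreeMap;

/// Simplified input change on a keyed row (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Change {
    Insert { key: u32, value: String },
    Delete { key: u32, value: String },
}

/// Upsert-format output: at most one row per key (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
enum UpsertOp {
    Insert { key: u32, value: String },
    Delete { key: u32, value: String },
}

/// Net state of one key across the chunk: the value before the first
/// change (if the row already existed) and the value after the last one.
struct Net {
    old: Option<String>,
    new: Option<String>,
}

fn compact_to_upsert(changes: Vec<Change>) -> Vec<UpsertOp> {
    let mut net: BTreeMap<u32, Net> = BTreeMap::new();
    for change in changes {
        match change {
            Change::Insert { key, value } => {
                // First sight of the key via Insert means no prior row.
                net.entry(key).or_insert(Net { old: None, new: None }).new = Some(value);
            }
            Change::Delete { key, value } => {
                // First sight of the key via Delete means the row existed.
                let entry = net.entry(key).or_insert(Net { old: Some(value), new: None });
                entry.new = None;
            }
        }
    }
    net.into_iter()
        .filter_map(|(key, n)| match (n.old, n.new) {
            // Net insert or net update: a single Insert row carries the new value.
            (_, Some(value)) => Some(UpsertOp::Insert { key, value }),
            // Net delete of a row that existed before the chunk.
            (Some(value), None) => Some(UpsertOp::Delete { key, value }),
            // Inserted and deleted within the same chunk: nothing to emit.
            (None, None) => None,
        })
        .collect()
}
```

With this folding, a Delete followed by an Insert on the same key collapses into one Insert row, so a Delete reaches the external system only when the row is actually gone at the end of the chunk.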

By using upsert format in sink-into-table, we also fix #22579, since an update is no longer treated as a pair of Delete and Insert when handling table conflicts.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@github-actions github-actions bot added the type/feature Type: New feature. label Oct 27, 2025
@BugenZhao BugenZhao changed the title from "feat(sink): produce Upsert chunk for compaction" to "feat(sink): produce Upsert chunk in compaction" Oct 27, 2025
@BugenZhao BugenZhao force-pushed the bz/sink-exec-convert-to-upsert branch from bf326df to d1b8c13 Compare October 27, 2025 08:37
@BugenZhao BugenZhao marked this pull request as ready for review October 27, 2025 08:52
Contributor

Copilot AI left a comment

Pull Request Overview

This PR renames StreamChunkCompactor to StreamChunkUpsertCompactor and modifies the compaction logic to produce chunks in upsert format by removing UpdateDelete operations and rewriting UpdateInsert operations as Insert operations. This change minimizes unnecessary Delete operations to external systems and aligns better with SinkType::Upsert semantics.

Key Changes:

  • Renamed StreamChunkCompactor to StreamChunkUpsertCompactor and compact_chunk_inline to into_upsert_compacted_chunk
  • Modified compaction methods to produce upsert format chunks using new Record::into_upsert() method
  • Added Kind type parameter to ChangeBuffer::into_chunk and into_chunks methods to support both upsert and retract formats
  • Updated StreamSink to correctly derive and set stream kind based on sink type
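
A type parameter that selects the output format, as described for ChangeBuffer::into_chunk above, could look roughly like the sketch below. This is only an assumption about the general shape: `KindMarker`, `Retract`, `Upsert`, and `ChangeBufferSketch` are hypothetical names, rows are plain `i64` values, and the actual signature in the PR may differ.

```rust
/// Hypothetical marker trait selecting the output format at compile time.
trait KindMarker {
    const UPSERT: bool;
}

struct Retract;
struct Upsert;

impl KindMarker for Retract {
    const UPSERT: bool = false;
}

impl KindMarker for Upsert {
    const UPSERT: bool = true;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Insert,
    UpdateDelete,
    UpdateInsert,
}

/// Hypothetical buffer holding only updates as (old, new) pairs.
struct ChangeBufferSketch {
    updates: Vec<(i64, i64)>,
}

impl ChangeBufferSketch {
    /// The caller picks the format through the type parameter `K`.
    fn into_chunk<K: KindMarker>(self) -> Vec<(Op, i64)> {
        let mut rows = Vec::new();
        for (old, new) in self.updates {
            if K::UPSERT {
                // Upsert: keep only the new row, as an Insert.
                rows.push((Op::Insert, new));
            } else {
                // Retract: keep both sides as a U- / U+ pair.
                rows.push((Op::UpdateDelete, old));
                rows.push((Op::UpdateInsert, new));
            }
        }
        rows
    }
}
```

A caller that needs old values would write `buffer.into_chunk::<Retract>()`, while an upsert sink would write `buffer.into_chunk::<Upsert>()`; the branch on `K::UPSERT` is resolved per instantiation.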

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/stream/src/executor/sink.rs | Updates sink executor to use renamed StreamChunkUpsertCompactor and into_upsert_compacted_chunk |
| src/stream/src/executor/mview/materialize.rs | Updates materialize executor to explicitly use RETRACT kind for change buffer |
| src/stream/src/common/upsert_compact.rs | Renames compactor class, updates methods to produce upsert format chunks, and improves documentation |
| src/stream/src/common/mod.rs | Renames module from compact_chunk to upsert_compact |
| src/stream/src/common/change_buffer.rs | Adds Kind type parameter to support both upsert and retract formats in chunk conversion |
| src/frontend/src/optimizer/plan_node/stream_sink.rs | Updates sink type derivation logic and explicitly sets stream kind based on sink type |
| src/common/src/array/stream_record.rs | Adds into_upsert() method to convert Update records to Insert format |
| src/common/src/array/stream_chunk_builder.rs | Refactors append_record to use helper method for cleaner code |

Base automatically changed from bz/sink-compact-refactor to main October 27, 2025 12:43
@BugenZhao BugenZhao force-pushed the bz/sink-exec-convert-to-upsert branch from d1b8c13 to 702d0b5 Compare October 28, 2025 03:55
@BugenZhao
Member Author

I realized that there's an exception: FORMAT DEBEZIUM actually requires retractable output.

@BugenZhao BugenZhao marked this pull request as draft October 28, 2025 06:13
@BugenZhao BugenZhao changed the base branch from main to graphite-base/23581 October 28, 2025 07:31
@BugenZhao BugenZhao force-pushed the bz/sink-exec-convert-to-upsert branch from 39dbfa6 to ecdff93 Compare October 28, 2025 07:31
@BugenZhao BugenZhao changed the base branch from graphite-base/23581 to bz/official-gecko October 28, 2025 07:31
@BugenZhao BugenZhao force-pushed the bz/sink-exec-convert-to-upsert branch from 562d4b8 to c8661ad Compare October 28, 2025 09:18
@BugenZhao BugenZhao changed the title from "feat(sink): produce Upsert chunk in compaction" to "feat(sink): produce Upsert chunk in compaction for SinkType::Upsert" Oct 28, 2025
@BugenZhao
Member Author

I realized that there's an exception: FORMAT DEBEZIUM actually requires retractable output.

Addressed by first introducing SinkType::Retract in #23593, then dispatching different behavior based on whether the sink is retract or upsert.

@BugenZhao BugenZhao marked this pull request as ready for review October 28, 2025 09:25
Base automatically changed from bz/official-gecko to main October 30, 2025 12:50
Development

Successfully merging this pull request may close these issues.

Columns in wide table get overwritten with NULLs when sink uses LEFT JOIN