[PD] remove splitwise deployment on single node and refine the code #4891
base: develop
Conversation
Thanks for your contribution!
Pull Request Overview
This PR refactors the splitwise (Prefill-Decode disaggregated) deployment architecture by removing the deprecated single-machine deployment mode (v2) and consolidating to two supported methods: v0 using splitwise_scheduler/dp_scheduler, and v1 using local_scheduler with router. The changes simplify the codebase by removing redundant code paths and improving the separation of concerns between prefill and decode instances.
- Removed deprecated `innode_prefill_ports` parameter and associated single-machine PD logic
- Refactored decode instance's request processing into a cleaner `_decode_process_splitwise_requests` function
- Updated test utilities to share common functions and removed duplicated port/service management code
- Consolidated splitwise version naming from v0/v1/v2 to just v0/v1
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tests/e2e/utils/serving_utils.py | Added shared utility functions check_service_health and get_registered_number for e2e tests |
| tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py | Updated to use shared utilities, added redis support, removed duplicated helper functions |
| tests/e2e/test_ernie_03b_pd_router_v0.py | Refactored to import utilities from serving_utils instead of duplicating code |
| fastdeploy/worker/worker_process.py | Removed automatic ENABLE_V1_KVCACHE_SCHEDULER=0 setting for non-RDMA splitwise |
| fastdeploy/splitwise/splitwise_connector.py | Removed deprecated innode dispatch methods (has_splitwise_tasks, dispatch_innode_splitwise_tasks) |
| fastdeploy/output/token_processor.py | Added timeout warning for cache sending, removed v0-specific result filtering |
| fastdeploy/inter_communicator/engine_worker_queue.py | Removed available_prefill_instances queue that was used for single-machine coordination |
| fastdeploy/engine/request.py | Added timestamp fields inference_start_time and llm_engine_recv_req_timestamp to Request class |
| fastdeploy/engine/engine.py | Removed initialization of available_prefill_instances queue |
| fastdeploy/engine/common_engine.py | Major refactoring: extracted _insert_prefilled_requests, renamed _process_splitwise_task to _decode_process_splitwise_requests with cleaner logic |
| fastdeploy/engine/async_llm.py | Removed available_prefill_instances queue initialization |
| fastdeploy/engine/args_utils.py | Removed innode_prefill_ports argument, added validation for splitwise configuration |
| fastdeploy/demo/offline_disaggregated_demo.py | Deleted deprecated single-machine offline demo |
| fastdeploy/config.py | Simplified splitwise version detection logic (v0/v1 only), removed innode_prefill_ports config |
| fastdeploy/cache_manager/transfer_factory/ipc_cache_transfer.py | Removed unused finish_event variable |
| fastdeploy/cache_manager/cache_messager.py | Fixed connection status logging logic |
| examples/splitwise/start_v2_tp1.sh | Removed deprecated v2 single-machine example script |
| examples/splitwise/start_v1_tp*.sh | Updated scripts to use router-based v1 deployment method |
| examples/splitwise/start_v0_tp*.sh | Updated scripts to use splitwise_scheduler for v0 deployment |
| examples/splitwise/start_mixed.sh | Added test request example for mixed server deployment |
| docs/zh/features/disaggregated.md | Removed single-machine deployment documentation, updated multi-machine examples |
| docs/features/disaggregated.md | Removed single-machine deployment documentation, updated multi-machine examples |
Comments suppressed due to low confidence (4)
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:337
- Unused variable assignment. `d_url` is assigned on line 336, but the function sends the request to `p_url` instead of `d_url` on line 337. If the intention was to test both URLs, clarify with a comment; if only the prefill URL is needed, remove the `d_url` assignment.
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:366
- Inconsistent variable usage pattern. Lines 365-366 and 416-417 follow the pattern of unpacking both URLs but only using one; lines 389-390 do the same. For better clarity, consider tuple unpacking with an underscore for unused values: `p_url, _ = api_url` when only one URL is used.
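The suggested unpacking pattern, sketched with placeholder URLs (the tuple contents here are assumptions for illustration, not the test's real values):

```python
# Hypothetical (prefill_url, decode_url) pair as the test fixture might return it.
api_url = (
    "http://localhost:8188/v1/chat/completions",
    "http://localhost:8189/v1/chat/completions",
)

# The underscore signals that the decode URL is intentionally unused here.
p_url, _ = api_url
```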
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:164
- Typo in variable assignment. On line 164, `env_prefill` is used where `env_decode` was intended. This should be `env_decode["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"` to properly set the environment variable for the decode instance.
tests/e2e/test_ernie_03b_pd_splitwise_scheduler.py:249
- Inconsistent spacing in f-string expression. Line 249 has `FD_API_PORT+1` without spaces around the `+` operator, while the first URL on the same line uses proper spacing. For consistency, use `FD_API_PORT + 1`.
```python
for idx, task in enumerate(allocate_resource_requests):
    is_success = False

    if envs.ENABLE_V1_KVCACHE_SCHEDULER:
        if self.resource_manager.preallocate_resource_in_d(task):
            self.llm_logger.info(f"Resource available, processing task {task.request_id}")
            self.split_connector.send_cache_infos([task], -1)
            processed_indices.append(idx)
            is_success = True
    else:
        if self.resource_manager.is_resource_sufficient(task.prompt_token_ids_len):
            self.llm_logger.info(f"Resource available, processing task {task.request_id}")
            self.insert_tasks([task])
            processed_indices.append(idx)
            is_success = True

    if not is_success:
        if not self.enable_decode_cache_task:
            task.error_msg = "Not enough resources"
            self.split_connector.send_cache_infos([task], -1)
            processed_indices.append(idx)
        else:
            waiting_tasks.append(task)
            self.llm_logger.debug(f"Still waiting for resources {task.request_id}")
            break

self.insert_tasks(ready_tasks, allocated=True)
if self.cfg.splitwise_version in ("v0", "v2"):
    self.scheduler.put_results(ready_tasks)
return waiting_tasks
```
Copilot
AI
Nov 7, 2025
Potentially unclear condition order. The new code at lines 1034-1046 processes resource allocation, but line 1048 checks `if not is_success:`, which then has nested logic for `enable_decode_cache_task`. Consider simplifying this by removing the intermediate `is_success` variable and handling the allocation failure cases directly within each branch, or add a comment explaining why this structure is necessary.
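One way to realize this suggestion is to dispatch on the allocation outcome directly instead of carrying an `is_success` flag. The sketch below uses stand-in callables rather than the actual FastDeploy methods, purely to illustrate the control-flow shape:

```python
def process_allocation(tasks, try_allocate, on_allocated, on_rejected):
    """Walk tasks in order; stop at the first task that must keep waiting.

    try_allocate(task) -> bool      attempts resource allocation.
    on_allocated(idx, task)         handles a successful allocation.
    on_rejected(idx, task) -> bool  handles failure; True means the task was
                                    finalized (e.g. error reported back),
                                    False means it should wait for resources.
    """
    waiting = []
    for idx, task in enumerate(tasks):
        if try_allocate(task):
            on_allocated(idx, task)
        elif on_rejected(idx, task):
            continue
        else:
            waiting.append(task)
            # Resources are exhausted, so later tasks cannot be served either.
            break
    return waiting
```

Each failure path is handled exactly where it occurs, so no flag has to survive past the branch that set it.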
```python
allocate_resource_requests: list[Request] = []
prefilled_request_ouputs: list[RequestOutput] = []
```
Incorrect type hint syntax. Line 1009 uses `list[Request]`, which is only valid on Python 3.9+. For broader compatibility, use `List[Request]` from the `typing` module. The same issue exists on line 1010 with `list[RequestOutput]`.
Suggested change:
```python
allocate_resource_requests: List[Request] = []
prefilled_request_ouputs: List[RequestOutput] = []
```
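An alternative to switching to `typing.List` is the deferred-annotation import from PEP 563, which lets the built-in generic syntax parse on Python 3.7+. A minimal sketch with a stand-in `Request` class (not FastDeploy's real one):

```python
from __future__ import annotations  # annotations become strings, so list[Request] parses pre-3.9

from dataclasses import dataclass


@dataclass
class Request:
    """Stand-in for fastdeploy.engine.request.Request."""

    request_id: str


def collect_ids(requests: list[Request]) -> list[str]:
    # Under PEP 563 the annotations above are never evaluated at runtime.
    return [r.request_id for r in requests]
```

Note the caveat: this only helps in annotation position; a runtime expression such as `list[Request]()` still requires Python 3.9+.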
```python
def get_registered_number(router_url) -> list:
    """
    Get the number of registered models in the router.
    Args:
        router_url (str): The base URL of the router, e.g. "http://localhost:8080".
    Returns:
        int: The number of registered models.
```
The docstring description "Get the number of registered models" is misleading. Based on the return type and values, this function returns a dictionary with counts for different types ("mixed", "prefill", "decode"), not a single number. Consider updating to: "Get the registered model counts by type from the router."
Suggested change:
```python
def get_registered_number(router_url) -> dict:
    """
    Get the registered model counts by type from the router.
    Args:
        router_url (str): The base URL of the router, e.g. "http://localhost:8080".
    Returns:
        dict: A dictionary with counts for each model type, e.g. {"mixed": int, "prefill": int, "decode": int}.
```
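A self-contained sketch of what such a helper might look like. The `/registered_number` route and the zeroed fallback dict are assumptions pieced together from the review comments, not the actual code in `serving_utils.py`:

```python
import json
from urllib import request as urlrequest


def get_registered_number(router_url: str) -> dict:
    """Return registered model counts by type from the router (sketch).

    The endpoint path below is hypothetical; the real route used by the
    e2e tests may differ.
    """
    try:
        with urlrequest.urlopen(f"{router_url}/registered_number", timeout=5) as resp:
            return json.load(resp)
    except Exception:
        # Router unreachable: report zero registrations of every type.
        return {"mixed": 0, "prefill": 0, "decode": 0}
```

Returning the zeroed dict on failure keeps callers free of try/except blocks when polling a router that has not started yet.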
```python
inference_start_time: float = 0,
llm_engine_recv_req_timestamp: float = 0,
```
Unused parameter in `__init__`. The parameters `inference_start_time` (line 98) and `llm_engine_recv_req_timestamp` (line 99) are added to the function signature, but `inference_start_time` is never used in the method, and `llm_engine_recv_req_timestamp` is overwritten with `time.time()` on line 166, ignoring the passed value. Either remove the unused parameter or properly utilize the passed value.
```python
if status:
    logger.info(f"connect to {target_ip}:{target_id} success")
else:
    logger.error(f"connect to {target_ip}:{target_id} failed")
```
Inconsistent error message logic. Line 623 inverts the status check: it logs an error when `status` is True (successful connection) instead of when it is False (failed connection). The condition should be `if not status:` to correctly log errors only on connection failures.
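For reference, the intended pattern logs success at info level and failure at error level. A runnable sketch (the logger name is a stand-in, and the return value exists only so the branch can be exercised):

```python
import logging

logger = logging.getLogger("cache_messager_sketch")


def log_connection_status(status: bool, target_ip: str, target_id: int) -> str:
    """Log success at info level and failure at error level (sketch)."""
    if status:
        msg = f"connect to {target_ip}:{target_id} success"
        logger.info(msg)
    else:
        msg = f"connect to {target_ip}:{target_id} failed"
        logger.error(msg)
    return msg
```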
```python
if self.splitwise_role != "mixed":
    if self.scheduler_name == "local" and self.router is None:
        raise ValueError(
            f"When using {self.splitwise_role} role and the {self.scheduler_name} "
            f"scheduler, please provide --router argument."
        )
```
Missing validation for splitwise configuration. Lines 485-490 add a check requiring `--router` when `splitwise_role != "mixed"` and `scheduler_name == "local"`. However, there is no corresponding validation for `splitwise_role != "mixed"` with `scheduler_name` in `("splitwise", "dp")`. Consider adding validation to ensure proper scheduler configuration for all splitwise deployment scenarios.
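One shape the extended check might take; the rule applied to the `splitwise`/`dp` schedulers below is an assumption about intent, not code from the PR:

```python
from typing import Optional


def validate_splitwise_config(splitwise_role: str,
                              scheduler_name: str,
                              router: Optional[str]) -> None:
    """Validate scheduler/router pairing for non-mixed splitwise roles (sketch)."""
    if splitwise_role == "mixed":
        return
    if scheduler_name == "local":
        if router is None:
            raise ValueError(
                f"When using {splitwise_role} role and the {scheduler_name} "
                f"scheduler, please provide --router argument."
            )
    elif scheduler_name not in ("splitwise", "dp"):
        # Hypothetical guard: only these schedulers support PD disaggregation.
        raise ValueError(
            f"Scheduler {scheduler_name!r} does not support "
            f"splitwise role {splitwise_role!r}."
        )
```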
```python
raise ValueError(
    "Please set --rdma_comm_ports argument when using " "rdma cache transfer protocol."
)
```
Inconsistent error message formatting. Lines 495 and 504 split the error message across two implicitly concatenated string literals (e.g. `"Please set --rdma_comm_ports argument when using " "rdma cache transfer protocol."`). Format these consistently, either as a single line or with proper line continuation. For better readability, consider `"Please set --rdma_comm_ports argument when using rdma cache transfer protocol."`.
```python
if envs.ENABLE_V1_KVCACHE_SCHEDULER == 1:
    if "ipc" in self.cache_transfer_protocol:
        # FIXME: support ipc cache transfer protocol
        raise NotImplementedError(
            "only support rdma cache transfer protocol " "when using ENABLE_V1_KVCACHE_SCHEDULER."
        )
    # FIXME: fix this bug
    if self.splitwise_role == "prefill" and self.num_gpu_blocks_override is None:
        raise NotImplementedError(
            "please set num_gpu_blocks_override for prefill " "instance using ENABLE_V1_KVCACHE_SCHEDULER."
        )
```
The error messages use the label "FIXME" in comments (lines 502, 506), suggesting these are temporary workarounds or known issues that need to be addressed. Consider creating tracking issues for these limitations and removing the NotImplementedError exceptions once proper implementations are in place, or document why these restrictions exist.
```python
    return False


def get_registered_number(router_url) -> list:
```
The return type annotation is inconsistent with the actual return type. The docstring and annotation claim this function returns `list`, but the actual return value is the dictionary `{"mixed": 0, "prefill": 0, "decode": 0}` in the exception case and `response.json()` (likely a dict) in the success case. The return type should be `dict` instead of `list`.
```python
        router_url (str): The base URL of the router, e.g. "http://localhost:8080".
    Returns:
        int: The number of registered models.
```
The Returns section in the docstring is inconsistent with the actual return type. It states `int: The number of registered models`, but the function returns a dictionary with multiple counts. Update it to: `dict: A dictionary containing registered model counts with keys "mixed", "prefill", and "decode"`.
Suggested change:
```python
        dict: A dictionary containing registered model counts with keys "mixed", "prefill", and "decode".
```
Motivation
remove splitwise deployment on single node and refine the code
Modifications
Usage or Command
Accuracy Tests
Checklist
- Choose at least one tag for the PR title from: `[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`.
- Run `pre-commit` before commit.
- If submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch first, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.