[Bug] Sidecar mode shouldn't restart head pod when head pod is deleted #4130

@owenowenisme

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

In sidecar mode, the head Pod should not be recreated after it is deleted while the job is running. Instead, the RayJob should be marked as Failed. Currently the head Pod is recreated and the RayJob stays in Running.
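The expected behavior can be sketched as a small decision function. This is only an illustration: `SubmissionMode`, `DeploymentStatus`, `OtherMode`, and `shouldFailOnHeadPodLoss` are hypothetical names defined here, not the operator's actual types or reconcile logic. It assumes the operator can observe that the head Pod is gone while the job is running.

```go
package main

import "fmt"

// Hypothetical stand-ins for illustration only; not the KubeRay API types.
type SubmissionMode string
type DeploymentStatus string

const (
	SidecarMode SubmissionMode = "SidecarMode"
	OtherMode   SubmissionMode = "OtherMode" // placeholder for any non-sidecar mode

	StatusRunning DeploymentStatus = "Running"
	StatusFailed  DeploymentStatus = "Failed"
)

// shouldFailOnHeadPodLoss reports whether a RayJob whose head Pod has
// disappeared should be marked Failed instead of getting a new head Pod.
// Per this issue, that is the case for a running job in sidecar mode,
// where the ray-job-submitter container runs inside the head Pod itself.
// Other modes are outside the scope of this issue, so this sketch
// returns false for them.
func shouldFailOnHeadPodLoss(mode SubmissionMode, status DeploymentStatus) bool {
	return mode == SidecarMode && status == StatusRunning
}

func main() {
	if shouldFailOnHeadPodLoss(SidecarMode, StatusRunning) {
		fmt.Println("mark RayJob as", StatusFailed, "and do not recreate the head Pod")
	}
}
```

The function is kept pure so the rule can be unit-tested without a cluster; where such a check would belong in the reconcile loop is left open here.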

Reproduction script

go test -v -count=1 ./test/e2erayjob -run 'TestRayJobSidecarMode/RayJob_fails_when_head_Pod_is_deleted_when_job_is_running'

The test times out while waiting for JobDeploymentStatus to become Failed, because the head Pod keeps being recreated and the RayJob never reaches a failed state.

❯ go test -v -count=1 ./test/e2erayjob -run 'TestRayJobSidecarMode/RayJob_fails_when_head_Pod_is_deleted_when_job_is_running'
=== RUN   TestRayJobSidecarMode
    rayjob_sidecar_mode_test.go:28: [2025-10-16T12:25:38Z] Created ConfigMap test-ns-9hssp/jobs successfully
=== RUN   TestRayJobSidecarMode/RayJob_fails_when_head_Pod_is_deleted_when_job_is_running
=== NAME  TestRayJobSidecarMode
    rayjob_sidecar_mode_test.go:169: [2025-10-16T12:25:38Z] Created RayJob test-ns-9hssp/delete-head-after-submit-sidecar-mode successfully
    rayjob_sidecar_mode_test.go:172: [2025-10-16T12:25:38Z] Waiting for RayJob test-ns-9hssp/delete-head-after-submit-sidecar-mode to be 'Running'
    rayjob_sidecar_mode_test.go:183: [2025-10-16T12:26:38Z] Deleting head Pod test-ns-9hssp/delete-head-after-submit-sidecar-mode-lmn8t-head-fwwlv for RayCluster delete-head-after-submit-sidecar-mode-lmn8t
    rayjob_sidecar_mode_test.go:189: 
        Timed out after 120.000s.
        Expected
            <v1.JobDeploymentStatus>: Running
        to equal
            <v1.JobDeploymentStatus>: Failed
=== NAME  TestRayJobSidecarMode/RayJob_fails_when_head_Pod_is_deleted_when_job_is_running
    testing.go:1679: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
=== NAME  TestRayJobSidecarMode
    test.go:114: [2025-10-16T12:28:38Z] Retrieving Pod Container test-ns-9hssp/delete-head-after-submit-sidecar-mode-lmn8t-head-4zv9d/ray-head logs
    test.go:102: [2025-10-16T12:28:38Z] Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:105: [2025-10-16T12:28:38Z] Output directory has been created at: /tmp/TestRayJobSidecarMode2305770799/001
    test.go:114: [2025-10-16T12:28:38Z] Retrieving Pod Container test-ns-9hssp/delete-head-after-submit-sidecar-mode-lmn8t-head-4zv9d/ray-job-submitter logs
    test.go:114: [2025-10-16T12:28:38Z] Retrieving Pod Container test-ns-9hssp/delete-head-after-submit-sidecar-mode-lmn8t-small--worker-8xmbn/ray-worker logs
--- FAIL: TestRayJobSidecarMode (180.52s)
    --- FAIL: TestRayJobSidecarMode/RayJob_fails_when_head_Pod_is_deleted_when_job_is_running (180.39s)
FAIL
FAIL    github.com/ray-project/kuberay/ray-operator/test/e2erayjob      180.554s
FAIL

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Labels

bug (Something isn't working)
