-
Notifications
You must be signed in to change notification settings - Fork 2
fix(vdsnapshot): resolve fsfreeze race condition #1668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6918581 to
04b73ba
Compare
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Use kvvmi status instead of vm. Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
dffed0c to
9065409
Compare
images/virtualization-artifact/pkg/controller/service/snapshot_service.go
Show resolved
Hide resolved
|
|
||
| filesystemFrozen, _ := conditions.GetCondition(vmcondition.TypeFilesystemFrozen, vm.Status.Conditions) | ||
| if _, ok := kvvmi.Annotations[annotations.AnnVMFilesystemFrozenRequest]; ok { | ||
| return false, fmt.Errorf("failed to check %s/%s fsFreezeStatus: %w", kvvmi.Namespace, kvvmi.Name, ErrUntrustedFilesystemFrozenCondition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s add to the error which exact value was found in the annotation
images/virtualization-artifact/pkg/controller/service/snapshot_service.go
Show resolved
Hide resolved
images/virtualization-artifact/pkg/controller/service/snapshot_service.go
Show resolved
Hide resolved
| if vm != nil && vm.Status.Phase != v1alpha2.MachineStopped && !isFSFrozen { | ||
| canFreeze, err := h.snapshotter.CanFreeze(ctx, kvvmi) | ||
| if err != nil { | ||
| if errors.Is(err, service.ErrUntrustedFilesystemFrozenCondition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the kvvmi is fetched from the cache once for all functions (in GetKubeVirtVirtualMachineInstance), then there’s no point in checking for Untrusted every time. You already checked that at the very beginning in SyncFSFreezeRequest on line 173.
At this stage of the handler’s processing, such an error should no longer occur. This means it’s an unexpected case that should be handled as an internal error (return reconcile.Result{}, err)
| } else { | ||
| switch { | ||
| case vm == nil, vm.Status.Phase == v1alpha2.MachineStopped: | ||
| if vdSnapshot.Status.Consistent == nil || !*vdSnapshot.Status.Consistent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just if vdSnapshot.Status.Consistent == nil {. No need to overwrite *vdSnapshot.Status.Consistent if it has already been set.
| case vm == nil, vm.Status.Phase == v1alpha2.MachineStopped: | ||
| if vdSnapshot.Status.Consistent == nil || !*vdSnapshot.Status.Consistent { | ||
| vdSnapshot.Status.Consistent = ptr.To(true) | ||
| return reconcile.Result{RequeueAfter: 2 * time.Second}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s add a comment explaining why reconciliation is needed here and what problem it helps prevent
| Message(service.CapitalizeFirstLetter(err.Error() + ".")) | ||
| } | ||
|
|
||
| func (h LifeCycleHandler) unfreezeFilesystemIfFailed(ctx context.Context, vdSnapshot *v1alpha2.VirtualDiskSnapshot) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function should be removed. Instead of it, we should use unfreezeFilesystem.
| } | ||
| return reconcile.Result{}, err | ||
| } | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terrible cyclomatic complexity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way is much simpler:
if vdSnapshot.Status.Consistent == nil {
if vm == nil || vm.Status.Phase == v1alpha2.MachineStopped || isFSFrozen {
vdSnapshot.Status.Consistent = ptr.To(true)
return reconcile.Result{RequeueAfter: 2 * time.Second}, nil
}
if vdSnapshot.Spec.RequiredConsistency {
err = fmt.Errorf("virtual disk snapshot is not consistent because the virtual machine %s has not been stopped or its filesystem has not been frozen", vm.Name)
setPhaseConditionToFailed(cb, &vdSnapshot.Status.Phase, err)
return reconcile.Result{}, err
}
}
err = h.unfreezeFilesystem(ctx, vdSnapshot.Name, vm, kvvmi)
if err != nil {
if k8serrors.IsConflict(err) {
return reconcile.Result{RequeueAfter: 5 * time.Second}, nil
}
return reconcile.Result{}, err
}
Description
Why do we need it, and what problem does it solve?
Sometimes, when the virtual disk snapshot controller creates a batch of snapshots with required consistency, a race condition in the freeze or unfreeze filesystem operation can cause an inconsistent snapshot. To prevent this, the KubeVirt virtual machine instance is now annotated with a "freeze" or "unfreeze" request, and the service checks that the type of this request matches the
fsFreezeStatus. ThefsFreezeStatusis considered trusted only if it matches the request type.What is the expected result?
Snapshots with required consistency now execute without race conditions in the filesystem freeze process.
Checklist
Changelog entries