-
Notifications
You must be signed in to change notification settings - Fork 154
Refactor 'run' #1021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor 'run' #1021
Conversation
8602907 to
027fa17
Compare
| // TODO in next PR: make sure to remove hw breakpoint here | ||
| // In case the hw breakpoint is the entry point, remove it to | ||
| // avoid hanging here as gdb does not remove breakpoints it | ||
| // has not set. | ||
| // Gdb expects the target to be stopped when connected. | ||
| // self.remove_hw_breakpoint(vcpu_fd, entrypoint)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note this self.remove_hw_breakpoint has been moved to another place (handle_debug in each of the 3 drivers), it will be brought back here in next PR.
dblnz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, I appreciate the effort to split the Vm trait PR into multiple smaller PRs
I left some comments, the only "concerning" one is related to the tracing spans reordering.
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
be64e09 to
0360070
Compare
simongdavies
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a few comments, two general comments:
I feel like this PR is doing 2 things and could be split up to make it easier to review:
- Refactor Run - which is what the title says
- Changing the way cancellation is implemented
The second part could (should?) be done in another PR .
I also feel like we need more information about how cancellation works, this really needs to be well documented , maybe we need a diagram or two somewhere
| /// The operation should be retried | ||
| /// On Linux this can happen where a call to run the CPU can return EAGAIN | ||
| /// On Windows the platform could cause a cancelation of the VM run | ||
| /// The operation should be retried, for example this can happen on Linux where a call to run the CPU can return EAGAIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dont understand why we need to do this? Cant we exclude this from whatever match is using it on Windows? Also I dont understand why we are not seeing this on Windows since IIRC the platform itself can choose to cancel an execution which we then captured and retried, what happened to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I know what you mean. We only construct VmExit::Retry() on linux by this mapping libc::EAGAIN => VmExit::Retry(). There is no equivalent exit on windows that needs to be retried as far as I know. Regardless, if windows exits with a VmExit::Cancelled(), but there is no cancellation requested for the sandbox, then we retry it (but we still don't construct any VmExit::Retry() which is why the warning is supressed on windows).
| // ===== KILL() TIMING POINT 3: Before calling run_vcpu() ===== | ||
| // If kill() is called and ran to completion BEFORE this line executes: | ||
| // - CANCEL_BIT will be set, but it's too late to prevent entering the guest this iteration | ||
| // - Signals will interrupt the guest (RUNNING_BIT=true), causing VmExit::Cancelled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isnt quite true since Windows does not use signals?
| // - CANCEL_BIT will be set, but it's too late to prevent entering the guest this iteration | ||
| // - Signals will interrupt the guest (RUNNING_BIT=true), causing VmExit::Cancelled() | ||
| // - If the guest completes before any signals arrive, kill() may have no effect | ||
| // - If there are more iterations to do (IO/host func, etc.), the next iteration will be cancelled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are not more iterations to do , is it possible to lie about completion or cancel the next use of this Sandbox?
| // ===== KILL() TIMING POINT 4: Before capturing cancel_requested ===== | ||
| // If kill() is called and ran to completion BEFORE this line executes: | ||
| // - CANCEL_BIT will be set | ||
| // - Signals may still be sent (RUNNING_BIT=true) but are harmless no-ops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about windows?
| } | ||
| } | ||
| Ok(VmExit::Cancelled()) => { | ||
| // If cancellation was not requested for this specific guest function call, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the scope of this problem, is it possible that we try to kill a long running execution and that we end up delivering the signal to the wrong sandbox? If so does this happen in the current implementation? Either way if this does happen we should really think about a way to prevent this from happening as we don't really want one sandboxes behavior to influence the execution of another sandbox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does happen in the current implementation and as far as I am aware it's impossible to avoid. Signals are delivered async so there's no guaranteeing what happens in between sending the signal and the signal being delivered. A totally new vm could have started on the same thread or a new guest call could have been made. I think the only thing we can do here is ignore it if signal was not intended for this guest-call or sandbox.
| /// This is acceptable because the generation tracking provides an additional | ||
| /// safety layer. Even if a stale kill somehow stamped cancel_requested, the | ||
| /// generation mismatch would cause it to be ignored. | ||
| call_active: AtomicBool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you delete this? Seemed like a reasonable optimization to me , it avoided a lot of useless processing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag provided no optimization as far as I can tell. We always already exit early if running flag is not true.
| /// - CANCEL_BIT remains unchanged | ||
| /// | ||
| /// # Memory Ordering | ||
| /// Uses `Release` ordering to ensure that the `tid` store (which uses `Relaxed`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is extremely confusing, I don't understand the relationship between store of this running atomic and the tid store or the relevance of the relaxed orderings, I think generally where we are use atomics and ordering we should document why the orderings are the way they are , these things are hard to understand when looking at the code later.
I think it will also make reviewing the code easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, typo in the docs. And agree. I'll create a new document for this
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Gives
Hypervisor::runa default implementation and removes each specialized implementation from kvm/mshv/whp. Addsrun_vcputo just run vm for 1 single exit. The rest of the changes are just made to enable the above. Also refactors guest function cancellation slightly (simpler). Certain logic is introduced in this PR that will be removed in my next pr.