Skip to content

Conversation

@chris-olszewski
Copy link
Member

@chris-olszewski chris-olszewski commented Oct 21, 2025

What was changed

Remove yield points from the @temporalio/interceptors-opentelemetry workflow interceptors. Some yield points were there since the initial version of the package and others were added in #1449 and #1577

Why?

These yield points have the ability to cause NDE due to the inclusion of these interceptors. In addition, it broke startWithSignal operations as signal handlers were not immediately processed upon registration if the workflow was started with the given signal.

Depends on temporalio/sdk-core#1040

Checklist

  1. Closes [Bug] Upgrading OTel Interceptor to 1.11.5+ breaks signal-with-start and may cause NDE #1677

  2. How was this tested:
    Created histories from the affected versions and added tests that replay these histories. These are red/green tests where they initially failed without the changes, but with the flagging logic they passed.

The only interceptors that are not covered by these replay tests are:

  • inbound execute
  • outbound scheduleLocalActivity
  • outbound continueAsNew

As I was unable to trigger NDE from removing/adding yield points to these interceptors.

  1. Any docs updates needed?
    Possibly a warning on https://docs.temporal.io/develop/typescript/interceptors#workflow-interceptors-registration about adding yield points to workflow interceptors.

@chris-olszewski chris-olszewski force-pushed the olszewski/fix_otel_interceptor_nde branch from 717716c to 1006b2b Compare October 21, 2025 19:04
@chris-olszewski chris-olszewski force-pushed the olszewski/fix_otel_interceptor_nde branch from 77b9399 to dac7f6c Compare October 31, 2025 15:41

async execute(input: ActivityExecuteInput, next: Next<ActivityInboundCallsInterceptor, 'execute'>): Promise<unknown> {
const context = await extractContextFromHeaders(input.headers);
const context = extractContextFromHeaders(input.headers);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unconditionally removed this await since we cannot check sdk flags in this context. I was not able to trigger any NDE with this await being present/not present. I believe this removal is safe, but can add a Promise.resolve to keep the yield point if we think that would be safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know the implications of this change, why do you think this is safe?
My immediate intuition is that it'll causing different microtask queuing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will cause different microtask queueing, but I suppose the question is if that change will be impactful. I can add a synthesized yield point back here to be extra cautious

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm not sure - my feeling is that less disruption would be better, but @mjameswh would be better to ask

@chris-olszewski chris-olszewski marked this pull request as ready for review October 31, 2025 19:43
@chris-olszewski chris-olszewski requested a review from a team as a code owner October 31, 2025 19:43
Comment on lines +43 to +46
// eslint-disable-next-line @typescript-eslint/no-require-imports
const { SdkFlags } = require('@temporalio/workflow/lib/flags');
// eslint-disable-next-line @typescript-eslint/no-require-imports
const { getActivator } = require('@temporalio/workflow/lib/global-attributes');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to import in the function?
Generally - maybe this indicates the function belongs in @temporalio/workflow/lib and we just need a single internal import to use this (instead of 3). This file could just re-export it, or import directly in the index.ts usage, not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an outcome of the recent PR to make @temporalio/workflow an optional peer dep for the interceptors: #1812, so we have to do some song and dance around importing from that package now that it might not be present.

From my understanding talking to @mjameswh we don't want to advertise these APIs for general consumption. With that in mind, I think keeping these in the nested modules is okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. But I feel as though if we're importing import type { SdkFlags as SdkFlagsT } from '@temporalio/workflow/lib/flags'; at the top anyway, it's a bit odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the type aliasing that is odd or the fact that we have a top level import and an import in a function?


async execute(input: ActivityExecuteInput, next: Next<ActivityInboundCallsInterceptor, 'execute'>): Promise<unknown> {
const context = await extractContextFromHeaders(input.headers);
const context = extractContextFromHeaders(input.headers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know the implications of this change, why do you think this is safe?
My immediate intuition is that it'll causing different microtask queuing.

@THardy98
Copy link
Contributor

THardy98 commented Nov 3, 2025

Haven't read through the tests yet.

Comment on lines +10 to +12
const { a, b, c } = workflow.proxyLocalActivities({
scheduleToCloseTimeout: '1m',
});
Copy link
Contributor

@THardy98 THardy98 Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these activities coming from? I don't see any activities registered on the test workers, am i missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did remove the code to create the histories since the tests only replay history to check for verify there isn't NDE. You can view the code used at abf1134

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Upgrading OTel Interceptor to 1.11.5+ breaks signal-with-start and may cause NDE

3 participants