- 
                Notifications
    
You must be signed in to change notification settings  - Fork 140
 
OpenAI Agents Tracing: Run span start/finish in the same context #1197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| span = ctx.run( | ||
| self._create_span, | ||
| name="temporal:startActivity", | ||
| data={"activity": input.activity}, | ||
| input=input, | ||
| ) | ||
| handle = ctx.run(self.next.start_activity, input) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you have to run the things in the copied context (and add stack trace layers and such) just to be able to have the done callback use copied context. A quick glance at CPython shows that's what they're doing in add_done_callback when context is None (because copy_context is cheap and you only need an the copy for it to work properly).
Can just copy context up front and provide to add_done_callback without changing how these things are run. Granted all of that span stuff should be done before the copy_context. Also, in Python 3.12, there is actually a handle.get_context() you could use for add_done_callback, but understood that is newer than our oldest allowed Python version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't appear to fix the problem. If the context is copied up front and then the subsequent operations are not run inside it, it does still fails to detach as the copy and the original are not the same context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context vars are the same though (assuming you copy the context after you mutate things on the context). Is there somewhere inside OpenAI that validates that a context is the exact instance? If you look at CPython code for add_done_callback at https://github.com/python/cpython/blob/v3.14.0/Lib/asyncio/futures.py#L236-L237, the default also calls copy_context because that's the normal thing to do to "get the current context" to execute under. How does that work today when context is None for add_done_callback? Or are you saying add_done_callback does not work today for OpenAI's span.finish when using default parameter for context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failed to detach context
Traceback (most recent call last):
  File "/Users/tconley/samples-python/.venv/lib/python3.13/site-packages/opentelemetry/context/__init__.py", line 155, in detach
    _RUNTIME_CONTEXT.detach(token)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/Users/tconley/samples-python/.venv/lib/python3.13/site-packages/opentelemetry/context/contextvars_context.py", line 53, in detach
    self._current_context.reset(token)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x105566d90> at 0x10bb537c0> was created in a different Context
This still occurs with your suggestion. I think that despite the contexts having the same values, they aren't the same. That's my best guess anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm, how did add_done_callback work with span.finish before this PR since that calls span.finish on a copied context I wonder (since that is the default)? Or did span.finish never work? I think this curiosity is one of those worth understanding. I can make my own small replications to develop understanding if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It never 'worked' in the sense that this issue was always here, but it only appears if you use a different tracing provider than the default. OpenAI's default handles it fine, but the instrumentor from the report uses otel which has this context detach log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm, OTel already won't work for add_done_callback for workflows since they are distributed. I think we need to require being able to finish a span in a different context (and different machine) from where it was created or take a different approach.
(I am curious how OTel is used here anyways due to inherent OTel limitations concerning deterministic span/trace IDs and such, though I understand that's a different topic, but we may need to suggest the start-and-stop-span-immediately approach for OTel-based OpenAI tracing that we do for our other OTel-based tracing)
What was changed
Title
Why?
With some implementations of OpenAI's tracing abstraction, context is important. In particular, the otel instrumenter will print warnings due to context detachment.
Checklist
Closes [Bug] Langfuse Tracing Not Working with Temporal OpenAI Agents Plugin #1136
How was this tested:
Validated on a test project.
Any docs updates needed?