OpenAI Agents SDK#
An SDK leveraging OpenAI API resources for building an agent ecosystem.
Overall Impression#
Comparison with LangChain#
Compared to LangChain, the new SDK feels far more straightforward. Thanks to its deeper reliance on Pydantic, the agent loop is much clearer.
- Defining agent-specific outputs and guardrails is now intuitive and easy to grasp at a glance.
- LangChain has a clear strength in integration support, offering many modules. However, the complexity ramps up significantly when you try to dig into the internal implementations. In some cases, you need a solid understanding of LangChain’s inner workings just to set type hints properly or build extensions.
- On a side note, LangChain’s agent internals are still littered with legacy code from the GPT-3 era, making them a pain to read. Despite chat completions becoming the new norm, you still have to sift through outdated completion-style implementations and documentation.
Tracing and Evaluation#
The tracing UI is essentially identical to LangSmith’s, which remains user-friendly and visually clear.
- It’s another reminder that agent trajectory definitions across ecosystems look surprisingly similar.
- However, there’s only one trace dashboard per project, putting extra responsibility on whoever manages the OpenAI account to keep projects well-organized.
Interestingly, the previously under-the-radar OpenAI eval functionality is now fully accessible.
The introduction of robust tracing and evaluation features feels like an even tighter lock-in to the OpenAI ecosystem.
OpenAI Assistants API#
Lastly, the OpenAI Assistants API branding seems increasingly unclear. With the Assistants API’s vector search capability apparently rolled into the Agents SDK as a tool, it’s gotten even murkier. Given these substantial changes in functionality, it might be time for a full rebrand, perhaps something along the lines of an “Agents API.”
Agent#
Uses a generally accepted definition of an “Agent”:
- Model
- Tool
- Guardrail
How the Agent Works#
It’s basically the same as the ReAct agent found in LangChain and LlamaIndex.
A notable difference is the introduction of the `handoff` term, which defines when one agent delegates a task to another.
A single sequence of the following steps is defined as a `turn`. This aligns exactly with the conventional concept of a conversational turn.
According to the OpenAI docs:
- We call the LLM for the current agent, with the current input.
- The LLM produces its output.
  - If the LLM returns a `final_output`, the loop ends and we return the result.
  - If the LLM does a handoff, we update the current agent and input, and re-run the loop.
  - If the LLM produces tool calls, we run those tool calls, append the results, and re-run the loop.
- If we exceed the `max_turns` passed, we raise a `MaxTurnsExceeded` exception.
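The loop above can be sketched in plain Python. This is a simplified illustration of the control flow, not the SDK’s actual implementation; the `FinalOutput`, `Handoff`, and `ToolCalls` stand-ins are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the kinds of output the LLM can produce.
@dataclass
class FinalOutput:
    value: str

@dataclass
class Handoff:
    target_agent: str

@dataclass
class ToolCalls:
    names: list

class MaxTurnsExceeded(Exception):
    pass

def run_loop(llm, agent, user_input, max_turns=10):
    """Simplified agent loop: call the LLM, then branch on its output."""
    for _ in range(max_turns):
        output = llm(agent, user_input)
        if isinstance(output, FinalOutput):
            return output.value                  # loop ends, return the result
        if isinstance(output, Handoff):
            agent = output.target_agent          # switch agents, re-run the loop
            continue
        if isinstance(output, ToolCalls):
            results = [f"{name} result" for name in output.names]
            user_input = f"{user_input} | {results}"  # append tool results
            continue
    raise MaxTurnsExceeded(f"exceeded {max_turns} turns")

# A scripted fake LLM: one tool call on the first turn, a final answer on the second.
responses = iter([ToolCalls(names=["search"]), FinalOutput(value="done")])
print(run_loop(lambda agent, inp: next(responses), "triage", "hi"))  # prints "done"
```

The fake `llm` is just an iterator of scripted outputs; in the real SDK this is a model call, and the branching is what a single `turn` covers.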
Concept of Agents SDK#
- Agents
- Tools
- Runner
- Guardrails
Tools#
There are three categories of tools:
Hosted tools:#
A tool provided by OpenAI. Operates via OpenAI’s API. Billing also goes through OpenAI.
- Web search → governed by OpenAI’s search policy (fine-tuned model, $25~$50 per 1K requests).
- File search → files uploaded to OpenAI’s file servers (storage and search costs charged separately).
- Computer use → leverages virtual machines provided by OpenAI (fine-tuned model, pricing TBD).
```python
import asyncio

from agents import Agent, FileSearchTool, Runner, WebSearchTool

agent = Agent(
    name="Assistant",
    tools=[
        WebSearchTool(),
        FileSearchTool(
            max_num_results=3,
            vector_store_ids=["VECTOR_STORE_ID"],
        ),
    ],
)

async def main():
    result = await Runner.run(agent, "Which coffee shop should I go to, taking into account my preferences and the weather today in SF?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```
Function calling:#
Arguments and docstrings are automatically parsed by the Agents library to fill in tool names, arguments, descriptions, and more—exactly like LangChain.
```python
import json

from typing_extensions import TypedDict, Any

from agents import Agent, FunctionTool, RunContextWrapper, function_tool


class Location(TypedDict):
    lat: float
    long: float


@function_tool
async def fetch_weather(location: Location) -> str:
    """Fetch the weather for a given location.

    Args:
        location: The location to fetch the weather for.
    """
    # In real life, we'd fetch the weather from a weather API
    return "sunny"


@function_tool(name_override="fetch_data")
def read_file(ctx: RunContextWrapper[Any], path: str, directory: str | None = None) -> str:
    """Read the contents of a file.

    Args:
        path: The path to the file to read.
        directory: The directory to read the file from.
    """
    # In real life, we'd read the file from the file system
    return "<file contents>"


agent = Agent(
    name="Assistant",
    tools=[fetch_weather, read_file],
)

for tool in agent.tools:
    if isinstance(tool, FunctionTool):
        print(tool.name)
        print(tool.description)
        print(json.dumps(tool.params_json_schema, indent=2))
        print()
```
Agents as tools:#
Agents can be registered and used as tools.
You can set a custom name for each agent, and the input to the agent is passed as a parameter.
```python
import asyncio

from agents import Agent, Runner

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You translate the user's message to Spanish",
)

french_agent = Agent(
    name="French agent",
    instructions="You translate the user's message to French",
)

orchestrator_agent = Agent(
    name="orchestrator_agent",
    instructions=(
        "You are a translation agent. You use the tools given to you to translate. "
        "If asked for multiple translations, you call the relevant tools."
    ),
    tools=[
        spanish_agent.as_tool(
            tool_name="translate_to_spanish",
            tool_description="Translate the user's message to Spanish",
        ),
        french_agent.as_tool(
            tool_name="translate_to_french",
            tool_description="Translate the user's message to French",
        ),
    ],
)

async def main():
    result = await Runner.run(orchestrator_agent, input="Say 'Hello, how are you?' in Spanish.")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```
Handoffs#
One of the possible actions an agent can perform: delegating the current task to another agent.
A handoff target can be:
- A plain agent.
- A `handoff()` object, which wraps an agent and allows specifying more detailed handoff behavior (for example, an `on_handoff` callback or a tool name override).
You can use the prebuilt handoff prompt prefix provided by OpenAI.
```python
from agents import Agent, handoff

billing_agent = Agent(name="Billing agent")
refund_agent = Agent(name="Refund agent")

triage_agent = Agent(name="Triage agent", handoffs=[billing_agent, handoff(refund_agent)])
```

```python
from agents import Agent
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

billing_agent = Agent(
    name="Billing agent",
    instructions=f"""{RECOMMENDED_PROMPT_PREFIX}
    <Fill in the rest of your prompt here>.""",
)
```
Tracing#
Agent tracing is available through OpenAI’s dashboard, similar to LangChain’s LangSmith.
- Pros:
- No need for separate monitoring infrastructure.
- Probably covers vLLM OpenAI-compatible servers as well.
- Cons:
- None clear yet; honestly, it just seems pretty solid.
Guardrails#
Guardrails run in parallel with agents, validating agent behavior.
They are categorized into input and output guardrails:
- Input guardrail: Validates the input provided to an agent. If the guardrail output’s `tripwire_triggered` field is true, an `InputGuardrailTripwireTriggered` exception is raised.
- Output guardrail: Validates the output generated by an agent. Similarly, if `tripwire_triggered` is true, an `OutputGuardrailTripwireTriggered` exception is raised.
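The tripwire mechanism can be sketched in plain Python. This is a simplified stand-in for the SDK’s guardrail flow; the `GuardrailResult` class and `profanity_guardrail` function below are illustrative, not the SDK’s API.

```python
from dataclasses import dataclass

class InputGuardrailTripwireTriggered(Exception):
    pass

@dataclass
class GuardrailResult:
    # Mirrors the tripwire_triggered field described above.
    tripwire_triggered: bool
    info: str = ""

def profanity_guardrail(user_input: str) -> GuardrailResult:
    """Illustrative input guardrail: trip on a blocked word."""
    blocked = "badword" in user_input.lower()
    return GuardrailResult(tripwire_triggered=blocked, info="profanity check")

def run_with_guardrails(agent_fn, user_input, input_guardrails):
    # Each input guardrail is checked; a tripped wire aborts the run.
    for guardrail in input_guardrails:
        result = guardrail(user_input)
        if result.tripwire_triggered:
            raise InputGuardrailTripwireTriggered(result.info)
    return agent_fn(user_input)

print(run_with_guardrails(str.upper, "hello", [profanity_guardrail]))  # prints "HELLO"
```

In the real SDK the guardrail runs alongside the agent rather than strictly before it, but the boolean-tripwire-to-exception pattern is the same.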
Runners#
Similar to LangChain, there is the concept of a runnable object, though usage differs slightly. The primary interaction is through a `.run` method.
- Runners bundle and execute one or more agents in a loop.
- They are also capable of generating responses for a single turn.
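The `Runner` surface can be sketched as an async-first class with a sync wrapper. `Runner.run`, `Runner.run_sync`, and `Runner.run_streamed` are the SDK’s actual entry points, but the body below is a hypothetical simplification, not the SDK’s implementation.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class RunResult:
    final_output: str

class Runner:
    @classmethod
    async def run(cls, agent, user_input) -> RunResult:
        # Async entry point: here, one trivial "turn" that returns the agent's reply.
        await asyncio.sleep(0)  # placeholder for the real agent loop
        return RunResult(final_output=agent(user_input))

    @classmethod
    def run_sync(cls, agent, user_input) -> RunResult:
        # Sync convenience wrapper around the async loop.
        return asyncio.run(cls.run(agent, user_input))

print(Runner.run_sync(lambda s: s.upper(), "hola").final_output)  # prints "HOLA"
```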
Streaming#
Event types are defined in the SDK using `Literal` string types rather than `Enum`s.
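A minimal illustration of the `Literal` approach (the event names mirror those used in the streaming example below, but this `StreamEvent` class is hypothetical, not the SDK’s definition):

```python
from dataclasses import dataclass
from typing import Literal

# With Literal, the event type is a plain string that type checkers can still
# narrow in if/elif branches - no Enum member lookup needed at call sites.
@dataclass
class StreamEvent:
    type: Literal["raw_response_event", "run_item_stream_event"]
    payload: str

def describe(event: StreamEvent) -> str:
    if event.type == "raw_response_event":
        return "raw delta"
    elif event.type == "run_item_stream_event":
        return f"item: {event.payload}"
    return "unknown"

print(describe(StreamEvent(type="run_item_stream_event", payload="tool_call")))  # prints "item: tool_call"
```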
```python
import asyncio
import random

from agents import Agent, ItemHelpers, Runner, function_tool


@function_tool
def how_many_jokes() -> int:
    return random.randint(1, 10)


async def main():
    agent = Agent(
        name="Joker",
        instructions="First call the `how_many_jokes` tool, then tell that many jokes.",
        tools=[how_many_jokes],
    )

    result = Runner.run_streamed(
        agent,
        input="Hello",
    )
    print("=== Run starting ===")
    async for event in result.stream_events():
        # We'll ignore the raw responses event deltas
        if event.type == "raw_response_event":
            continue
        # When the agent updates, print that
        elif event.type == "agent_updated_stream_event":
            print(f"Agent updated: {event.new_agent.name}")
            continue
        # When items are generated, print them
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("-- Tool was called")
            elif event.item.type == "tool_call_output_item":
                print(f"-- Tool output: {event.item.output}")
            elif event.item.type == "message_output_item":
                print(f"-- Message output:\n {ItemHelpers.text_message_output(event.item)}")
            else:
                pass  # Ignore other event types

    print("=== Run complete ===")


if __name__ == "__main__":
    asyncio.run(main())
```
Others#
Model usage with different LLM API providers
OpenAI
```python
import asyncio

from agents import Agent, Runner, AsyncOpenAI, OpenAIChatCompletionsModel

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
    model="o3-mini",
)

english_agent = Agent(
    name="English agent",
    instructions="You only speak English",
    model=OpenAIChatCompletionsModel(
        model="gpt-4o",
        openai_client=AsyncOpenAI(),
    ),
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Handoff to the appropriate agent based on the language of the request.",
    handoffs=[spanish_agent, english_agent],
    model="gpt-4o",
)

async def main():
    result = await Runner.run(triage_agent, input="Hola, ¿cómo estás?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```
Custom OpenAI-compatible server
```python
from agents import Agent, AsyncOpenAI, ModelSettings, OpenAIChatCompletionsModel

external_client = AsyncOpenAI(
    api_key="EXTERNAL_API_KEY",
    base_url="https://api.external.com/v1/",
)

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
    model=OpenAIChatCompletionsModel(
        model="EXTERNAL_MODEL_NAME",
        openai_client=external_client,
    ),
    model_settings=ModelSettings(temperature=0.5),
)
```