
OpenAI Agents SDK review

Sungho Park (gigio1023)

OpenAI Agents SDK

An SDK leveraging OpenAI API resources for building an agent ecosystem.

Overall Impression

Comparison with LangChain

Compared to LangChain, the new SDK feels far more straightforward. Thanks to its deeper reliance on Pydantic, the agent loop becomes much clearer.

  • Defining agent-specific outputs and guardrails is now intuitive and easy to grasp at a glance.
  • LangChain has a clear strength in integration support, offering many modules. However, the complexity ramps up significantly when you try to dig into the internal implementations. In some cases, you need a solid understanding of LangChain’s inner workings just to properly set type hints or build extensions.
  • On a side note, LangChain’s agent internals are still littered with legacy code from the GPT-3 era, making them a pain to read. Even though chat completions are now the norm, you still have to sift through outdated completion-style implementations and documentation.

Tracing and Evaluation

The tracing UI is essentially identical to LangSmith’s, which remains user-friendly and visually clear.

  • It’s another reminder that agent trajectory definitions across ecosystems look surprisingly similar.
  • However, there’s only one trace dashboard per project, putting extra responsibility on whoever manages the OpenAI account to keep projects well-organized.

Interestingly, the previously under-the-radar OpenAI eval functionality is now fully accessible.

The introduction of robust tracing and evaluation features feels like an even tighter lock-in to the OpenAI ecosystem.

OpenAI Assistants API

Lastly, the OpenAI Assistants API branding seems increasingly unclear. With the Assistants API’s vector search capability apparently rolled into the Agents SDK as a file search tool, the boundary has gotten even murkier. Given these substantial changes in functionality, it might be time for a full rebrand, perhaps something along the lines of an “Agents API.”

Agent

Uses a generally accepted definition of an “Agent”:

  • Model
  • Tool
  • Guardrail

How the Agent Works

It’s basically the same as the ReAct Agent found in LangChain and LlamaIndex.

A notable difference is the introduction of the handoff term, defining when one agent delegates a task to another.

A single sequence of the following steps is defined as a turn. This aligns exactly with the conventional concept of a conversational turn.

According to the OpenAI docs:

  1. We call the LLM for the current agent, with the current input.
  2. The LLM produces its output.
    1. If the LLM returns a final_output, the loop ends and we return the result.
    2. If the LLM does a handoff, we update the current agent and input, and re-run the loop.
    3. If the LLM produces tool calls, we run those tool calls, append the results, and re-run the loop.
  3. If we exceed the max_turns passed, we raise a MaxTurnsExceeded exception.
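The loop above can be sketched in plain Python. Everything here (the `respond` callback, the tuple convention, `run_loop`) is an illustrative stand-in, not the SDK’s actual internals:

```python
from dataclasses import dataclass
from typing import Callable

class MaxTurnsExceeded(Exception):
    pass

@dataclass
class Agent:
    name: str
    # respond() stands in for an LLM call; it returns ("final_output", value),
    # ("handoff", other_agent), or ("tool_calls", [callables]).
    respond: Callable

def run_loop(agent, user_input, max_turns=10):
    messages = [user_input]
    for _ in range(max_turns):
        kind, payload = agent.respond(messages)       # 1-2. call the LLM for the current agent
        if kind == "final_output":                    # 2-1. loop ends, return the result
            return payload
        if kind == "handoff":                         # 2-2. switch agents and re-run
            agent = payload
            continue
        if kind == "tool_calls":                      # 2-3. run tools, append results, re-run
            messages.extend(tool_call() for tool_call in payload)
    raise MaxTurnsExceeded(f"exceeded {max_turns} turns")  # step 3
```

For example, an agent that first calls a tool and then returns its result bounces through branch 2-3 once before hitting 2-1.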

Concept of Agents SDK

  • Agents
  • Tools
  • Runner
  • Guardrails

Tools

There’re 3 categories of tools

Hosted tools

A tool provided by OpenAI. Operates via OpenAI’s API. Billing also goes through OpenAI.

  • Web search → governed by OpenAI’s search policy (fine-tuned model, $25 to $50 per 1K requests).
  • File search → files uploaded to OpenAI’s file servers (storage and search costs charged separately).
  • Computer use → leverages virtual machines provided by OpenAI (fine-tuned model, pricing TBD).

import asyncio

from agents import Agent, FileSearchTool, Runner, WebSearchTool

agent = Agent(
    name="Assistant",
    tools=[
        WebSearchTool(),
        FileSearchTool(
            max_num_results=3,
            vector_store_ids=["VECTOR_STORE_ID"],
        ),
    ],
)

async def main():
    result = await Runner.run(agent, "Which coffee shop should I go to, taking into account my preferences and the weather today in SF?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

Function calling

Arguments and docstrings are automatically parsed by the Agents library to fill in tool names, arguments, descriptions, and more—exactly like LangChain.
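To see roughly what that parsing involves, here is a minimal, stdlib-only sketch that derives a tool description from a function’s signature and docstring. It is illustrative only; the SDK’s actual implementation differs:

```python
import inspect

def tool_schema(func):
    """Derive a minimal tool description from a function's signature and docstring."""
    type_names = {int: "integer", float: "number", str: "string", bool: "boolean"}
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        # Map Python annotations onto JSON-schema type names.
        params[name] = {"type": type_names.get(param.annotation, "object")}
    return {
        "name": func.__name__,
        # First line of the docstring becomes the tool description.
        "description": (inspect.getdoc(func) or "").partition("\n")[0],
        "parameters": {"type": "object", "properties": params},
    }

def fetch_weather(city: str) -> str:
    """Fetch the weather for a given city."""
    return "sunny"

schema = tool_schema(fetch_weather)
# schema["name"] → "fetch_weather"
# schema["parameters"]["properties"]["city"] → {"type": "string"}
```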

import json

from typing_extensions import TypedDict, Any

from agents import Agent, FunctionTool, RunContextWrapper, function_tool

class Location(TypedDict):
    lat: float
    long: float

@function_tool
async def fetch_weather(location: Location) -> str:
    """Fetch the weather for a given location.

    Args:
        location: The location to fetch the weather for.
    """
    # In real life, we'd fetch the weather from a weather API
    return "sunny"

@function_tool(name_override="fetch_data")  
def read_file(ctx: RunContextWrapper[Any], path: str, directory: str | None = None) -> str:
    """Read the contents of a file.

    Args:
        path: The path to the file to read.
        directory: The directory to read the file from.
    """
    # In real life, we'd read the file from the file system
    return "<file contents>"

agent = Agent(
    name="Assistant",
    tools=[fetch_weather, read_file],  
)

for tool in agent.tools:
    if isinstance(tool, FunctionTool):
        print(tool.name)
        print(tool.description)
        print(json.dumps(tool.params_json_schema, indent=2))
        print()

Agents as tools

Agents can be registered and used as tools.

You can set a custom name for each agent, and the input to the agent is passed as a parameter.

from agents import Agent, Runner
import asyncio

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You translate the user's message to Spanish",
)

french_agent = Agent(
    name="French agent",
    instructions="You translate the user's message to French",
)

orchestrator_agent = Agent(
    name="orchestrator_agent",
    instructions=(
        "You are a translation agent. You use the tools given to you to translate."
        "If asked for multiple translations, you call the relevant tools."
    ),
    tools=[
        spanish_agent.as_tool(
            tool_name="translate_to_spanish",
            tool_description="Translate the user's message to Spanish",
        ),
        french_agent.as_tool(
            tool_name="translate_to_french",
            tool_description="Translate the user's message to French",
        ),
    ],
)

async def main():
    result = await Runner.run(orchestrator_agent, input="Say 'Hello, how are you?' in Spanish.")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

Handoffs

A handoff is one of the possible actions an agent can perform: delegating the current task to another agent.

A handoff target can be:

  • A plain agent.
  • A Handoff object, created via handoff(): it wraps an agent and allows specifying the handoff behavior in more detail.

You can use the prebuilt handoff prompts provided by OpenAI.

from agents import Agent, handoff

billing_agent = Agent(name="Billing agent")
refund_agent = Agent(name="Refund agent")

triage_agent = Agent(name="Triage agent", handoffs=[billing_agent, handoff(refund_agent)])

from agents import Agent
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

billing_agent = Agent(
    name="Billing agent",
    instructions=f"""{RECOMMENDED_PROMPT_PREFIX}
    <Fill in the rest of your prompt here>.""",
)

Tracing

Agent tracing is available through OpenAI’s dashboard, similar to LangChain’s LangSmith.

  • Pros:
    • No need to run separate monitoring infrastructure.
    • It probably covers vLLM’s OpenAI-compatible servers as well.
  • Cons:
    • None apparent yet; honestly, it just seems pretty solid.

Guardrails

Guardrails run in parallel with agents, validating agent behavior.

They are categorized into input and output guardrails:

  • Input guardrail:
    • Validates the input provided to an agent. If the guardrail’s output sets tripwire_triggered to true, an InputGuardrailTripwireTriggered exception is raised.
  • Output guardrail:
    • Validates the output generated by an agent. Similarly, if tripwire_triggered is true, an OutputGuardrailTripwireTriggered exception is raised.
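The tripwire mechanics can be illustrated with a small stand-alone sketch. The names mirror the SDK’s terminology, but this is not the actual API:

```python
from dataclasses import dataclass

class InputGuardrailTripwireTriggered(Exception):
    pass

@dataclass
class GuardrailOutput:
    tripwire_triggered: bool
    info: str = ""

def homework_guardrail(user_input: str) -> GuardrailOutput:
    """Trip if the user seems to be asking for homework answers (toy heuristic)."""
    triggered = "homework" in user_input.lower()
    return GuardrailOutput(tripwire_triggered=triggered, info="homework check")

def run_with_guardrail(user_input: str) -> str:
    # In the SDK the guardrail runs alongside the agent; here we just check first.
    result = homework_guardrail(user_input)
    if result.tripwire_triggered:
        raise InputGuardrailTripwireTriggered(result.info)
    return f"agent answer to: {user_input}"
```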

Runners

Similar to LangChain’s runnable objects, though usage differs slightly. The primary interaction is through the Runner.run method.

  • Runners bundle and execute one or more agents in a loop.
  • They can also generate a response for a single turn.

Streaming

Event types are defined here, using Literal rather than Enum.

import asyncio
import random
from agents import Agent, ItemHelpers, Runner, function_tool

@function_tool
def how_many_jokes() -> int:
    return random.randint(1, 10)

async def main():
    agent = Agent(
        name="Joker",
        instructions="First call the `how_many_jokes` tool, then tell that many jokes.",
        tools=[how_many_jokes],
    )

    result = Runner.run_streamed(
        agent,
        input="Hello",
    )
    print("=== Run starting ===")

    async for event in result.stream_events():
        # We'll ignore the raw responses event deltas
        if event.type == "raw_response_event":
            continue
        # When the agent updates, print that
        elif event.type == "agent_updated_stream_event":
            print(f"Agent updated: {event.new_agent.name}")
            continue
        # When items are generated, print them
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("-- Tool was called")
            elif event.item.type == "tool_call_output_item":
                print(f"-- Tool output: {event.item.output}")
            elif event.item.type == "message_output_item":
                print(f"-- Message output:\n {ItemHelpers.text_message_output(event.item)}")
            else:
                pass  # Ignore other event types

    print("=== Run complete ===")

if __name__ == "__main__":
    asyncio.run(main())
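The string comparisons on event.type above are what the Literal typing enables. A minimal, self-contained illustration (these event classes are made up, not the SDK’s):

```python
from dataclasses import dataclass
from typing import Literal, Union

@dataclass
class RawResponseEvent:
    type: Literal["raw_response_event"] = "raw_response_event"

@dataclass
class AgentUpdatedEvent:
    new_agent: str
    type: Literal["agent_updated_stream_event"] = "agent_updated_stream_event"

StreamEvent = Union[RawResponseEvent, AgentUpdatedEvent]

def describe(event: StreamEvent) -> str:
    # Comparing against the Literal-typed `type` field lets type checkers
    # narrow the union, with no Enum needed.
    if event.type == "agent_updated_stream_event":
        return f"Agent updated: {event.new_agent}"
    return "raw response"
```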

Others

Model usage with different LLM API providers

OpenAI

from agents import Agent, Runner, AsyncOpenAI, OpenAIChatCompletionsModel
import asyncio

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
    model="o3-mini", 
)

english_agent = Agent(
    name="English agent",
    instructions="You only speak English",
    model=OpenAIChatCompletionsModel( 
        model="gpt-4o",
        openai_client=AsyncOpenAI()
    ),
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Handoff to the appropriate agent based on the language of the request.",
    handoffs=[spanish_agent, english_agent],
    model="gpt-4o",
)

async def main():
    result = await Runner.run(triage_agent, input="Hola, ¿cómo estás?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())

Custom OpenAI-Compatible Server

from agents import Agent, AsyncOpenAI, ModelSettings, OpenAIChatCompletionsModel

external_client = AsyncOpenAI(
    api_key="EXTERNAL_API_KEY",
    base_url="https://api.external.com/v1/",
)

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
    model=OpenAIChatCompletionsModel(
        model="EXTERNAL_MODEL_NAME",
        openai_client=external_client,
    ),
    model_settings=ModelSettings(temperature=0.5),
)