For agents built without a directly supported framework, handlebar-core provides the primitives to wire governance into any agent loop.

Prerequisites

Install

pip install handlebar-core

How it fits into your agent

A typical agent loop looks like this:
start
  └─ [LLM call] → [tool call] → [LLM call] → [tool call] → ...
end
Handlebar wraps each step:
start_run
  └─ before_llm → [LLM call] → after_llm
  └─ before_tool → (allow / block) → [tool call] → after_tool
  └─ before_llm → [LLM call] → after_llm
  └─ ...
end_run
  • start_run / end_run — bracket the entire agent invocation. Handlebar uses these to group all events and enforce run-level budgets.
  • before_llm / after_llm — record the messages sent to the model and the response (including token usage) for the audit log.
  • before_tool — evaluate the proposed tool call against your configured policies. Returns a Decision that tells you whether to proceed, block, or stop the run.
  • after_tool — record the tool’s result.
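
The hook sequence above can be sketched as a single loop iteration. This is an illustrative sketch, not the real API: `FakeRun` is a stand-in for the Run object described below, so the flow is runnable on its own.

```python
# Illustrative sketch of one agent-loop iteration wrapped by the four
# per-step hooks. FakeRun is a stand-in, not the real handlebar Run object.
class FakeRun:
    def __init__(self):
        self.events = []

    def before_llm(self, messages):
        self.events.append("before_llm")

    def after_llm(self, response):
        self.events.append("after_llm")

    def before_tool(self, tool_name, args):
        self.events.append("before_tool")
        return "allow"  # the real method returns a Decision object

    def after_tool(self, tool_name, args, result):
        self.events.append("after_tool")


def one_iteration(run):
    run.before_llm([{"role": "user", "content": "Book a flight"}])
    # ... call your LLM here; suppose it proposes a tool call ...
    run.after_llm({"text": "calling book_flight"})

    decision = run.before_tool("book_flight", {"destination": "Paris"})
    result = {"ok": True} if decision == "allow" else {"error": "blocked"}
    run.after_tool("book_flight", {"destination": "Paris"}, result)
    return run.events

print(one_iteration(FakeRun()))
# → ['before_llm', 'after_llm', 'before_tool', 'after_tool']
```

The sections below cover each hook in turn, using the real handlebar types.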

The client

HandlebarClient is the top-level object that connects your agent to the Handlebar platform. It handles authentication, registers your agent, and manages the audit event stream. Initialise it once at application startup — not once per request or per run. The client is designed to be shared: it maintains a single connection to the platform and fans out events for all concurrent runs.
from handlebar.core import HandlebarClient, HandlebarClientConfig, AgentDescriptor

config = HandlebarClientConfig(
    agent=AgentDescriptor(slug="my-agent"),
    # HANDLEBAR_API_KEY is read from the environment if api_key is not set here.
)

client = await HandlebarClient.init(config)
For sync code, use init_sync:
client = HandlebarClient.init_sync(config)
AgentDescriptor.slug is the stable identifier for your agent on the Handlebar platform. It is how you match runs in the dashboard to a specific agent in your codebase. Use a short, unique, URL-safe string — e.g. "travel-booking-agent".

Runs

A run represents one end-to-end invocation of your agent — from the first message in to the final response out. Everything that happens in between (LLM calls, tool calls, decisions) is attributed to that run in the audit log. Start a new run at the beginning of each invocation and end it when the agent finishes:
from handlebar.core import RunConfig

run = await client.start_run(RunConfig(run_id="unique-run-id"))

# ... agent loop ...

await run.end("success")   # "success" | "error" | "interrupted" | "timeout"
run_id must be unique per invocation. A UUID works well:
from uuid import uuid4

run = await client.start_run(RunConfig(run_id=str(uuid4())))
The run object is what you interact with throughout the agent loop. Hold a reference to it for the lifetime of the invocation — typically as a local variable in the function that runs your agent.
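
One way to guarantee the run is always ended, whatever the loop does, is to wrap the invocation in a try/except. This is a sketch of that pattern, not the real API: `FakeClient` and `FakeRun` are stand-ins for `HandlebarClient` and the run object so the code runs on its own.

```python
import asyncio
from uuid import uuid4

# Stand-ins so the sketch is runnable; a real app would use HandlebarClient.
class FakeRun:
    def __init__(self):
        self.status = None

    async def end(self, status):
        self.status = status


class FakeClient:
    async def start_run(self, run_id):
        return FakeRun()


async def handle_request(client, user_input, agent_loop):
    # One run per invocation; end("error") is recorded if the loop raises.
    run = await client.start_run(run_id=str(uuid4()))
    try:
        result = await agent_loop(run, user_input)
        await run.end("success")
        return result, run
    except Exception:
        await run.end("error")
        raise


async def demo():
    async def loop(run, text):
        return text.upper()

    result, run = await handle_request(FakeClient(), "book paris", loop)
    return result, run.status

print(asyncio.run(demo()))
# → ('BOOK PARIS', 'success')
```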

Agent lifecycle

Before LLM call

run.before_llm(messages)
Call this before each call to the language model, passing the messages being sent. It emits message.raw.created audit events for each message, enabling full conversation logging on the platform.
from handlebar.core import LLMMessage, LLMResponse, ModelInfo, TokenUsage

# Before — pass the messages you are about to send.
messages = [
    LLMMessage(role="system", content="You are a helpful assistant."),
    LLMMessage(role="user",   content="Book me a flight to Paris."),
]
await run.before_llm(messages)

# ... call your LLM here ...
llm_output = await my_llm.chat(messages)
before_llm is optional. Skipping it means conversation content won’t appear in audit logs, but tool governance still works fully.

After LLM call

run.after_llm(response)
Call this after the LLM responds. It logs the response content, records token usage for cost tracking, and emits an llm.result event.
llm_output = await my_llm.chat(messages)

await run.after_llm(
    LLMResponse(
        model=ModelInfo(name="gpt-4o", provider="openai"),
        content=[{"type": "text", "text": llm_output.text}],
        output_text=llm_output.text,
        usage=TokenUsage(
            input_tokens=llm_output.usage.prompt_tokens,
            output_tokens=llm_output.usage.completion_tokens,
        ),
    )
)
Like before_llm, this is optional but enables token-based budget enforcement and spend tracking on the platform.

Before tool call

run.before_tool(tool_name, args)
before_tool is where governance actually happens. It evaluates the proposed call against your configured rules and returns a Decision indicating whether to allow or block the call.
from handlebar.core.schema.governance import RunControl, Verdict

decision = await run.before_tool(
    tool_name="book_flight",
    args={"destination": "Paris", "date": "2025-06-01"},
)

if decision.control == RunControl.TERMINATE:
    # A policy has determined the run should stop entirely.
    # Block this tool and do not make any further LLM or tool calls.
    tool_result = {"error": f"Blocked: {decision.message}"}
    await run.end("interrupted")

elif decision.verdict == Verdict.BLOCK:
    # This specific tool call is not allowed.
    # Return an error to the agent so it can respond gracefully.
    tool_result = {"error": f"Blocked: {decision.message}"}

else:
    # Allowed — execute the tool.
    tool_result = await book_flight(destination="Paris", date="2025-06-01")
The Decision shape:
class Decision(_Base):
    verdict: Verdict
    control: RunControl
    cause: DecisionCause
    message: str
    evaluated_rules: list[RuleEval] = Field(default_factory=list)
    final_rule_id: str | None = None
BLOCK + CONTINUE means the tool should be skipped but the agent loop can continue; the blocked message is typically returned to the LLM so it can respond gracefully. BLOCK + TERMINATE means the run should stop entirely: raise an exception that propagates up through your agent loop, catch it at the top level, and call run.end("interrupted").
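
The two block cases can be centralised in one helper. A sketch under stated assumptions: the enums below are illustrative stand-ins for handlebar's Verdict and RunControl (so the code runs standalone), and `RunTerminated` is a hypothetical exception name for the propagate-and-catch pattern described above.

```python
from enum import Enum

# Stand-in enums mirroring the Verdict / RunControl values used above.
class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"

class RunControl(Enum):
    CONTINUE = "continue"
    TERMINATE = "terminate"

class RunTerminated(Exception):
    """Hypothetical marker exception: catch at the top level, then end the run."""

def apply_decision(verdict, control, message):
    """Return an error payload for a blocked call, None when the call is
    allowed, or raise when the whole run must stop."""
    if control is RunControl.TERMINATE:
        raise RunTerminated(message)
    if verdict is Verdict.BLOCK:
        return {"error": f"Blocked: {message}"}
    return None  # caller executes the tool normally
```

With this helper, the agent loop only needs one try/except for `RunTerminated` at its top level.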

After tool call

run.after_tool(tool_name, args, result)
Call this after every tool invocation, regardless of success or failure. It logs the result and evaluates any tool.after rules (e.g. inspecting output content or checking data exfiltration patterns).
tool_result = await tool_function_in_code(tool_args)

# Always call after_tool, even when the tool was blocked.
await run.after_tool(
    tool_name="book_flight",
    args={"destination": "Paris", "date": "2025-06-01"},
    result=tool_result,
)
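
Because after_tool must run on every path, including blocked calls, it can help to bracket each tool call in a single wrapper. A minimal sketch: `FakeRun` and the one-field decision shape here are simplified stand-ins, not the real handlebar types.

```python
import asyncio

# Simplified stand-in: a decision exposing only what the wrapper reads.
class FakeDecision:
    def __init__(self, blocked, message=""):
        self.blocked = blocked
        self.message = message

class FakeRun:
    async def before_tool(self, tool_name, args):
        # Pretend policy for the sketch: block any payment tool.
        return FakeDecision(blocked=(tool_name == "charge_card"),
                            message="payments disabled")

    async def after_tool(self, tool_name, args, result):
        self.last_logged = (tool_name, result)

async def governed_call(run, tool_name, args, fn):
    # Bracket the call so after_tool runs whether or not the tool executed.
    decision = await run.before_tool(tool_name, args)
    if decision.blocked:
        result = {"error": f"Blocked: {decision.message}"}
    else:
        result = await fn(**args)
    await run.after_tool(tool_name, args, result)
    return result

async def demo():
    async def book_flight(destination):
        return {"booked": destination}

    run = FakeRun()
    ok = await governed_call(run, "book_flight", {"destination": "Paris"}, book_flight)
    blocked = await governed_call(run, "charge_card", {"destination": "Paris"}, book_flight)
    return ok, blocked

print(asyncio.run(demo()))
# → ({'booked': 'Paris'}, {'error': 'Blocked: payments disabled'})
```

Routing every tool invocation through one helper like this makes it hard for a new code path to accidentally skip governance or audit logging.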

Sync usage

Every async method has a _sync counterpart:
client   = HandlebarClient.init_sync(config)
run      = client.start_run_sync(RunConfig(run_id=str(uuid4())))
decision = run.before_tool_sync("book_flight", {"destination": "Paris"})
run.end_sync("success")

Additional config

Tool tags

Tags let you group tools by function so governance rules apply to the whole category rather than individual tool names. For example:
  • Rate-limit all "search" tools regardless of which search API they call
  • Block data exfiltration by preventing a "pii-read" result flowing into an "external" tool
  • Require human review before any "write" action
Register tags when you configure the client, and Handlebar will look them up automatically each time before_tool is called:
from handlebar.core import HandlebarClientConfig, AgentDescriptor, Tool

config = HandlebarClientConfig(
    agent=AgentDescriptor(slug="my-agent"),
    tools=[
        Tool(name="book_flight",  tags=["travel", "write", "external-comms"]),
        Tool(name="read_pii",     tags=["pii-read", "sensitive"]),
        Tool(name="send_email",   tags=["external-comms", "write"]),
        Tool(name="search_web",   tags=["search", "read"]),
    ],
)
If the tag set for a tool varies per invocation, pass tags directly at call time instead:
decision = await run.before_tool(
    tool_name="export_report",
    args={"format": "csv"},
    tool_tags=["export", "external-comms"],
)

End-user / actor configuration

Attach the identity of the end-user to the run so that Handlebar can:
  • Enforce per-user budgets — e.g. each user gets a fixed number of tool calls or tokens per day
  • Attribute every audit event to a specific user
  • Apply per-user rules — e.g. restrict which tools a given user role can invoke
Pass the actor when starting the run:
from handlebar.core import Actor, RunConfig

run = await client.start_run(
    RunConfig(
        run_id=str(uuid4()),
        actor=Actor(external_id="user_abc123"),   # your app's identifier for this user
        session_id="session_xyz",                  # groups runs from the same conversation
    )
)
external_id can be any stable identifier your application uses — a database ID, a UUID, an email address, etc. Handlebar stores it opaquely and uses it to aggregate usage and enforce per-user policies. session_id is optional but recommended when an agent can be invoked multiple times within a single user session — it lets the platform group related runs together.
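
The distinction between the three identifiers can be made concrete in a small helper. `RequestContext` is a hypothetical type standing in for whatever session object your framework provides; only the field roles matter.

```python
from dataclasses import dataclass
from uuid import uuid4

# Hypothetical request context standing in for your framework's session object.
@dataclass
class RequestContext:
    user_id: str          # stable per user → Actor.external_id
    conversation_id: str  # stable per conversation → session_id

def run_identity(ctx: RequestContext) -> dict:
    # run_id is fresh per invocation; actor and session are reused across runs.
    return {
        "run_id": str(uuid4()),
        "external_id": ctx.user_id,
        "session_id": ctx.conversation_id,
    }
```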

Misc options

from handlebar.core import HandlebarClientConfig, AgentDescriptor, ConsoleSinkConfig

config = HandlebarClientConfig(
    agent=AgentDescriptor(
        slug="my-agent",
        name="My Agent",
        description="Books travel and manages calendars",
    ),
    enforce_mode="enforce",   # "enforce" | "shadow" | "off"
    fail_closed=False,        # True = block all tool calls if Handlebar API is unreachable
)
enforce_mode   Behaviour
"enforce"      Governance decisions are applied — blocked tools are stopped
"shadow"       Decisions are evaluated and logged but never enforced
"off"          No API calls; pass-through only
Last modified on March 2, 2026