For agents built without a directly supported framework, handlebar-core provides the primitives to wire governance into any agent loop.

Prerequisites

Install

pip install handlebar-core

How it fits into your agent

A typical agent loop looks like this:
start
  └─ [LLM call] → [tool call] → [LLM call] → [tool call] → ...
end
Handlebar wraps each step:
start_run
  └─ before_llm → [LLM call] → after_llm
  └─ before_tool → (allow / block) → [tool call] → after_tool
  └─ before_llm → [LLM call] → after_llm
  └─ ...
end_run
  • start_run / end_run — bracket the entire agent invocation. Handlebar uses these to group all events and enforce run-level budgets.
  • before_llm / after_llm — record the messages sent to the model and the response (including token usage) for the audit log.
  • before_tool — evaluate the proposed tool call against your configured policies. Returns a Decision that tells you whether to proceed, block, or stop the run.
  • after_tool — record the tool’s result.
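
The hook sequence above can be sketched as a single loop iteration. This is an illustrative sketch, not the real API: `FakeRun` is a stand-in for the Run object described below, so the flow is runnable on its own.

```python
# Illustrative sketch of one agent-loop iteration wrapped by the four
# per-step hooks. FakeRun is a stand-in, not the real handlebar Run object.
class FakeRun:
    def __init__(self):
        self.events = []

    def before_llm(self, messages):
        self.events.append("before_llm")

    def after_llm(self, response):
        self.events.append("after_llm")

    def before_tool(self, tool_name, args):
        self.events.append("before_tool")
        return "allow"  # the real method returns a Decision object

    def after_tool(self, tool_name, args, result):
        self.events.append("after_tool")


def one_iteration(run):
    run.before_llm([{"role": "user", "content": "Book a flight"}])
    # ... call your LLM here; suppose it proposes a tool call ...
    run.after_llm({"text": "calling book_flight"})

    decision = run.before_tool("book_flight", {"destination": "Paris"})
    result = {"ok": True} if decision == "allow" else {"error": "blocked"}
    run.after_tool("book_flight", {"destination": "Paris"}, result)
    return run.events

print(one_iteration(FakeRun()))
# → ['before_llm', 'after_llm', 'before_tool', 'after_tool']
```

The sections below cover each hook in turn, using the real handlebar types.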

The client

HandlebarClient is the top-level object that connects your agent to the Handlebar platform. It handles authentication, registers your agent, and manages the audit event stream. Initialise it once at application startup — not once per request or per run. The client is designed to be shared: it maintains a single connection to the platform and fans out events for all concurrent runs.
from handlebar.core import HandlebarClient, HandlebarClientConfig, AgentDescriptor

config = HandlebarClientConfig(
    agent=AgentDescriptor(slug="my-agent"),
    # HANDLEBAR_API_KEY is read from the environment if api_key is not set here.
)

client = await HandlebarClient.init(config)
For sync code, use init_sync:
client = HandlebarClient.init_sync(config)
AgentDescriptor.slug is the stable identifier for your agent on the Handlebar platform. It is how you match runs in the dashboard to a specific agent in your codebase. Use a short, unique, URL-safe string — e.g. "travel-booking-agent".

Runs

A run represents one end-to-end invocation of your agent — from the first message in to the final response out. Everything that happens in between (LLM calls, tool calls, decisions) is attributed to that run in the audit log. Start a new run at the beginning of each invocation and end it when the agent finishes:
from handlebar.core import RunConfig

run = await client.start_run(RunConfig(run_id="unique-run-id"))

# ... agent loop ...

await run.end("success")   # "success" | "error" | "interrupted" | "timeout"
run_id must be unique per invocation. A UUID works well:
from uuid import uuid4

run = await client.start_run(RunConfig(run_id=str(uuid4())))
The run object is what you interact with throughout the agent loop. Hold a reference to it for the lifetime of the invocation — typically as a local variable in the function that runs your agent.
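
One way to guarantee the run is always ended, whatever the loop does, is to wrap the invocation in a try/except. This is a sketch of that pattern, not the real API: `FakeClient` and `FakeRun` are stand-ins for `HandlebarClient` and the run object so the code runs on its own.

```python
import asyncio
from uuid import uuid4

# Stand-ins so the sketch is runnable; a real app would use HandlebarClient.
class FakeRun:
    def __init__(self):
        self.status = None

    async def end(self, status):
        self.status = status


class FakeClient:
    async def start_run(self, run_id):
        return FakeRun()


async def handle_request(client, user_input, agent_loop):
    # One run per invocation; end("error") is recorded if the loop raises.
    run = await client.start_run(run_id=str(uuid4()))
    try:
        result = await agent_loop(run, user_input)
        await run.end("success")
        return result, run
    except Exception:
        await run.end("error")
        raise


async def demo():
    async def loop(run, text):
        return text.upper()

    result, run = await handle_request(FakeClient(), "book paris", loop)
    return result, run.status

print(asyncio.run(demo()))
# → ('BOOK PARIS', 'success')
```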

Agent lifecycle

Before LLM call

run.before_llm(messages)
Call this before each call to the language model, passing the messages being sent. It emits message.raw.created audit events for each message, enabling full conversation logging on the platform.
from handlebar.core import LLMMessage, LLMResponse, ModelInfo, TokenUsage

# Before — pass the messages you are about to send.
messages = [
    LLMMessage(role="system", content="You are a helpful assistant."),
    LLMMessage(role="user",   content="Book me a flight to Paris."),
]
await run.before_llm(messages)

# ... call your LLM here ...
llm_output = await my_llm.chat(messages)
before_llm is optional. Skipping it means conversation content won’t appear in audit logs, but tool governance still works fully.

After LLM call

run.after_llm(response)
Call this after the LLM responds. It logs the response content, records token usage for cost tracking, and emits an llm.result event.
llm_output = await my_llm.chat(messages)

await run.after_llm(
    LLMResponse(
        model=ModelInfo(name="gpt-4o", provider="openai"),
        content=[{"type": "text", "text": llm_output.text}],
        output_text=llm_output.text,
        usage=TokenUsage(
            input_tokens=llm_output.usage.prompt_tokens,
            output_tokens=llm_output.usage.completion_tokens,
        ),
    )
)
Like before_llm, this is optional but enables token-based budget enforcement and spend tracking on the platform.

Before tool call

run.before_tool(tool_name, args)
before_tool is where governance actually happens. It evaluates the proposed call against your configured rules and returns a Decision indicating whether to allow or block the call.
from handlebar.core.schema.governance import RunControl, Verdict

decision = await run.before_tool(
    tool_name="book_flight",
    args={"destination": "Paris", "date": "2025-06-01"},
)

if decision.control == RunControl.TERMINATE:
    # A policy has determined the run should stop entirely.
    # Block this tool and do not make any further LLM or tool calls.
    tool_result = {"error": f"Blocked: {decision.message}"}
    await run.end("interrupted")

elif decision.verdict == Verdict.BLOCK:
    # This specific tool call is not allowed.
    # Return an error to the agent so it can respond gracefully.
    tool_result = {"error": f"Blocked: {decision.message}"}

else:
    # Allowed — execute the tool.
    tool_result = await book_flight(destination="Paris", date="2025-06-01")
The Decision shape:
class Decision(_Base):
    verdict: Verdict
    control: RunControl
    cause: DecisionCause
    message: str
    evaluated_rules: list[RuleEval] = Field(default_factory=list)
    final_rule_id: str | None = None
BLOCK + CONTINUE means the tool should be skipped but the agent loop can continue; the blocked message is typically returned to the LLM so it can respond gracefully. BLOCK + TERMINATE means the run should stop entirely: raise an exception that propagates up through your agent loop, catch it at the top level, and call run.end("interrupted").
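
The two block cases can be centralised in one helper. A sketch under stated assumptions: the enums below are illustrative stand-ins for handlebar's Verdict and RunControl (so the code runs standalone), and `RunTerminated` is a hypothetical exception name for the propagate-and-catch pattern described above.

```python
from enum import Enum

# Stand-in enums mirroring the Verdict / RunControl values used above.
class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"

class RunControl(Enum):
    CONTINUE = "continue"
    TERMINATE = "terminate"

class RunTerminated(Exception):
    """Hypothetical marker exception: catch at the top level, then end the run."""

def apply_decision(verdict, control, message):
    """Return an error payload for a blocked call, None when the call is
    allowed, or raise when the whole run must stop."""
    if control is RunControl.TERMINATE:
        raise RunTerminated(message)
    if verdict is Verdict.BLOCK:
        return {"error": f"Blocked: {message}"}
    return None  # caller executes the tool normally
```

With this helper, the agent loop only needs one try/except for `RunTerminated` at its top level.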

After tool call

run.after_tool(tool_name, args, result)
Call this after every tool invocation, regardless of success or failure. It logs the result and evaluates any tool.after rules (e.g. inspecting output content or checking data exfiltration patterns).
tool_result = await tool_function_in_code(tool_args)

# Always call after_tool, even when the tool was blocked.
await run.after_tool(
    tool_name="book_flight",
    args={"destination": "Paris", "date": "2025-06-01"},
    result=tool_result,
)
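
Because after_tool must run on every path, including blocked calls, it can help to bracket each tool call in a single wrapper. A minimal sketch: `FakeRun` and the one-field decision shape here are simplified stand-ins, not the real handlebar types.

```python
import asyncio

# Simplified stand-in: a decision exposing only what the wrapper reads.
class FakeDecision:
    def __init__(self, blocked, message=""):
        self.blocked = blocked
        self.message = message

class FakeRun:
    async def before_tool(self, tool_name, args):
        # Pretend policy for the sketch: block any payment tool.
        return FakeDecision(blocked=(tool_name == "charge_card"),
                            message="payments disabled")

    async def after_tool(self, tool_name, args, result):
        self.last_logged = (tool_name, result)

async def governed_call(run, tool_name, args, fn):
    # Bracket the call so after_tool runs whether or not the tool executed.
    decision = await run.before_tool(tool_name, args)
    if decision.blocked:
        result = {"error": f"Blocked: {decision.message}"}
    else:
        result = await fn(**args)
    await run.after_tool(tool_name, args, result)
    return result

async def demo():
    async def book_flight(destination):
        return {"booked": destination}

    run = FakeRun()
    ok = await governed_call(run, "book_flight", {"destination": "Paris"}, book_flight)
    blocked = await governed_call(run, "charge_card", {"destination": "Paris"}, book_flight)
    return ok, blocked

print(asyncio.run(demo()))
# → ({'booked': 'Paris'}, {'error': 'Blocked: payments disabled'})
```

Routing every tool invocation through one helper like this makes it hard for a new code path to accidentally skip governance or audit logging.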

Sync usage

Every async method has a _sync counterpart:
client   = HandlebarClient.init_sync(config)
run      = client.start_run_sync(RunConfig(run_id=str(uuid4())))
decision = run.before_tool_sync("book_flight", {"destination": "Paris"})
run.end_sync("success")

Additional config

Tool tags

Tags let you group tools by function so governance rules apply to the whole category rather than individual tool names. For example:
  • Rate-limit all "search" tools regardless of which search API they call
  • Block data exfiltration by preventing a "pii-read" result flowing into an "external" tool
  • Require human review before any "write" action
Register tags when you configure the client, and Handlebar will look them up automatically each time before_tool is called:
from handlebar.core import HandlebarClientConfig, AgentDescriptor, Tool

config = HandlebarClientConfig(
    agent=AgentDescriptor(slug="my-agent"),
    tools=[
        Tool(name="book_flight",  tags=["travel", "write", "external-comms"]),
        Tool(name="read_pii",     tags=["pii-read", "sensitive"]),
        Tool(name="send_email",   tags=["external-comms", "write"]),
        Tool(name="search_web",   tags=["search", "read"]),
    ],
)
If the tag set for a tool varies per invocation, pass tags directly at call time instead:
decision = await run.before_tool(
    tool_name="export_report",
    args={"format": "csv"},
    tool_tags=["export", "external-comms"],
)

End-user / actor configuration

Attach the identity of the end-user to the run so that Handlebar can:
  • Enforce per-user budgets — e.g. each user gets a fixed number of tool calls or tokens per day
  • Attribute every audit event to a specific user
  • Apply per-user rules — e.g. restrict which tools a given user role can invoke
Pass the actor when starting the run:
from handlebar.core import Actor, RunConfig

run = await client.start_run(
    RunConfig(
        run_id=str(uuid4()),
        actor=Actor(external_id="user_abc123"),   # your app's identifier for this user
        session_id="session_xyz",                  # groups runs from the same conversation
    )
)
external_id can be any stable identifier your application uses — a database ID, a UUID, an email address, etc. Handlebar stores it opaquely and uses it to aggregate usage and enforce per-user policies. session_id is optional but recommended when an agent can be invoked multiple times within a single user session — it lets the platform group related runs together.
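
The distinction between the three identifiers can be made concrete in a small helper. `RequestContext` is a hypothetical type standing in for whatever session object your framework provides; only the field roles matter.

```python
from dataclasses import dataclass
from uuid import uuid4

# Hypothetical request context standing in for your framework's session object.
@dataclass
class RequestContext:
    user_id: str          # stable per user → Actor.external_id
    conversation_id: str  # stable per conversation → session_id

def run_identity(ctx: RequestContext) -> dict:
    # run_id is fresh per invocation; actor and session are reused across runs.
    return {
        "run_id": str(uuid4()),
        "external_id": ctx.user_id,
        "session_id": ctx.conversation_id,
    }
```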

Misc options

from handlebar.core import HandlebarClientConfig, AgentDescriptor, ConsoleSinkConfig

config = HandlebarClientConfig(
    agent=AgentDescriptor(
        slug="my-agent",
        name="My Agent",
        description="Books travel and manages calendars",
    ),
    enforce_mode="enforce",   # "enforce" | "shadow" | "off"
    fail_closed=False,        # True = block all tool calls if Handlebar API is unreachable
)
enforce_mode   Behaviour
"enforce"      Governance decisions are applied — blocked tools are stopped
"shadow"       Decisions are evaluated and logged but never enforced
"off"          No API calls; pass-through only
Last modified on March 2, 2026