Background

Kohaku-Lab is a virtual open-source AI research lab led by KohakuBlueleaf (Yeh Shih-Ying, Taiwan). Its core project, KohakuTerrarium, is a multi-agent orchestration framework positioned similarly to LangGraph/CrewAI, but with a stronger emphasis on pluginization and runtime composability.

This article is preliminary research for contributing an OpenTelemetry observability plugin. It maps out the ecosystem architecture and plugin mechanisms, and compares observability practices across major agent frameworks.

Last updated: May 2026 (based on KohakuTerrarium v1.4.0, kt-biome v1.2.0)

1. Ecosystem Overview

The Kohaku-Lab organization has 6 repositories, forming a dependency chain centered on KohakuTerrarium:

graph TB
    KT["KohakuTerrarium v1.4.0"]
    KR["KohakuRAG"]
    KB["kt-biome v1.2.0"]
    KTe["kt-template"]
    KV["KohakuVault"]

    KT -->|depends on| KV
    KB -->|kt install| KT
    KR -->|depends on| KT

    style KT fill:#e1f5fe
    style KB fill:#e8f5e9

| Project | Purpose | Language | Relationship to Core |
| --- | --- | --- | --- |
| KohakuTerrarium | Multi-agent orchestration framework | Python 3.10+ | The core itself |
| KohakuRAG | Layered RAG engine | Python | Depends on core framework |
| kt-biome | Official plugin pack | Python 3.12+ | Zero dependencies, runtime-mounted |
| KohakuVault | SQLite KV store | Python | Pip dependency of the core |

Key constraint: kt-biome declares zero runtime dependencies (no pip dependencies in pyproject.toml). It installs into the framework via kt install and directly inherits the host framework's capabilities at runtime.

2. KohakuTerrarium Core Architecture (v1.4.0)

2.1 Overall Layering

graph TB
    subgraph Surfaces["Runtime Surfaces"]
        CLI["CLI"]
        TUI["TUI"]
        WEB["Web Dashboard"]
        DESK["Desktop"]
    end

    subgraph Studio["studio/"]
        CAT["Catalog"]
        SESS["Sessions"]
        PERS["Persistence"]
    end

    subgraph Core["core/"]
        AGENT["Creature"]
        CTRL["Controller"]
        EXEC["Executor"]
        CH["Channel"]
        COMPACT["Compact"]
    end

    subgraph Mods["modules/ 7 protocols"]
        PLUG["Plugin"]
        TOOL["Tool"]
        SUB["SubAgent"]
        UCMD["UserCommand"]
    end

    subgraph Terr["terrarium/"]
        TR["Runtime"]
        OBS["Observer"]
        GROUP["Tool Groups"]
    end

    CLI --> Studio
    WEB --> Studio
    Studio --> TR
    TR --> AGENT
    TR --> GROUP
    AGENT --> PLUG
    CTRL --> TOOL
    CTRL --> SUB

    style PLUG fill:#e8f5e9
    style Studio fill:#fff3e0

v1.4.0 introduced several important new modules:

| Module | Purpose | Description |
| --- | --- | --- |
| studio/ | Programmatic API layer | Catalog / Identity / Sessions / Persistence / Editors; wraps the Terrarium engine |
| builtin_skills/ | Built-in skill system | 10 subagent skills + 28 tool skills, Markdown-defined prompt engineering |
| parsing/ | Structured parsing | Events / Format / Patterns / State Machine |
| prompt/ | Prompt engineering | Aggregator / FrameworkHints / SkillLoader / Template / ToolContributions |
| serving/ | Serving infrastructure | process_metrics + web serving |
| sandbox/ | Sandboxed execution | Config / Parse / Presets / Profile / Violations |
| user_command/ | User command protocol | The 7th module protocol (Slash Commands) |

2.2 Composition Algebra

The framework provides 4 composition primitives for assembling Creatures into complex workflows:

graph LR
    A["Sequential"] --> B["Parallel"]
    B --> C["Fan-out"]
    C --> D["Repeat"]

| Operator | Semantics | Example |
| --- | --- | --- |
| >> | Sequential execution | researcher >> writer |
| & | Parallel execution | analyst_a & analyst_b |
| \| | Fan-out to multiple targets | dispatcher \| [a, b, c] |
| * | Repeat N times | reviewer * 3 |
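
As a rough illustration of how such an algebra can be built, the sketch below overloads Python's four operators on a small wrapper class. Step and its run attribute are hypothetical names, not KohakuTerrarium's API, and the exact semantics (for example, that repeat feeds each output back in as the next input) are assumptions made for this example.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

Handler = Callable[[Any], Awaitable[Any]]


class Step:
    """Hypothetical stand-in for a Creature: wraps one async callable."""

    def __init__(self, run: Handler) -> None:
        self.run = run

    def __rshift__(self, other: "Step") -> "Step":
        # a >> b: feed a's output into b.
        async def run(value: Any) -> Any:
            return await other.run(await self.run(value))
        return Step(run)

    def __and__(self, other: "Step") -> "Step":
        # a & b: run both on the same input concurrently, collect both outputs.
        async def run(value: Any) -> List[Any]:
            return list(await asyncio.gather(self.run(value), other.run(value)))
        return Step(run)

    def __or__(self, targets: List["Step"]) -> "Step":
        # a | [b, c]: send a's output to every target concurrently.
        async def run(value: Any) -> List[Any]:
            out = await self.run(value)
            return list(await asyncio.gather(*(t.run(out) for t in targets)))
        return Step(run)

    def __mul__(self, times: int) -> "Step":
        # a * n: apply a to its own output n times (assumed semantics).
        async def run(value: Any) -> Any:
            for _ in range(times):
                value = await self.run(value)
            return value
        return Step(run)


async def _upper(text: str) -> str:
    return text.upper()


async def _exclaim(text: str) -> str:
    return text + "!"


upper, exclaim = Step(_upper), Step(_exclaim)

sequential = asyncio.run((upper >> exclaim).run("hi"))        # "HI!"
parallel = asyncio.run((upper & exclaim).run("hi"))           # ["HI", "hi!"]
fan_out = asyncio.run((upper | [exclaim, exclaim]).run("hi"))  # ["HI!", "HI!"]
repeated = asyncio.run((exclaim * 3).run("hi"))               # "hi!!!"
```

One practical consequence of reusing Python operators is that Python's own precedence applies: * binds tighter than >>, which binds tighter than &, which binds tighter than |, so mixed expressions may need explicit parentheses.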

2.3 Parallel Execution Model

A three-layer async parallel mechanism:

  • & operator: asyncio.gather(*branches) — runs all branches in parallel, latency determined by the slowest branch
  • SubAgentManager: spawn() creates asyncio.Task, wait_all() waits for all to complete (Promise.all semantics)
  • Terrarium Runtime: Each Creature gets its own asyncio.Task, output wiring uses fire-and-forget dispatch

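Execution model: a single-threaded asyncio event loop (cooperative concurrency), with no OS threads or multiprocessing needed. Because LLM calls are I/O-bound, they overlap while each one waits on the network, so concurrent calls make progress at the same time.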
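
The spawn/wait_all pattern can be sketched with the standard library alone. MiniSubAgentManager and fake_llm_call below are hypothetical stand-ins written for this article; only the method names spawn() and wait_all() come from the description above, and the real SubAgentManager may differ in signature and behavior.

```python
import asyncio
from typing import Any, Coroutine, List


class MiniSubAgentManager:
    """Hypothetical manager: spawn() starts a task, wait_all() awaits them all."""

    def __init__(self) -> None:
        self._tasks: List[asyncio.Task] = []

    def spawn(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.append(task)
        return task

    async def wait_all(self) -> List[Any]:
        # Like Promise.all: returns every result in spawn order and
        # propagates the first exception if any task fails.
        results = await asyncio.gather(*self._tasks)
        self._tasks.clear()
        return list(results)


active = 0  # calls currently "in flight"
peak = 0    # highest number of overlapping calls observed


async def fake_llm_call(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound LLM request."""
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    active -= 1
    return f"{name} done"


async def main() -> List[str]:
    manager = MiniSubAgentManager()
    manager.spawn(fake_llm_call("a", 0.03))
    manager.spawn(fake_llm_call("b", 0.01))
    manager.spawn(fake_llm_call("c", 0.02))
    return await manager.wait_all()


results = asyncio.run(main())
```

All three calls are in flight at once on a single thread (peak reaches 3), and the total wait is governed by the slowest call rather than the sum of the delays.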

2.4 Plugin Hook Lifecycle

This is the primary integration point for the OTEL plugin. In v1.4.0, PluginHook expanded from 11 to 15 hooks:

sequenceDiagram
    participant T as Terrarium
    participant O as OTEL Plugin

    T->>O: on_load
    T->>O: on_agent_start
    T->>O: pre_llm_call
    T->>O: post_llm_call
    T->>O: pre_tool_dispatch
    T->>O: pre_tool_execute
    T->>O: post_tool_execute
    T->>O: pre_subagent_run
    T->>O: post_subagent_run
    T->>O: on_compact_start
    T->>O: on_compact_end
    T->>O: on_event
    T->>O: on_interrupt
    T->>O: on_task_promoted
    T->>O: on_agent_stop
    T->>O: on_unload

New hooks:

  • on_load / on_unload — Plugin lifecycle management
  • pre_tool_dispatch — Pre-dispatch interception for tools (useful for routing control)
  • on_compact_start / on_compact_end — Non-blocking context compression events (new Compact system in v1.4.0)

Existing reference implementations:

  • event_logger.py — JSONL structured logging, mounts 8 hooks, records token usage
  • cost_tracker.py — Token cost accumulation + budget threshold alerts
  • otel_metrics.py — Added in v1.2.0, OpenTelemetry Metrics + Tracing (see below)
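
To make the pre/post hook pairing concrete, here is a minimal sketch of a plugin that measures tool latency. It deliberately does not subclass the real PluginBase: the hook names are taken from the lifecycle above, but the arguments (call_id, tool_name) and the fact that the hooks are async are assumptions for illustration, so check the actual signatures in the framework before reusing any of this.

```python
import asyncio
import time
from typing import Dict, List, Tuple


class TimingPlugin:
    """Hypothetical plugin: pairs pre_/post_ hooks to measure durations."""

    def __init__(self) -> None:
        # call_id -> (tool_name, start time); call_id is an assumed argument.
        self._started: Dict[str, Tuple[str, float]] = {}
        self.durations: List[Tuple[str, float]] = []

    async def pre_tool_execute(self, call_id: str, tool_name: str) -> None:
        self._started[call_id] = (tool_name, time.perf_counter())

    async def post_tool_execute(self, call_id: str) -> None:
        # A post hook without a matching pre hook is ignored rather than
        # raised, so the observability plugin cannot break the agent loop.
        entry = self._started.pop(call_id, None)
        if entry is None:
            return
        tool_name, start = entry
        self.durations.append((tool_name, time.perf_counter() - start))


async def demo() -> TimingPlugin:
    plugin = TimingPlugin()
    await plugin.pre_tool_execute("call-1", "bash")
    await asyncio.sleep(0.01)  # stands in for the tool doing its work
    await plugin.post_tool_execute("call-1")
    await plugin.post_tool_execute("unknown-call")  # silently ignored
    return plugin


plugin = asyncio.run(demo())
```

The same pairing idea extends to pre_llm_call/post_llm_call and pre_subagent_run/post_subagent_run; in a tracing plugin the stored start time would be replaced by an open span.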

3. kt-biome Plugin Pack Structure (v1.2.0)

kt-biome is the framework's official extension pack and the target repository for the OTEL plugin (confirmed by maintainers on Issue #45).

graph TB
    Y["kohaku.yaml"] --> C["creatures x10"]
    Y --> T["terrariums x5"]
    Y --> P["plugins x12"]
    Y --> TL["tools x4"]
    Y --> IO["io x4"]
    Y --> TG["triggers x3"]
    Y --> S["skills x3"]

    C --> G["general"]
    G --> SWE["swe"]
    G --> RES["researcher"]
    G --> BG["bounded_general"]

    style P fill:#e8f5e9

v1.2.0 updates:

| Category | Change | New Items |
| --- | --- | --- |
| Creatures | 9 → 10 | bounded_general (max_iterations: 50) |
| Plugins | 11 → 12 | otel_metrics, multimodal_guard, seamless_memory, termination_goal, pev_verifier, context_files, family_guidance |
| Tools | Unchanged | bash_docker (container execution), bash_ssh (remote execution) |
| I/O | 5 → 4 | telegram (input + output) |
| Triggers | Unchanged | cron (full cron expression support) |
| Skills | Updated | git-commit-flow, pdf-merge, todo-file |

Creature inheritance: general is the base (24 tools / 6 sub-agents); swe, researcher, etc. inherit and extend it. bounded_general adds an iteration cap.

Terrarium topology example (swe_team):

graph LR
    ROOT["root"] -->|dispatch| TASKS["tasks"]
    TASKS -->|assign| SWE["swe"]
    SWE -->|output_wiring| REV["reviewer"]
    REV -->|collect| RESULTS["results"]
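
The data flow in this topology can be approximated with asyncio queues. This is only a toy model written for this article: the queue-based channels, the None sentinel, and the swe/reviewer coroutines are assumptions, and the real terrarium runtime wires creatures through its own Channel and output_wiring mechanisms.

```python
import asyncio
from typing import List


async def swe(tasks: asyncio.Queue, to_reviewer: asyncio.Queue) -> None:
    # Consume assigned tasks until the None sentinel, then pass it on.
    while (task := await tasks.get()) is not None:
        await to_reviewer.put(f"patch for {task}")
    await to_reviewer.put(None)


async def reviewer(inbox: asyncio.Queue, results: asyncio.Queue) -> None:
    while (patch := await inbox.get()) is not None:
        await results.put(f"approved: {patch}")
    await results.put(None)


async def run_team(work: List[str]) -> List[str]:
    tasks: asyncio.Queue = asyncio.Queue()    # "tasks" channel
    wiring: asyncio.Queue = asyncio.Queue()   # swe -> reviewer output wiring
    results: asyncio.Queue = asyncio.Queue()  # "results" channel
    workers = [
        asyncio.create_task(swe(tasks, wiring)),
        asyncio.create_task(reviewer(wiring, results)),
    ]
    for item in work:  # the root dispatches work into the tasks channel
        await tasks.put(item)
    await tasks.put(None)
    await asyncio.gather(*workers)

    collected: List[str] = []
    while (item := results.get_nowait()) is not None:
        collected.append(item)
    return collected


team_output = asyncio.run(run_team(["fix bug", "add test"]))
```

For tracing, the interesting property is that each hop (dispatch, assign, output wiring, collect) is a point where trace context would need to be carried along with the message.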

3.1 Existing OTEL Plugin: otel_metrics.py

kt-biome v1.2.0 already includes an OpenTelemetry plugin implementation (kt_biome/plugins/otel_metrics.py). Key design decisions:

  • Metrics: 16 Counters + 7 Histograms, DELTA temporality
  • Tracing: Optional, exported via OTLP HTTP
  • Lazy loading: try: from opentelemetry... except ImportError — no hard dependency
  • Configuration: kohaku.yaml options (service_name, endpoint, export_interval)

| Metric Type | Example Names | Purpose |
| --- | --- | --- |
| Counter | kt.llm.calls / kt.llm.tokens / kt.tool / kt.subagent / kt.compact | Call counts |
| Histogram | kt.llm.duration / kt.tool.duration / kt.agent.session.duration | Latency distribution |

This means the basic OTEL support from Issue #45 has already been implemented. Future contribution directions include: adopting OTel GenAI gen_ai.* semantic conventions, enhancing tracing, and integrating with observability platforms.

4. Agent Observability: Industry Consensus

The OTEL plugin design does not have to start from scratch: the industry has already converged on a clear set of practices. This section compares observability practices across major frameworks.

4.1 Framework Architecture Comparison

| Framework | Agent Loop Model | Sub-agents | Observability Approach | Language |
| --- | --- | --- | --- | --- |
| Claude Code | Async Generator state machine | Recursive query(), depth ≤ 1 | Native OTel (Traces + Metrics + Logs) | TypeScript |
| OpenAI Agents SDK | ReAct Loop | Handoff mechanism | Built-in TraceProvider, auto-collection | Python |
| AutoGen (Microsoft) | Message-passing loop | Multi-agent conversation | Native OTel, uses gen_ai.* directly | Python |
| CrewAI | Sequential/Hierarchical | Task delegation | EventBus Listener + OTel Exporter | Python |
| OpenCode (sst) | ReAct for loop | Tool-based, synchronous blocking | PubSub Broker + JSON debug logs | Go |
| KohakuTerrarium | Creature Controller Loop | Vertical + horizontal SubAgent | Plugin Hooks + Observer + otel_metrics | Python |

4.2 Representative Architecture Deep Dives

Claude Code: State Machine + Native OTel

Claude Code's core is a roughly 1,700-line async generator function in query.ts. Every interaction goes through the same code path, and each loop iteration reconstructs an immutable state.

graph TB
    INPUT["User Input"] --> LOOP["Agent Loop"]
    LOOP --> LLM["LLM Call"]
    LLM --> PARSE{"Response?"}
    PARSE -->|tool_use| TOOL["Tool Execute"]
    PARSE -->|end_turn| OUTPUT["Output"]
    TOOL --> LOOP

    subgraph otel["OTel Spans"]
        SP1["interaction"] --> SP2["llm_request"]
        SP1 --> SP3["tool"]
    end

Three aspects of its observability design are particularly noteworthy:

  • Three-signal approach: Metrics (token/cost), Log Events (structured logs per prompt/tool/error), Traces (Beta, span hierarchy: interaction → llm_request → tool → hook)
  • W3C Trace Context propagation: Child processes inherit trace context, enabling cross-process distributed tracing
  • Hook system: 9 event types (PreToolUse / PostToolUse / SubagentStop, etc.), where hooks themselves are external processes (shell/HTTP/LLM)
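
For readers unfamiliar with W3C Trace Context, the sketch below builds and parses a traceparent value (version 00: a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and 2 hex digits of flags). Passing it to a child process through a TRACEPARENT environment variable is a common convention and is used here as an assumption; this article does not confirm exactly how Claude Code hands the context to its child processes. The helper names are hypothetical.

```python
import os
import re
import secrets
from typing import Dict, Optional, Tuple

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Create a version-00 traceparent with random trace and span IDs."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex digits
    span_id = secrets.token_hex(8)    # 8 bytes -> 16 hex digits
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str) -> Optional[Tuple[str, str, bool]]:
    """Return (trace_id, parent_id, sampled), or None if the value is invalid."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid per the specification
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


def child_env(traceparent: str) -> Dict[str, str]:
    """Copy of the current environment with TRACEPARENT set for a subprocess."""
    env = dict(os.environ)
    env["TRACEPARENT"] = traceparent
    return env


header = new_traceparent()
parsed = parse_traceparent(header)
env_for_child = child_env(header)  # e.g. subprocess.run(cmd, env=env_for_child)
```

A child process that reads this value and starts its spans under the same trace ID appears in the same trace as its parent, which is what makes cross-process tracing work.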

Known limitation: Hook execution itself is not covered by OTel tracing (Issues #44983, #50287).

OpenCode: ReAct + PubSub

OpenCode (archived, migrated to charmbracelet/crush) uses a classic ReAct loop. Sub-agents are exposed as tools: each call blocks synchronously and has read-only access.

graph TB
    INPUT["User Input"] --> FOR["Agent Loop"]
    FOR --> STREAM["Stream LLM"]
    STREAM --> CHECK{"tool_use?"}
    CHECK -->|Yes| EXEC["Execute Tool"]
    CHECK -->|No| EXIT["Return"]
    EXEC --> FOR

    subgraph obs["Observability"]
        BROKER["Broker PubSub"]
        SLOG["Structured Log"]
        JSON["JSON Traces"]
    end

The four agent types (coder / summarizer / task / title) are flat string constants, not a hierarchical structure. Observability relies on a Go generics-based Broker[T] event bus + structured logging, with no OTel integration.

4.3 OpenTelemetry GenAI Semantic Conventions

OTel has released experimental GenAI semantic conventions (the gen_ai.* namespace), which are becoming the de facto standard:

| Span Type | Name | Purpose |
| --- | --- | --- |
| Agent | invoke_agent | One span per agent invocation |
| LLM | chat | One span per model call |
| Tool | execute_tool | One span per tool execution |

Core attributes:

  • gen_ai.operation.name: Operation type
  • gen_ai.request.model: Model identifier
  • gen_ai.agent.name / gen_ai.agent.id: Agent identifier
  • gen_ai.usage.input_tokens / gen_ai.usage.output_tokens: Token usage

Core metrics:

  • gen_ai.client.token.usage (Histogram): Token consumption
  • gen_ai.client.operation.duration (Histogram): Operation duration
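
As a small worked example of these attributes, the function below turns a hook payload into gen_ai.* span attributes for one model call. The payload keys (model, prompt_tokens, completion_tokens, agent) are hypothetical and stand in for whatever the post_llm_call hook actually provides; only the gen_ai.* attribute names come from the conventions listed above.

```python
from typing import Any, Dict, Union

AttrValue = Union[str, int]


def genai_chat_attributes(payload: Dict[str, Any]) -> Dict[str, AttrValue]:
    """Map a hypothetical post_llm_call payload to gen_ai.* span attributes."""
    attrs: Dict[str, AttrValue] = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": str(payload["model"]),
        "gen_ai.usage.input_tokens": int(payload.get("prompt_tokens", 0)),
        "gen_ai.usage.output_tokens": int(payload.get("completion_tokens", 0)),
    }
    # Agent identity is optional; only set it when the payload carries one.
    if payload.get("agent"):
        attrs["gen_ai.agent.name"] = str(payload["agent"])
    return attrs


example = genai_chat_attributes(
    {"model": "example-model", "prompt_tokens": 120, "completion_tokens": 30, "agent": "swe"}
)
```

Keeping this mapping in one function makes it straightforward to emit both the existing kt.* names and the gen_ai.* names side by side during a migration.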

4.4 Agent-as-Trace Pattern (Emerging Consensus)

Frameworks are converging on the same pattern:

sequenceDiagram
    participant User
    participant Agent
    participant LLM
    participant Tool

    User->>Agent: invoke
    Note over Agent: span: invoke_agent

    Agent->>LLM: chat
    Note over LLM: span: chat
    LLM-->>Agent: response + tool_use

    Agent->>Tool: execute
    Note over Tool: span: execute_tool
    Tool-->>Agent: result

    Agent->>LLM: chat with result
    Note over LLM: span: chat
    LLM-->>Agent: final answer

    Agent-->>User: output

A single agent invocation corresponds to one distributed trace containing nested spans: an agent span that encloses the LLM and tool spans produced along the way (LLM → Tool → LLM → ...). Token usage is recorded on each LLM span and aggregated at the trace level.

4.5 Observability Platform Comparison

| Platform | Protocol | Data Model | OTel Relationship |
| --- | --- | --- | --- |
| Langfuse | Native OTLP | Trace → Observation (7 types) | Direct integration |
| Phoenix (Arize) | OTLP + OpenInference | Span-based | Translation layer: GenAI → OpenInference |
| LangSmith | Custom + OTel export | @traceable decorator | One-way export, has context propagation gaps |
| W&B Weave | Custom + OTel export | @weave.op, auto-patches 30+ providers | Export-compatible |

The two competing semantic convention sets (OTel gen_ai.* vs OpenInference openinference.*) are converging — Phoenix has built a translation layer, and Langfuse accepts OTLP directly.
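
To give a feel for what a translation layer between the two conventions involves, here is an illustrative attribute mapper. It is not Phoenix's actual implementation: the function name and the choice of which attributes to map are assumptions, and it covers only a small subset (span kind, model name, and token counts) of the OpenInference conventions.

```python
from typing import Any, Dict

# gen_ai.operation.name -> OpenInference span kind (illustrative subset).
_SPAN_KIND = {"chat": "LLM", "execute_tool": "TOOL", "invoke_agent": "AGENT"}

# gen_ai.* attribute -> OpenInference attribute (illustrative subset).
_RENAMES = {
    "gen_ai.request.model": "llm.model_name",
    "gen_ai.usage.input_tokens": "llm.token_count.prompt",
    "gen_ai.usage.output_tokens": "llm.token_count.completion",
}


def to_openinference(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a few gen_ai.* attributes; pass everything else through."""
    out: Dict[str, Any] = {}
    operation = attrs.get("gen_ai.operation.name")
    if operation in _SPAN_KIND:
        out["openinference.span.kind"] = _SPAN_KIND[operation]
    for key, value in attrs.items():
        if key in _RENAMES:
            out[_RENAMES[key]] = value
        elif key != "gen_ai.operation.name":
            out[key] = value  # unknown attributes are kept unchanged
    return out


translated = to_openinference(
    {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": "example-model",
        "gen_ai.usage.input_tokens": 120,
        "gen_ai.usage.output_tokens": 30,
        "kt.creature": "swe",
    }
)
```

Because the mapping is mostly mechanical renaming, a plugin that emits gen_ai.* attributes stays usable on OpenInference-based platforms without having to target them directly.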

5. Key Takeaways

| Dimension | Conclusion |
| --- | --- |
| Basic OTEL support | Already implemented (otel_metrics.py): Metrics + optional Tracing |
| Integration approach | Implement PluginBase + register 15 PluginHook callbacks |
| Reference implementations | otel_metrics.py (OTEL) and event_logger.py (JSONL logging) |
| Dependency constraint | Lazy-loading pattern: the OTEL SDK is an optional dependency |
| Data sources | 15 PluginHooks, sufficient for building complete Traces + Metrics |
| Semantic standards | Should adopt OTel GenAI gen_ai.* semantic conventions (current implementation uses kt.* prefix) |
| Architecture pattern | Agent-as-Trace (industry consensus) |

Future contribution directions:

  1. Evaluate gaps between the existing otel_metrics.py implementation and OTel GenAI semantic conventions
  2. Consider migrating from kt.* prefix to the standard gen_ai.* prefix
  3. Enhance Tracing capabilities: complete Span chains (Agent → LLM → Tool → SubAgent)
  4. Integrate with observability platforms (Langfuse / Phoenix / Jaeger, etc.)
  5. Validate end-to-end trace chain integrity on the swe_team terrarium