Kohaku-Lab Ecosystem Architecture Analysis and Agent Observability Research

Background

Kohaku-Lab is a virtual open-source AI research lab led by KohakuBlueleaf (Yeh Shih-Ying, Taiwan). Its core project, KohakuTerrarium, is a multi-agent orchestration framework positioned similarly to LangGraph/CrewAI, but with a stronger emphasis on pluginization and runtime composability.

This article is preliminary research for contributing an OpenTelemetry observability plugin. It maps out the ecosystem architecture and plugin mechanisms, and compares observability practices across major agent frameworks.

Last updated: May 2026 (based on KohakuTerrarium v1.4.0, kt-biome v1.2.0)

1. Ecosystem Overview

The Kohaku-Lab organization has 6 repositories, forming a dependency chain centered on KohakuTerrarium:

graph TB
    KT["KohakuTerrarium v1.4.0"]
    KR["KohakuRAG"]
    KB["kt-biome v1.2.0"]
    KTe["kt-template"]
    KV["KohakuVault"]

    KT -->|depends on| KV
    KB -->|kt install| KT
    KR -->|depends on| KT

    style KT fill:#e1f5fe
    style KB fill:#e8f5e9

Project	Purpose	Language	Relationship to Core
KohakuTerrarium	Multi-agent orchestration framework	Python 3.10+	The core itself
KohakuRAG	Layered RAG engine	Python	Depends on core framework
kt-biome	Official plugin pack	Python 3.12+	Zero dependencies, runtime-mounted
KohakuVault	SQLite KV store	Python	Pip dependency of the core

Key constraint: kt-biome declares zero runtime dependencies (no pip dependencies in pyproject.toml). It installs into the framework via kt install and directly inherits the host framework's capabilities at runtime.

2. KohakuTerrarium Core Architecture (v1.4.0)

2.1 Overall Layering

graph TB
    subgraph Surfaces["Runtime Surfaces"]
        CLI["CLI"]
        TUI["TUI"]
        WEB["Web Dashboard"]
        DESK["Desktop"]
    end

    subgraph Studio["studio/"]
        CAT["Catalog"]
        SESS["Sessions"]
        PERS["Persistence"]
    end

    subgraph Core["core/"]
        AGENT["Creature"]
        CTRL["Controller"]
        EXEC["Executor"]
        CH["Channel"]
        COMPACT["Compact"]
    end

    subgraph Mods["modules/ 7 protocols"]
        PLUG["Plugin"]
        TOOL["Tool"]
        SUB["SubAgent"]
        UCMD["UserCommand"]
    end

    subgraph Terr["terrarium/"]
        TR["Runtime"]
        OBS["Observer"]
        GROUP["Tool Groups"]
    end

    CLI --> Studio
    WEB --> Studio
    Studio --> TR
    TR --> AGENT
    TR --> GROUP
    AGENT --> PLUG
    CTRL --> TOOL
    CTRL --> SUB

    style PLUG fill:#e8f5e9
    style Studio fill:#fff3e0

v1.4.0 introduced several important new modules:

Module	Purpose	Description
studio/	Programmatic API layer	Catalog / Identity / Sessions / Persistence / Editors — wraps the Terrarium engine
builtin_skills/	Built-in skill system	10 subagent skills + 28 tool skills, Markdown-defined prompt engineering
parsing/	Structured parsing	Events / Format / Patterns / State Machine
prompt/	Prompt engineering	Aggregator / FrameworkHints / SkillLoader / Template / ToolContributions
serving/	Serving infrastructure	process_metrics + web serving
sandbox/	Sandboxed execution	Config / Parse / Presets / Profile / Violations
user_command/	User command protocol	The 7th module protocol (Slash Commands)

2.2 Composition Algebra

The framework provides 4 composition primitives for assembling Creatures into complex workflows:

graph LR
    A["Sequential"] --> B["Parallel"]
    B --> C["Fan-out"]
    C --> D["Repeat"]

Operator	Semantics	Example
`>>`	Sequential execution	researcher >> writer
`&`	Parallel execution	analyst_a & analyst_b
`|`	Fan-out to multiple targets	dispatcher | [a, b, c]
`*`	Repeat N times	reviewer * 3
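
To make the operator semantics concrete, here is a minimal sketch of how such an algebra can be built with Python operator overloading. The Node class and its string-building behaviour are hypothetical stand-ins, not KohakuTerrarium's actual implementation; only the four operators themselves come from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Creature; it only records how it was composed."""

    expr: str

    def __rshift__(self, other: Node) -> Node:  # a >> b : sequential
        return Node(f"seq({self.expr}, {other.expr})")

    def __and__(self, other: Node) -> Node:  # a & b : parallel
        return Node(f"par({self.expr}, {other.expr})")

    def __or__(self, targets: list[Node]) -> Node:  # a | [b, c] : fan-out
        inner = ", ".join(t.expr for t in targets)
        return Node(f"fanout({self.expr} -> [{inner}])")

    def __mul__(self, times: int) -> Node:  # a * 3 : repeat
        return Node(f"repeat({self.expr}, {times})")


researcher, writer, reviewer = Node("researcher"), Node("writer"), Node("reviewer")
pipeline = (researcher >> writer) >> reviewer * 2
print(pipeline.expr)  # seq(seq(researcher, writer), repeat(reviewer, 2))
```

One design consequence of reusing Python operators is that Python's own precedence applies: `*` binds tighter than `>>`, which binds tighter than `&`, which binds tighter than `|`. An expression such as `a >> b & c` therefore groups as `(a >> b) & c`, so explicit parentheses are the safer choice in non-trivial pipelines.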

2.3 Parallel Execution Model

A three-layer async parallel mechanism:

& operator: asyncio.gather(*branches) — runs all branches in parallel, latency determined by the slowest branch
SubAgentManager: spawn() creates asyncio.Task, wait_all() waits for all to complete (Promise.all semantics)
Terrarium Runtime: Each Creature gets its own asyncio.Task, output wiring uses fire-and-forget dispatch

Execution model: a single-threaded asyncio event loop (cooperative concurrency); no OS threads or multiprocessing are needed. Because LLM calls are I/O-bound, they overlap while awaiting network responses, so parallel branches run concurrently in practice.
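
The "latency determined by the slowest branch" property can be seen in a small standalone sketch. It uses only the standard library; `branch` and the branch names are illustrative, and `asyncio.sleep` stands in for the network wait of an LLM call.

```python
import asyncio
import time


async def branch(name: str, delay: float) -> str:
    """Simulates an I/O-bound LLM call; the sleep stands in for the network wait."""
    await asyncio.sleep(delay)
    return name


async def run_parallel() -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather() returns results in argument order, not completion order.
    results = await asyncio.gather(
        branch("analyst_a", 0.02),
        branch("analyst_b", 0.20),
        branch("analyst_c", 0.10),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_parallel())
print(results, f"{elapsed:.2f}s")
```

The total elapsed time tracks the slowest branch (about 0.2 s here) rather than the 0.32 s sum of all three, because the event loop switches to another branch whenever one is waiting.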

2.4 Plugin Hook Lifecycle

This is the primary integration point for the OTEL plugin. In v1.4.0, PluginHook expanded from 11 to 15 hooks:

sequenceDiagram
    participant T as Terrarium
    participant O as OTEL Plugin

    T->>O: on_load
    T->>O: on_agent_start
    T->>O: pre_llm_call
    T->>O: post_llm_call
    T->>O: pre_tool_dispatch
    T->>O: pre_tool_execute
    T->>O: post_tool_execute
    T->>O: pre_subagent_run
    T->>O: post_subagent_run
    T->>O: on_compact_start
    T->>O: on_compact_end
    T->>O: on_event
    T->>O: on_interrupt
    T->>O: on_task_promoted
    T->>O: on_agent_stop
    T->>O: on_unload

New hooks:

on_load / on_unload — Plugin lifecycle management
pre_tool_dispatch — Pre-dispatch interception for tools (useful for routing control)
on_compact_start / on_compact_end — Non-blocking context compression events (new Compact system in v1.4.0)
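
The pre/post hook pairs are what make duration measurement possible. The following is a rough sketch of that pairing logic, not the framework's API: the hook names follow the lifecycle above, but the class, the `call_id` parameter, and the method signatures are assumptions made for illustration.

```python
import time
from collections import defaultdict


class TimingPlugin:
    """Hypothetical plugin: hook names follow the article, signatures are assumed."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._open: dict[tuple[str, str], float] = {}  # (kind, call_id) -> start time
        self.durations: dict[str, list[float]] = defaultdict(list)

    def _begin(self, kind: str, call_id: str) -> None:
        self._open[(kind, call_id)] = self._clock()

    def _end(self, kind: str, call_id: str) -> None:
        start = self._open.pop((kind, call_id), None)
        if start is not None:  # ignore an end event without a matching start
            self.durations[kind].append(self._clock() - start)

    def pre_llm_call(self, call_id: str) -> None:
        self._begin("llm", call_id)

    def post_llm_call(self, call_id: str) -> None:
        self._end("llm", call_id)

    def pre_tool_execute(self, call_id: str) -> None:
        self._begin("tool", call_id)

    def post_tool_execute(self, call_id: str) -> None:
        self._end("tool", call_id)


# Drive the hooks with a fake clock so the result is deterministic.
ticks = iter([0.0, 1.5, 2.0, 2.25])
plugin = TimingPlugin(clock=lambda: next(ticks))
plugin.pre_llm_call("c1")
plugin.post_llm_call("c1")
plugin.pre_tool_execute("t1")
plugin.post_tool_execute("t1")
print(dict(plugin.durations))  # {'llm': [1.5], 'tool': [0.25]}
```

Keying open measurements by an identifier rather than a single "current start" field matters under the parallel execution model, where several LLM or tool calls can be in flight at once.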

Existing reference implementations:

event_logger.py — JSONL structured logging, mounts 8 hooks, records token usage
cost_tracker.py — Token cost accumulation + budget threshold alerts
otel_metrics.py — Added in v1.2.0, OpenTelemetry Metrics + Tracing (see below)

3. kt-biome Plugin Pack Structure (v1.2.0)

kt-biome is the framework's official extension pack and the target repository for the OTEL plugin (confirmed by maintainers on Issue #45).

graph TB
    Y["kohaku.yaml"] --> C["creatures x10"]
    Y --> T["terrariums x5"]
    Y --> P["plugins x12"]
    Y --> TL["tools x4"]
    Y --> IO["io x4"]
    Y --> TG["triggers x3"]
    Y --> S["skills x3"]

    C --> G["general"]
    G --> SWE["swe"]
    G --> RES["researcher"]
    G --> BG["bounded_general"]

    style P fill:#e8f5e9

v1.2.0 updates:

Category	Change	New / Notable Items
Creatures	9 → 10	bounded_general (max_iterations: 50)
Plugins	11 → 12	otel_metrics, multimodal_guard, seamless_memory, termination_goal, pev_verifier, context_files, family_guidance
Tools	Unchanged	bash_docker (container execution), bash_ssh (remote execution)
I/O	5 → 4	telegram (input + output)
Triggers	Unchanged	cron (full cron expression support)
Skills	Updated	git-commit-flow, pdf-merge, todo-file

Creature inheritance: general is the base (24 tools / 6 sub-agents); swe, researcher, etc. inherit and extend it. bounded_general adds an iteration cap.

Terrarium topology example (swe_team):

graph LR
    ROOT["root"] -->|dispatch| TASKS["tasks"]
    TASKS -->|assign| SWE["swe"]
    SWE -->|output_wiring| REV["reviewer"]
    REV -->|collect| RESULTS["results"]

3.1 Existing OTEL Plugin: otel_metrics.py

kt-biome v1.2.0 already includes an OpenTelemetry plugin implementation (kt_biome/plugins/otel_metrics.py). Key design decisions:

Metrics: 16 Counters + 7 Histograms, DELTA temporality
Tracing: Optional, exported via OTLP HTTP
Lazy loading: try: from opentelemetry... except ImportError — no hard dependency
Configuration: kohaku.yaml options (service_name, endpoint, export_interval)

Metric Type	Example Names	Purpose
Counter	kt.llm.calls / kt.llm.tokens / kt.tool / kt.subagent / kt.compact	Call counts
Histogram	kt.llm.duration / kt.tool.duration / kt.agent.session.duration	Latency distribution
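
The lazy-loading decision is what keeps kt-biome free of hard dependencies. Below is a minimal sketch of that pattern, assuming the public OpenTelemetry metrics API (`opentelemetry.metrics`); the `LazyMetrics` class, its `local_calls` tally, the `"model"` attribute key, and the `"s"` unit are illustrative assumptions and are not taken from otel_metrics.py.

```python
try:
    from opentelemetry import metrics  # optional dependency
    _HAS_OTEL = True
except ImportError:
    metrics = None
    _HAS_OTEL = False


class LazyMetrics:
    """Hypothetical wrapper: records to OTel if it is installed, else does nothing."""

    def __init__(self, service_name: str = "kt-demo"):
        self.enabled = _HAS_OTEL
        self.local_calls = 0  # plain tally so the sketch is observable without OTel
        if self.enabled:
            meter = metrics.get_meter(service_name)
            # Instrument names follow the kt.* examples in the table above.
            self._calls = meter.create_counter("kt.llm.calls")
            self._duration = meter.create_histogram("kt.llm.duration", unit="s")

    def record_llm_call(self, duration_s: float, model: str) -> None:
        self.local_calls += 1
        if not self.enabled:
            return  # degrade silently when the SDK is absent
        attrs = {"model": model}  # illustrative attribute key
        self._calls.add(1, attrs)
        self._duration.record(duration_s, attrs)


m = LazyMetrics()
m.record_llm_call(0.42, "example-model")
print(m.enabled, m.local_calls)
```

The same code path runs whether or not the OpenTelemetry packages are installed, which is the property a zero-dependency plugin pack needs.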

This means the basic OTEL support from Issue #45 has already been implemented. Future contribution directions include: adopting OTel GenAI gen_ai.* semantic conventions, enhancing tracing, and integrating with observability platforms.

4. Agent Observability: Industry Consensus

The OTEL plugin design does not start from scratch: the industry has already converged on a clear set of practices. This section compares observability practices across major frameworks.

4.1 Framework Architecture Comparison

Framework	Agent Loop Model	Sub-agents	Observability Approach	Language
Claude Code	Async Generator state machine	Recursive query(), depth ≤ 1	Native OTel (Traces + Metrics + Logs)	TypeScript
OpenAI Agents SDK	ReAct Loop	Handoff mechanism	Built-in TraceProvider, auto-collection	Python
AutoGen (Microsoft)	Message-passing loop	Multi-agent conversation	Native OTel, uses gen_ai.* directly	Python
CrewAI	Sequential/Hierarchical	Task delegation	EventBus Listener + OTel Exporter	Python
OpenCode (sst)	ReAct for loop	Tool-based, synchronous blocking	PubSub Broker + JSON debug logs	Go
KohakuTerrarium	Creature Controller Loop	Vertical + horizontal SubAgent	Plugin Hooks + Observer + otel_metrics	Python

4.2 Representative Architecture Deep Dives

Claude Code: State Machine + Native OTel

Claude Code's core is an async generator function of roughly 1,700 lines in query.ts. Every interaction goes through the same code path, and each loop iteration reconstructs an immutable state.

graph TB
    INPUT["User Input"] --> LOOP["Agent Loop"]
    LOOP --> LLM["LLM Call"]
    LLM --> PARSE{"Response?"}
    PARSE -->|tool_use| TOOL["Tool Execute"]
    PARSE -->|end_turn| OUTPUT["Output"]
    TOOL --> LOOP

    subgraph otel["OTel Spans"]
        SP1["interaction"] --> SP2["llm_request"]
        SP1 --> SP3["tool"]
    end

Three aspects of its observability design are particularly noteworthy:

Three-signal approach: Metrics (token/cost), Log Events (structured logs per prompt/tool/error), Traces (Beta, span hierarchy: interaction → llm_request → tool → hook)
W3C Trace Context propagation: Child processes inherit trace context, enabling cross-process distributed tracing
Hook system: 9 event types (PreToolUse / PostToolUse / SubagentStop, etc.), where hooks themselves are external processes (shell/HTTP/LLM)

Known limitation: Hook execution itself is not covered by OTel tracing (Issues #44983, #50287).

OpenCode: ReAct + PubSub

OpenCode (archived, migrated to charmbracelet/crush) uses a classic ReAct loop. Sub-agents are exposed as tools: they are invoked through synchronous, blocking calls and have read-only access.

graph TB
    INPUT["User Input"] --> FOR["Agent Loop"]
    FOR --> STREAM["Stream LLM"]
    STREAM --> CHECK{"tool_use?"}
    CHECK -->|Yes| EXEC["Execute Tool"]
    CHECK -->|No| EXIT["Return"]
    EXEC --> FOR

    subgraph obs["Observability"]
        BROKER["Broker PubSub"]
        SLOG["Structured Log"]
        JSON["JSON Traces"]
    end

The four agent types (coder / summarizer / task / title) are flat string constants, not a hierarchical structure. Observability relies on a Go generics-based Broker[T] event bus + structured logging, with no OTel integration.

4.3 OpenTelemetry GenAI Semantic Conventions

OTel has released experimental GenAI semantic conventions (the gen_ai.* namespace), which are becoming the de facto standard:

Span Type	Name	Purpose
Agent	`invoke_agent`	One span per agent invocation
LLM	`chat`	One span per model call
Tool	`execute_tool`	One span per tool execution

Core attributes:

gen_ai.operation.name: Operation type
gen_ai.request.model: Model identifier
gen_ai.agent.name / gen_ai.agent.id: Agent identifier
gen_ai.usage.input_tokens / gen_ai.usage.output_tokens: Token usage

Core metrics:

gen_ai.client.token.usage (Histogram): Token consumption
gen_ai.client.operation.duration (Histogram): Operation duration
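
As an illustration of how these attributes fit together, the sketch below assembles the name and attribute set for one model-call span. The helper function and the model name are hypothetical, and since the conventions are still experimental the attribute keys and the `{operation} {model}` naming pattern should be checked against the current specification before use.

```python
from typing import Optional


def chat_span_fields(
    model: str,
    input_tokens: int,
    output_tokens: int,
    agent_name: Optional[str] = None,
) -> tuple[str, dict[str, object]]:
    """Returns (span_name, attributes) for one LLM call, using gen_ai.* keys."""
    attributes: dict[str, object] = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    if agent_name is not None:
        attributes["gen_ai.agent.name"] = agent_name
    # Span name pattern: "<operation> <model>"
    return f"chat {model}", attributes


name, attrs = chat_span_fields("example-model", 120, 30, agent_name="researcher")
print(name)  # chat example-model
print(attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"])  # 150
```

Keeping the mapping in one pure function makes it straightforward to compare against the kt.* names an existing plugin emits.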

4.4 Agent-as-Trace Pattern (Emerging Consensus)

Frameworks are converging on the same pattern:

sequenceDiagram
    participant User
    participant Agent
    participant LLM
    participant Tool

    User->>Agent: invoke
    Note over Agent: span: invoke_agent

    Agent->>LLM: chat
    Note over LLM: span: chat
    LLM-->>Agent: response + tool_use

    Agent->>Tool: execute
    Note over Tool: span: execute_tool
    Tool-->>Agent: result

    Agent->>LLM: chat with result
    Note over LLM: span: chat
    LLM-->>Agent: final answer

    Agent-->>User: output

A single agent invocation = one distributed Trace, containing a root Agent span with LLM and Tool spans nested beneath it (Agent → LLM → Tool → LLM → ...). Token usage is recorded at the LLM Span level and aggregated at the Trace level.
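
The aggregation rule can be sketched with a small in-memory span tree. This is a toy model under stated assumptions: the `Span` dataclass is hypothetical and unrelated to the OpenTelemetry SDK's span type, and the token counts are made-up sample values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """Toy span: a name, optional token usage, and child spans."""

    name: str
    input_tokens: int = 0
    output_tokens: int = 0
    children: list[Span] = field(default_factory=list)

    def child(self, name: str, input_tokens: int = 0, output_tokens: int = 0) -> Span:
        span = Span(name, input_tokens, output_tokens)
        self.children.append(span)
        return span

    def total_tokens(self) -> int:
        """Sums token usage over this span and all of its descendants."""
        own = self.input_tokens + self.output_tokens
        return own + sum(c.total_tokens() for c in self.children)

    def render(self, depth: int = 0) -> list[str]:
        lines = ["  " * depth + self.name]
        for c in self.children:
            lines.extend(c.render(depth + 1))
        return lines


# One agent invocation: LLM call, tool call, second LLM call with the tool result.
root = Span("invoke_agent")
root.child("chat", input_tokens=120, output_tokens=30)
root.child("execute_tool")
root.child("chat", input_tokens=200, output_tokens=45)

print("\n".join(root.render()))
print(root.total_tokens())  # 395
```

Tool spans carry no token usage of their own, so the trace-level total is just the sum over the LLM spans.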

4.5 Observability Platform Comparison

Platform	Protocol	Data Model	OTel Relationship
Langfuse	Native OTLP	Trace → Observation (7 types)	Direct integration
Phoenix (Arize)	OTLP + OpenInference	Span-based	Translation layer: GenAI → OpenInference
LangSmith	Custom + OTel export	@traceable decorator	One-way export, has context propagation gaps
W&B Weave	Custom + OTel export	@weave.op, auto-patches 30+ providers	Export-compatible

The two competing semantic convention sets (OTel gen_ai.* vs OpenInference openinference.*) are converging — Phoenix has built a translation layer, and Langfuse accepts OTLP directly.

5. Key Takeaways

Dimension	Conclusion
Basic OTEL support	Already implemented (otel_metrics.py) — Metrics + optional Tracing
Integration approach	Implement `PluginBase` + register 15 `PluginHook` callbacks
Reference implementations	otel_metrics.py (OTEL) and event_logger.py (JSONL logging)
Dependency constraint	Lazy-loading pattern — OTEL SDK is an optional dependency
Data sources	15 PluginHooks — sufficient for building complete Traces + Metrics
Semantic standards	Should adopt OTel GenAI `gen_ai.*` semantic conventions (current implementation uses the kt.* prefix)
Architecture pattern	Agent-as-Trace (industry consensus)

Future contribution directions:

Evaluate gaps between the existing otel_metrics.py implementation and OTel GenAI semantic conventions
Consider migrating from kt.* prefix to the standard gen_ai.* prefix
Enhance Tracing capabilities: complete Span chains (Agent → LLM → Tool → SubAgent)
Integrate with observability platforms (Langfuse / Phoenix / Jaeger, etc.)
Validate end-to-end trace chain integrity on the swe_team terrarium