Shahzad Bhatti Welcome to my ramblings and rants!

June 17, 2026

Building a Self-Improving AI Agent with Durable Actors: MiniHermes

Filed under: Agentic AI — admin @ 8:25 pm

What Is Hermes Agent?

Hermes Agent from Nous Research is a very capable open agent that centers on three ideas that reinforce each other:

  • Structured system prompt with function-calling discipline. The system prompt teaches the model when to call a tool versus when to answer directly, how to format tool inputs as JSON, and how to interpret results and loop forward. The model learns that end_turn means the task is finished. This discipline makes Hermes far more reliable than agents running open-ended prompts.
  • Multi-step tool loop. After each LLM response, the agent checks: did the model request a tool? If yes, execute it, append the result, and call the LLM again up to a configured limit. This is what lets Hermes chain steps like “search –> read –> summarise” without the user driving each step by hand.
  • Self-critique and skill accumulation. After a complex task, Hermes reflects on the conversation and extracts a reusable skill, a named, structured description of the steps it took. The next time it encounters a similar request, it injects that skill into context and executes faster, without re-discovering the procedure from scratch.

These three properties make Hermes genuinely useful. But the reference implementation is a monolithic Python process. One crash loses every in-flight session. There is no distribution, no tenant isolation, no scheduled automation, and no provider failover. It is excellent research code and a fragile foundation for anything beyond a single-user demo.

MiniHermes keeps all three Hermes ideas and rebuilds the execution model on PlexSpaces, an actor-based distributed runtime. The result compiles to a single WASM binary, runs 12 actors under supervision, and adds durable state, fault isolation, distributed cron, context compression, and guardrails without changing how the core agent loop reasons.


The Problem: Stateless vs. Stateful Monolith

Most AI agents fall into one of two camps, and both have real problems.

  • Stateless agents are easy to deploy but forget everything between requests. You can’t reuse a procedure the agent learned last Tuesday. You can’t track that the user prefers metric units. Every conversation starts from zero. The workarounds, such as external caches and vector stores, turn the agent into infrastructure glue rather than an intelligent system.
  • Stateful monoliths like the Hermes reference implementation go the other direction: one process owns everything. That’s clean for development, but fragile under load. When the process crashes, every active session vanishes. A bug in skill extraction can corrupt the memory that session management depends on.

The actor model offers a third path. Decompose the system into many small actors, each owning exactly one responsibility, communicating only through messages. When one crashes, the supervisor restarts just that actor. The others keep running.


PlexSpaces Primitives: The Foundation

Before walking through the actors, it helps to understand the primitives every actor has access to inside the WASM sandbox. These are the only operations available: no filesystem, no global state, no raw sockets. This constraint is deliberate: it is part of what makes the system auditable and safe.

KV: Durable Point Lookup

# Persist and restore session history across restarts
host.kv_put(f"session_history:{session_id}", json.dumps(messages))
raw = host.kv_get(f"session_history:{session_id}")
messages = json.loads(raw) if raw else []

KV stores anything keyed by an exact string: session history, skill metadata, cron job state, provider configuration. The durability facet checkpoints it automatically, so a restarted actor picks up exactly where it left off.

TupleSpace: Pattern-Matched Coordination

TupleSpace is not KV. Rather than point lookups, it supports wildcard queries:

# Index a skill under multiple trigger keywords
host.ts.write(["skill_trigger", "csv",         "skill-001"])
host.ts.write(["skill_trigger", "spreadsheet", "skill-001"])
host.ts.write(["skill_trigger", "pivot",       "skill-001"])

# Find every skill that might match — None is a wildcard
all_triggers = host.ts.read_all(["skill_trigger", None, None])
# -> [["skill_trigger","csv","skill-001"], ["skill_trigger","spreadsheet","skill-001"], ...]

# Audit log: all events of a specific type
events = host.ts.read_all(["audit", "tool_executed", None, None])

# Health snapshots: last N polls
snapshots = host.ts.read_all(["health_snapshot", None, None])

TupleSpace powers skill indexes, memory tiers, audit logs, and health snapshots: anything where you scan across many entries rather than fetching one by ID.

Design tradeoff. TupleSpace pattern matching scales well for hundreds to thousands of entries but is not a replacement for a vector database or SQL at large scale. For this POC it removes an external dependency entirely; a production system with millions of skills would add an embedding-based index alongside it.
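
To make the wildcard semantics concrete, here is a minimal in-memory sketch of pattern matching over tuples. ToyTupleSpace and matches are hypothetical names for illustration, not the PlexSpaces implementation, and the sketch assumes a pattern only matches tuples of the same length. The linear scan in read_all is also why this approach suits thousands of entries rather than millions.

```python
from typing import List, Optional, Sequence

Pattern = Sequence[Optional[str]]


def matches(pattern: Pattern, tpl: Sequence[str]) -> bool:
    """True when tpl has the same arity as pattern and every non-None
    pattern field equals the tuple field at the same position."""
    if len(pattern) != len(tpl):
        return False
    return all(p is None or p == t for p, t in zip(pattern, tpl))


class ToyTupleSpace:
    """Hypothetical in-memory stand-in for host.ts (illustration only)."""

    def __init__(self) -> None:
        self._tuples: List[List[str]] = []

    def write(self, tpl: Sequence[str]) -> None:
        self._tuples.append(list(tpl))

    def read_all(self, pattern: Pattern) -> List[List[str]]:
        # Linear scan: cost grows with the number of stored tuples
        return [t for t in self._tuples if matches(pattern, t)]


ts = ToyTupleSpace()
ts.write(["skill_trigger", "csv", "skill-001"])
ts.write(["skill_trigger", "pivot", "skill-001"])
ts.write(["skill_trigger", "csv", "skill-002"])
ts.write(["audit", "tool_executed", "sess-001", "calculator"])

csv_skills = ts.read_all(["skill_trigger", "csv", None])      # fix one field
all_triggers = ts.read_all(["skill_trigger", None, None])     # wildcard two fields
```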

BlobStorage: Large, Opaque Content

# Skill procedures can be several paragraphs — too large for KV values
host.blob.upload(f"skill_procedure_{skill_id}", procedure_text.encode())
procedure = host.blob.download(f"skill_procedure_{skill_id}").decode()

BlobStorage handles the full procedure text that would be awkward as a KV value and wasteful to pass in message payloads.

Channel: At-Least-Once Delivery

# Cron scheduler enqueues a job
host.channel.send("", "cron:pending", "cron_job", job_payload)

# Agent receives, processes, then acks — message redelivered if agent crashes before ack
msg, ok, _ = host.channel.receive("", "cron:pending", timeout_ms=5000)
if ok:
    # ... process the job ...
    host.channel.ack("", "cron:pending", msg["msg_id"])
    # or: host.channel.nack("", "cron:pending", msg["msg_id"], True)  # requeue

Channel provides the durability that host.send() does not. If the consuming actor crashes between receive and ack, the message is redelivered on restart. This is what makes recurring tasks survive node failures without a separate message broker.
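
The redelivery behavior can be modeled in a few lines. The sketch below is a hypothetical in-memory queue (ToyChannel and its recover method are invented for illustration) that keeps a received message "in flight" until it is acked and puts it back on the queue when the consumer restarts.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class ToyChannel:
    """Hypothetical queue with ack/nack; models at-least-once delivery only."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, Any]] = deque()
        self._in_flight: Dict[int, Any] = {}
        self._next_id = 1

    def send(self, payload: Any) -> int:
        msg_id = self._next_id
        self._next_id += 1
        self._pending.append((msg_id, payload))
        return msg_id

    def receive(self) -> Optional[Dict[str, Any]]:
        if not self._pending:
            return None
        msg_id, payload = self._pending.popleft()
        self._in_flight[msg_id] = payload  # held until acked
        return {"msg_id": msg_id, "payload": payload}

    def ack(self, msg_id: int) -> None:
        self._in_flight.pop(msg_id, None)

    def nack(self, msg_id: int, requeue: bool = True) -> None:
        payload = self._in_flight.pop(msg_id, None)
        if requeue and payload is not None:
            self._pending.append((msg_id, payload))

    def recover(self) -> None:
        """Simulate a consumer restart: every unacked message is requeued."""
        for msg_id, payload in sorted(self._in_flight.items()):
            self._pending.append((msg_id, payload))
        self._in_flight.clear()


ch = ToyChannel()
ch.send({"job": "daily_summary"})
first = ch.receive()       # consumer takes the job ...
ch.recover()               # ... and crashes before ack
second = ch.receive()      # the same message is delivered again
ch.ack(second["msg_id"])
after_ack = ch.receive()   # queue is now empty
```

Because a message can be delivered more than once, the job handler should be idempotent: running the same cron job twice must not produce a duplicate side effect.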

DistributedLock: Cluster-Wide Leader Election

// Go — CronSchedulerActor.tick()
// TryAcquire returns false immediately if another node holds the lock
// TTL of 90s is longer than the 60s tick interval, preventing gaps
acquired, _ := host.Lock().TryAcquire("minihermes", "cron_leader", 90000)
if !acquired {
    return // another node is the leader this cycle
}
// Safe to fire jobs — only this node runs this block right now

Without DistributedLock, every node in a three-node cluster would fire every cron job simultaneously. The lock ensures exactly one leader schedules per tick.
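
The relationship between the 90s TTL and the 60s tick is easier to see in a small simulation. ToyLeaseLock below is a hypothetical single-process model, not the PlexSpaces lock; it assumes the current holder may renew its own lease, and it takes the clock as a parameter so the run is deterministic.

```python
from typing import Dict, List, Tuple


class ToyLeaseLock:
    """Hypothetical TTL lock; time is passed in explicitly (milliseconds)."""

    def __init__(self) -> None:
        # lock name -> (holder, expiry time in ms)
        self._leases: Dict[str, Tuple[str, int]] = {}

    def try_acquire(self, name: str, holder: str, ttl_ms: int, now_ms: int) -> bool:
        current = self._leases.get(name)
        if current is not None:
            owner, expires_at = current
            if now_ms < expires_at and owner != holder:
                return False  # another node holds a live lease
        # Free, expired, or a renewal by the current holder
        self._leases[name] = (holder, now_ms + ttl_ms)
        return True


lock = ToyLeaseLock()
TICK_MS, TTL_MS = 60_000, 90_000

leaders: List[Tuple[bool, bool]] = []
for tick in range(3):
    now = tick * TICK_MS
    # node-a ticks slightly before node-b on every cycle
    a = lock.try_acquire("cron_leader", "node-a", TTL_MS, now)
    b = lock.try_acquire("cron_leader", "node-b", TTL_MS, now + 10)
    leaders.append((a, b))

# node-a stops ticking; its lease from t=120s expires at t=210s
takeover_early = lock.try_acquire("cron_leader", "node-b", TTL_MS, 180_000)
takeover_late = lock.try_acquire("cron_leader", "node-b", TTL_MS, 240_000)
```

In this model node-a keeps leadership on every tick because its lease outlives the tick interval, and node-b takes over only once the lease has expired after node-a goes silent.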

SendAfter: Actor-Managed Timers

@init_handler
def on_init(self, config: dict) -> None:
    host.process_groups.join("svc:health_monitor")
    # Arm the first tick — no external cron daemon needed
    host.send_after(self.poll_interval_ms, "poll_tick", {"op": "poll_tick"})

@handler("poll_tick", "cast")
def poll_tick(self) -> None:
    # ... do poll work ...
    # Re-arm: each tick schedules the next
    host.send_after(self.poll_interval_ms, "poll_tick", {"op": "poll_tick"})

send_after replaces external schedulers for periodic work inside an actor. The actor manages its own timeline.

Ask vs. Send: Request-Reply vs. Fire-and-Forget

# host.ask() — blocks until a response arrives (or timeout)
llm_resp = host.ask(llm_id, "completion",
                    {"messages": messages, "tools": tools},
                    timeout_ms=30000)

# host.send() — returns immediately, caller never waits
host.send(audit_id, "log_event",
          {"event_type": "tool_executed", "detail": f"tool={name}"})

This distinction matters for latency. Audit events and async skill learning always use send(). The calling actor never waits for them. LLM completions and tool results use ask() because the outcome is needed before continuing.

IncrCounter: Lightweight Metrics

# Increment a named counter — visible to monitoring without any external metrics system
host.incr_counter("llm_completions_total", 1)
host.incr_counter("tool_executions_total", 1)
host.incr_counter(f"tool_{name}_total", 1)
host.incr_counter("skill_matches_total", len(matched_ids))

Every key operation in MiniHermes emits a counter. Aggregated across actors, these give a metrics dashboard without Prometheus or a separate telemetry pipeline.


Architecture: 12 Actors, One WASM Binary

MiniHermes compiles to a single WASM binary. The PlexSpaces supervisor boots 12 actors from it at startup, each with its own state, crash domain, and message contract.

The four actor behaviors map to four different runtime contracts:

Behavior | Actors | What It Provides
GenServer | Agent, LLM, Tools, Skills, Memory, Compressor, Cron, Session, Health | Synchronous request-reply with durable state
GenFSM | GuardrailsGate | Validated state machine; invalid transitions are rejected at runtime
GenEvent | AuditEvent | Fire-and-forget event delivery; callers never block
Workflow | SkillExtractionWorkflow | Durable multi-step execution with per-step checkpoints and cancel/query signals

Fault isolation. A bug in SkillStoreActor cannot corrupt AgentActor’s session history. If SkillExtractionWorkflow crashes mid-extraction, it resumes from its last checkpoint without restarting the conversation. The one_for_one supervisor strategy restarts only the failed actor; everything else keeps running.

# app-config.toml
[supervisor]
strategy = "one_for_one"           # restart ONLY the crashed child
max_restarts = 10
max_restart_window_seconds = 60    # if 10 crashes in 60s, escalate to parent supervisor
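
How the two limits interact can be sketched as a sliding window over crash timestamps. RestartWindow is a hypothetical model written for this post, not PlexSpaces code; it follows the config comment literally and escalates on the tenth crash inside the window, which may differ from the runtime's exact boundary rule.

```python
from collections import deque
from typing import Deque


class RestartWindow:
    """Hypothetical model of max_restarts / max_restart_window_seconds."""

    def __init__(self, max_restarts: int = 10, window_seconds: int = 60) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._crashes: Deque[float] = deque()

    def on_crash(self, now: float) -> str:
        """Record a crash at time `now` (seconds) and return the decision."""
        self._crashes.append(now)
        # Forget crashes that have fallen out of the sliding window
        while self._crashes and now - self._crashes[0] > self.window_seconds:
            self._crashes.popleft()
        if len(self._crashes) >= self.max_restarts:
            return "escalate"
        return "restart"


slow = RestartWindow()
slow_decisions = [slow.on_crash(t * 10.0) for t in range(12)]  # one crash every 10s

fast = RestartWindow()
fast_decisions = [fast.on_crash(float(t)) for t in range(10)]  # ten crashes in 9s
```

An actor that crashes occasionally is restarted indefinitely, while a crash loop is handed to the parent supervisor instead of being retried forever.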

Latency tradeoff. Each actor boundary costs one ask() call instead of an in-process function call. For an LLM agent this is negligible as LLM round-trips dominate at 100ms to 10s. The isolation and recoverability benefits far outweigh the sub-millisecond message overhead.


The Supervisor Tree and the Let-It-Crash Philosophy

Monolithic agent frameworks force every developer to write defensive error handling around every tool call, every LLM request, every memory write. MiniHermes takes the Erlang philosophy instead: let actors crash, and let supervisors restart them in a clean state.

When ToolExecutorActor crashes due to a bad tool payload, a timeout, or a WASM trap, the supervisor restarts it with clean state. The AgentActor’s in-flight request receives a timeout error and can retry. Every other actor continues running. The audit trail, the cron scheduler, the skill store, the LLM gateway: none of them know a crash happened.

This is the opposite of a monolith, where one bad tool call can corrupt the process heap and take the entire agent down.


Security: WASM, Firecracker, and Actor Isolation

Security in MiniHermes comes from three concentric layers, not from application-level checks.

  • Layer 1 Actor message isolation. Each actor owns its state exclusively. No shared memory, no global variables. Communication happens only through host.ask() and host.send(). Even if a prompt injection tricks AgentActor into misbehaving, it cannot read LLMGatewayActor’s stored API credentials or SkillStoreActor’s procedure data, because those live in separate actor state.
  • Layer 2 WASM linear memory sandbox. Every actor compiles to a WebAssembly module. The WIT (WebAssembly Interface Types) definition explicitly lists every operation the actor can call:
// wit/plexspaces-actor/host.wit
// Actors can ONLY call these imports — nothing else is accessible
interface host {
    send:       func(to: string, msg-type: string, payload: payload) -> result<_, actor-error>;
    ask:        func(to: string, msg-type: string, payload: payload, timeout-ms: u64) -> result<payload, actor-error>;
    kv-get:     func(key: string) -> result<payload, actor-error>;
    kv-put:     func(key: string, value: payload) -> result<_, actor-error>;
    http-fetch: func(link-name: string, method: string, path: string, request: payload) -> result<payload, actor-error>;
    ts-write:   func(tuple: list<string>) -> result<_, actor-error>;
    ts-read-all:func(pattern: list<option<string>>) -> result<list<list<string>>, actor-error>;
    // No filesystem. No env vars. No raw network. No process exec.
}

A malicious tool payload cannot exfiltrate environment variables or write to the filesystem because those syscalls do not exist in the WASM environment.

  • Layer 3 Firecracker. In a production deployment, each WASM runtime runs inside a Firecracker microVM. Firecracker is a lightweight virtual machine monitor built on KVM, and it provides hardware-enforced memory and I/O isolation between tenants. A compromise in one tenant’s actor cannot affect another tenant’s data or execution even if the WASM sandbox were bypassed.

Tenant isolation. Every PlexSpaces operation propagates tenant context automatically. KV keys, TupleSpace tuples, process groups, and object registry entries are all scoped by tenant and namespace:

# Framework-enforced key scoping — no application code can bypass this
KV:          tenant-acme:prod:session_history:sess-001
TupleSpace:  tenant-acme:prod:["skill_trigger", "csv", "skill-001"]
PG:          tenant-acme:prod:svc:agent

Tenant acme cannot retrieve a session belonging to tenant globex. The framework rejects the request before it reaches any actor.
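
The effect of key scoping can be illustrated with a thin wrapper over a shared dictionary. scoped_kv_key and ToyScopedKV are hypothetical names; the prefix layout follows the listing above. In this sketch a cross-tenant read simply finds nothing, whereas the framework is described as rejecting the request outright.

```python
from typing import Dict, Optional


def scoped_kv_key(tenant: str, namespace: str, key: str) -> str:
    """Hypothetical helper: build tenant:namespace:key as in the listing above."""
    for part in (tenant, namespace):
        if not part or ":" in part:
            raise ValueError(f"invalid scope component: {part!r}")
    return f"{tenant}:{namespace}:{key}"


class ToyScopedKV:
    """Hypothetical view of a shared store bound to one tenant and namespace.
    Application code never passes the tenant, so it cannot choose another one."""

    def __init__(self, backing: Dict[str, str], tenant: str, namespace: str) -> None:
        self._backing = backing
        self._tenant = tenant
        self._namespace = namespace

    def put(self, key: str, value: str) -> None:
        self._backing[scoped_kv_key(self._tenant, self._namespace, key)] = value

    def get(self, key: str) -> Optional[str]:
        return self._backing.get(scoped_kv_key(self._tenant, self._namespace, key))


shared: Dict[str, str] = {}
acme = ToyScopedKV(shared, "tenant-acme", "prod")
globex = ToyScopedKV(shared, "tenant-globex", "prod")

acme.put("session_history:sess-001", "[]")
acme_view = acme.get("session_history:sess-001")
globex_view = globex.get("session_history:sess-001")
```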


The Agent Loop

AgentActor drives the core conversation. When it receives a chat message, here is the full sequence:

User: "calculate 42 * 17 and remember the result"

  1. Restore session history from KV (survives restarts)
  2. Ask ContextCompressorActor: token budget > 75%?
     --> Yes: summarize the middle, keep the recent tail, archive original
  3. Ask SkillStoreActor: known procedures for "calculate" + "memory_store"?
     --> Found: inject skill into system prompt
  4. Ask ToolExecutorActor: list current tool schemas
  5. LOOP (max 8 iterations):
     a. Ask LLMGatewayActor: complete with these messages + tools
     b. stop_reason = tool_use:
        --> GuardrailsGate.check("calculator")   --> allow
        --> ToolExecutor.execute("calculator", {expr: "42*17"}) --> {result: 714}
        --> GuardrailsGate.check("memory_store") --> allow
        --> ToolExecutor.execute("memory_store", {key: "last_calc", value: "714"})
        --> Append results; continue loop
     c. stop_reason = end_turn --> break
  6. KV.put("session_history:sess-001", messages)   --  durable checkpoint
  7. send (fire-and-forget): SkillStoreActor.evaluate_for_learning
  8. send (fire-and-forget): AuditEventActor.log_event
  == "42 × 17 = 714. I've stored the result in your memory."

The Python implementation:

@actor
class AgentActor:
    system_prompt: str = state(default="You are a helpful AI assistant with access to tools.")
    messages: list     = state(default_factory=list)
    max_iterations: int = state(default=8)
    token_budget: int   = state(default=4096)

    @init_handler
    def on_init(self, config: dict) -> None:
        args = config.get("args", {})
        self.system_prompt = args.get("system_prompt", self.system_prompt)
        host.process_groups.join("svc:agent")
        # Publish capabilities for registry-based discovery
        host.registry.register(ctx="", object_type="actor", object_id=config["actor_id"],
                                object_category="agent",
                                capabilities=["chat", "tool_use", "memory"])

    @handler("chat")
    def chat(self, message: str = "", session_id: str = "") -> dict:
        # 1. Restore durable session
        if session_id:
            raw = host.kv_get(f"session_history:{session_id}")
            if raw:
                self.messages = json.loads(raw)
        self.messages.append({"role": "user", "content": message})

        # 2. Compress if over token budget
        comp_id, _ = pg_first("svc:context_compressor")
        if comp_id:
            resp = ask(comp_id, "check_and_compress",
                       {"messages": self.messages, "token_budget": self.token_budget})
            if resp and resp.get("compressed"):
                self.messages = resp["messages"]

        # 3. Inject matching skills
        skill_id, _ = pg_first("svc:skill_store")
        skill_context = ""
        if skill_id:
            resp = ask(skill_id, "match_skills", {"query": message})
            if resp and resp.get("skills"):
                skill_context = self._format_skills(resp["skills"])

        # 4. Get live tool schemas
        tool_exec_id, _ = pg_first("svc:tool_executor")
        tools = []
        if tool_exec_id:
            resp = ask(tool_exec_id, "list_tools", {})
            tools = resp.get("tools", []) if resp else []

        system = self.system_prompt
        if skill_context:
            system += f"\n\n## Relevant Skills\n{skill_context}"

        # 5. The tool loop — max_iterations prevents runaway execution
        final_response = ""
        for iteration in range(self.max_iterations):
            llm_id, _ = pg_first("svc:llm_gateway")
            llm_resp = ask(llm_id, "completion",
                           {"messages": [{"role": "system", "content": system}] + self.messages,
                            "tools": tools},
                           timeout_ms=30000)

            response    = llm_resp.get("response", {})
            stop_reason = response.get("stop_reason", "end_turn")
            self.messages.append({"role": "assistant",
                                   "content": response.get("content", ""),
                                   "stop_reason": stop_reason})

            if stop_reason == "end_turn":
                final_response = response.get("content", "")
                break

            if stop_reason == "tool_use":
                guard_id, _ = pg_first("svc:guardrails")
                # Look up the audit actor; "svc:audit" is an assumed group name
                audit_id, _ = pg_first("svc:audit")
                for tc in response.get("tool_calls", []):
                    # Every tool call clears the guardrail first
                    if guard_id:
                        check = ask(guard_id, "check_tool",
                                    {"tool_name": tc["name"], "input": tc["input"]})
                        if check and check.get("decision") == "deny":
                            self.messages.append({"role": "tool",
                                                   "tool_call_id": tc["id"],
                                                   "content": f"[denied: {tc['name']}]"})
                            continue
                    result = ask(tool_exec_id, "execute",
                                 {"name": tc["name"], "input": tc["input"]})
                    self.messages.append({"role": "tool",
                                          "tool_call_id": tc["id"],
                                          "content": json.dumps(result)})
                    if audit_id:
                        host.send(audit_id, "log_event",
                                  {"event_type": "tool_executed",
                                   "detail": f"tool={tc['name']} session={session_id}"})
                    host.incr_counter("tool_executions_total", 1)

        # 6. Checkpoint session — durable across restarts
        if session_id:
            host.kv_put(f"session_history:{session_id}", json.dumps(self.messages))

        # 7+8. Async learning and audit — never block the response
        if skill_id:
            host.send(skill_id, "evaluate_for_learning",
                      {"messages": self.messages, "user_intent": message})
        host.incr_counter("agent_chats_total", 1)
        return {"status": "ok", "response": final_response, "session_id": session_id}

Step 7 uses host.send(), not host.ask(). Skill learning never adds latency to the response because it happens in the background while the user reads the answer.
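
Step 2 of the sequence delegates to ContextCompressorActor, whose code is not shown in this section. The following is a rough sketch of what "summarize the middle, keep the recent tail" could look like. Everything here is an assumption: the 4-characters-per-token estimate, the keep_head and keep_tail parameters, the role of the summary message, and the summarize callback that stands in for an LLM call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def estimate_tokens(messages: List[Message]) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return sum(len(m.get("content", "")) for m in messages) // 4


def check_and_compress(messages: List[Message], token_budget: int,
                       summarize: Callable[[List[Message]], str],
                       keep_head: int = 1, keep_tail: int = 4,
                       threshold: float = 0.75) -> Dict[str, object]:
    """Replace the middle of the history with one summary message once the
    estimated size exceeds threshold * token_budget."""
    if estimate_tokens(messages) <= threshold * token_budget:
        return {"compressed": False, "messages": messages}
    if len(messages) <= keep_head + keep_tail:
        return {"compressed": False, "messages": messages}  # no middle to summarize
    split = len(messages) - keep_tail
    head, middle, tail = messages[:keep_head], messages[keep_head:split], messages[split:]
    summary = {"role": "system",
               "content": f"[summary of {len(middle)} earlier messages] {summarize(middle)}"}
    # "archived" returns the replaced messages so the caller can store them
    return {"compressed": True, "messages": head + [summary] + tail, "archived": middle}


history = [{"role": "user", "content": "x" * 400} for _ in range(10)]  # ~1000 tokens
result = check_and_compress(history, token_budget=1000,
                            summarize=lambda msgs: f"{len(msgs)} messages elided")
small = check_and_compress(history[:2], token_budget=1000, summarize=lambda msgs: "")
```

A real compressor would also need to avoid cutting between a tool call and its result, since an orphaned tool message can confuse the model.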


The LLM Gateway: Hot-Swap and Circuit Breaker

LLMGatewayActor is the single point through which all LLM calls flow. It can switch providers at runtime without restarting, and it protects downstream actors from a flaky provider with a built-in circuit breaker.

# Switch from Ollama to Anthropic — takes effect immediately, no restart
curl -X POST http://localhost:8091/api/v1/actors/llm_gateway/switch_provider \
  -d '{"provider":"anthropic","model":"claude-opus-4-8"}'

# Or to OpenAI
curl -X POST http://localhost:8091/api/v1/actors/llm_gateway/switch_provider \
  -d '{"provider":"openai","model":"gpt-4o"}'

The circuit breaker lives in the actor’s durable state, so it survives restarts:

@actor
class LLMGatewayActor:
    provider:              str  = state(default="ollama")
    model:                 str  = state(default="llama3.2")
    circuit_open:          bool = state(default=False)
    consecutive_failures:  int  = state(default=0)
    total_completions:     int  = state(default=0)

    @init_handler
    def on_init(self, config: dict) -> None:
        host.process_groups.join("svc:llm_gateway")
        host.send_after(30_000, "timer_tick", {"op": "timer_tick"})

    @handler("completion")
    def completion(self, messages: list = None, tools: list = None) -> dict:
        if self.circuit_open:
            # Fail fast — don't queue work behind a broken provider
            return {"status": "ok", "response": self._simulated_response(),
                    "circuit_open": True}
        try:
            result = self._call_provider(messages or [], tools or [])
            self.consecutive_failures = 0
            self.total_completions += 1
            host.incr_counter("llm_completions_total", 1)
            return {"status": "ok", "response": result}
        except Exception as e:
            self.consecutive_failures += 1
            if self.consecutive_failures >= 3:
                self.circuit_open = True
                host.warn(f"LLM circuit opened after {self.consecutive_failures} failures")
                host.incr_counter("llm_circuit_opens_total", 1)
            return {"error": str(e), "response": self._simulated_response()}

    @handler("timer_tick", "cast")
    def timer_tick(self) -> None:
        # Gradual recovery: one fault cleared per 30s tick
        # 3 faults -> 90s before circuit closes again; prevents flapping
        if self.circuit_open and self.consecutive_failures > 0:
            self.consecutive_failures -= 1
            if self.consecutive_failures == 0:
                self.circuit_open = False
                host.info("LLM circuit closed — provider available again")
        host.send_after(30_000, "timer_tick", {"op": "timer_tick"})

    @handler("switch_provider")
    def switch_provider(self, provider: str = "", model: str = "") -> dict:
        self.provider = provider
        self.model    = model
        # Switching resets the circuit — assume the new provider is healthy
        self.circuit_open         = False
        self.consecutive_failures = 0
        return {"status": "ok", "provider": provider, "model": model}

    def _call_provider(self, messages: list, tools: list) -> dict:
        if self.provider == "ollama":
            resp = host.http_fetch("ollama", "POST", "/api/chat",
                                   {"model": self.model, "messages": messages, "stream": False})
        elif self.provider == "anthropic":
            resp = host.http_fetch("anthropic", "POST", "/v1/messages",
                                   {"model": self.model, "messages": messages,
                                    "tools": tools, "max_tokens": 4096})
        elif self.provider == "openai":
            resp = host.http_fetch("openai", "POST", "/v1/chat/completions",
                                   {"model": self.model, "messages": messages, "tools": tools})
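        else:
            # Fail explicitly instead of hitting an unbound `resp` below
            raise ValueError(f"unknown provider: {self.provider}")
        return self._normalize(resp)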

Every provider response normalizes to the same format before leaving the gateway:

{
  "content":    "42 × 17 = 714",
  "stop_reason": "end_turn",
  "tool_calls": [],
  "usage":      {"input_tokens": 112, "output_tokens": 18}
}

AgentActor never knows which provider answered. Switching providers is transparent to the rest of the system.

Design tradeoff. The circuit breaker in this POC uses a simple failure count threshold. A production implementation would add per-provider backoff, budget caps, and latency-based degradation.
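
To trace the open and close timing without a running cluster, the counter logic from the handlers above can be lifted into a plain class. ToyBreaker is a hypothetical name; it mirrors the threshold of 3 and the one-failure-per-tick recovery, and leaves out the host calls and the provider request.

```python
from typing import List, Tuple


class ToyBreaker:
    """Pure-Python model of the failure-count breaker (no host calls)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.circuit_open = False

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.circuit_open = True

    def tick(self) -> None:
        """One 30s timer tick: forgive one failure while the circuit is open."""
        if self.circuit_open and self.consecutive_failures > 0:
            self.consecutive_failures -= 1
            if self.consecutive_failures == 0:
                self.circuit_open = False


breaker = ToyBreaker()
for _ in range(3):
    breaker.record_failure()
opened = breaker.circuit_open

# Each entry is (consecutive_failures, circuit_open) after one tick
timeline: List[Tuple[int, bool]] = []
for _ in range(3):
    breaker.tick()
    timeline.append((breaker.consecutive_failures, breaker.circuit_open))
```

Note that this design closes the circuit purely on elapsed time. There is no half-open probe request, so the first real call after the circuit closes is what discovers whether the provider has recovered.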


Skill Learning: The Self-Improvement Loop

This is what separates MiniHermes from every standard agent loop. When the agent uses three or more tools in a single turn, it asynchronously extracts a reusable skill. The next time the user asks something similar, the agent injects that skill into the system prompt and skips the re-discovery phase entirely.

The Durable Extraction Workflow

SkillExtractionWorkflow uses the @workflow_actor behavior, which checkpoints state after each step. A node crash during step 2 of 3 resumes from step 2, not the beginning:

@workflow_actor
class SkillExtractionWorkflow:

    @run_handler
    def run(self, payload: dict = None) -> dict:
        user_intent  = payload.get("user_intent", "")
        tool_sequence = payload.get("tool_sequence", [])
        domain        = payload.get("domain", "general")
        llm_id        = payload.get("llm_id", "")

        # Three focused LLM passes — each optimizes for a different extraction goal.
        # Python runs them sequentially (shared LLM budget).
        # Go runs them in true parallel goroutines for lower latency.
        name_result      = self._analyse_name(llm_id, user_intent, tool_sequence)
        # ^ workflow checkpoints here; crash-safe from this point

        procedure_result = self._analyse_procedure(llm_id, user_intent, tool_sequence)
        # ^ checkpoint

        trigger_result   = self._analyse_triggers(llm_id, user_intent, domain)
        # ^ checkpoint

        skill_id = f"skill-{host.now_ms()}"
        skill_store_id, _ = pg_first("svc:skill_store")
        if skill_store_id:
            ask(skill_store_id, "propose_skill", {
                "skill_id":        skill_id,
                "name":            name_result.get("name", "unnamed-skill"),
                "description":     name_result.get("description", ""),
                "procedure":       procedure_result.get("procedure", ""),
                "tags":            trigger_result.get("tags", []),
                "trigger_patterns": trigger_result.get("patterns", []),
            })
        return {"status": "ok", "skill_id": skill_id}

    @signal_handler("cancel")
    def cancel(self) -> None:
        # In-flight extraction can be cancelled without crashing the actor
        host.info("SkillExtraction cancelled")

    @query_handler("status")
    def query_status(self) -> dict:
        return {"task_id": self.task_id, "status": self.status, "progress": self.progress}

Three Storage Layers for Three Access Patterns

@handler("propose_skill")
def propose_skill(self, skill_id: str = "", name: str = "",
                  description: str = "", procedure: str = "",
                  tags: list = None, trigger_patterns: list = None) -> dict:

    # KV: metadata — fast exact-key lookup when the ID is known
    meta = {"skill_id": skill_id, "name": name, "description": description,
            "status": "active", "usage_count": 0,
            "created_at": host.now_ms(), "last_used_at": host.now_ms()}
    host.kv_put(f"skill_meta:{skill_id}", json.dumps(meta))

    # BlobStorage: full procedure text — potentially several paragraphs
    host.blob.upload(f"skill_procedure_{skill_id}", procedure.encode())

    # TupleSpace: keyword indexes — pattern scan at query time, no SQL needed
    for tag in (tags or []):
        host.ts.write(["skill_tag", tag, skill_id, name])
    for pattern in (trigger_patterns or []):
        host.ts.write(["skill_trigger", pattern, skill_id])

    host.incr_counter("skills_created_total", 1)
    return {"status": "ok", "skill_id": skill_id}

Why three layers? KV answers “give me skill X” in O(1). TupleSpace answers “which skills match this query?” without an index build step. BlobStorage keeps large procedure text out of both KV values and message payloads.

Skill Matching at Query Time

@handler("match_skills")
def match_skills(self, query: str = "") -> dict:
    query_words = set(query.lower().split())

    # Scan all trigger entries — None is a wildcard
    all_triggers = host.ts.read_all(["skill_trigger", None, None])

    matched_ids = set()
    for tpl in all_triggers:
        pattern = tpl[1].lower()
        if pattern in query_words or any(w in pattern for w in query_words):
            matched_ids.add(tpl[2])

    skills = []
    for skill_id in matched_ids:
        meta_json = host.kv_get(f"skill_meta:{skill_id}")
        if not meta_json:
            continue
        meta = json.loads(meta_json)
        if meta.get("status") != "active":
            continue
        # Track usage for lifecycle decisions; persist before attaching the
        # procedure so the large text stays in BlobStorage, not in KV
        meta["usage_count"]  = meta.get("usage_count", 0) + 1
        meta["last_used_at"] = host.now_ms()
        host.kv_put(f"skill_meta:{skill_id}", json.dumps(meta))
        # Load the full procedure only for matched, active skills
        meta["procedure"] = host.blob.download(f"skill_procedure_{skill_id}").decode()
        skills.append(meta)

    host.incr_counter("skill_matches_total", len(skills))
    return {"status": "ok", "skills": skills}
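
The matching rule is worth exercising in isolation, because the substring check has some edge cases. The function below copies the same rule into a standalone helper (match_skill_ids is a hypothetical name) and runs it against a small, made-up trigger list.

```python
from typing import Iterable, List, Set


def match_skill_ids(query: str, triggers: Iterable[List[str]]) -> Set[str]:
    """Same rule as match_skills: a trigger matches when it equals a query
    word, or when any query word is a substring of the trigger."""
    query_words = set(query.lower().split())
    matched: Set[str] = set()
    for tpl in triggers:
        pattern = tpl[1].lower()
        if pattern in query_words or any(w in pattern for w in query_words):
            matched.add(tpl[2])
    return matched


triggers = [
    ["skill_trigger", "csv", "skill-001"],
    ["skill_trigger", "spreadsheet", "skill-001"],
    ["skill_trigger", "calculate", "skill-002"],
]

exact = match_skill_ids("convert this CSV file", triggers)   # whole-word hit
partial = match_skill_ids("sheet totals", triggers)          # "sheet" is inside "spreadsheet"
short_word = match_skill_ids("a", triggers)                  # "a" is inside two triggers
punctuated = match_skill_ids("parse the csv.", triggers)     # "csv." is not "csv"
```

Two things stand out: very short query words match almost any trigger, and trailing punctuation prevents an otherwise exact match. Stripping punctuation and ignoring words below a minimum length would be a reasonable refinement.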

Skills Age Out Automatically

Skills that go unused for 30 days transition to stale. After 90 more days they become archived. A daily send_after tick drives this, with no external scheduler:

@handler("timer_tick", "cast")
def timer_tick(self) -> None:
    now             = host.now_ms()
    thirty_days_ms  = 30 * 24 * 60 * 60 * 1000
    ninety_days_ms  = 90 * 24 * 60 * 60 * 1000

    all_tags = host.ts.read_all(["skill_tag", None, None, None])
    seen     = set()
    for t in all_tags:
        skill_id = t[2]
        if skill_id in seen:
            continue
        seen.add(skill_id)
        meta_json = host.kv_get(f"skill_meta:{skill_id}")
        if not meta_json:
            continue
        meta = json.loads(meta_json)
        age  = now - meta.get("last_used_at", now)
        if meta["status"] == "active" and age > thirty_days_ms:
            meta["status"] = "stale"
            host.kv_put(f"skill_meta:{skill_id}", json.dumps(meta))
        elif meta["status"] == "stale" and age > thirty_days_ms + ninety_days_ms:
            # Archived 90 days after going stale, i.e. 120 days since last use
            meta["status"] = "archived"
            host.kv_put(f"skill_meta:{skill_id}", json.dumps(meta))

    host.send_after(24 * 60 * 60 * 1000, "timer_tick", {"op": "timer_tick"})

active  --> (30 days unused) -->  stale  --> (90 more days) -->  archived

This prevents the skill store from accumulating noise from one-off tasks that will never recur.
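
The lifecycle can also be written as a pure transition function, which makes the thresholds easy to test. This sketch reads "90 more days" as 90 days after the stale threshold, so 120 idle days in total; that reading, and the names next_status, STALE_AFTER_MS and ARCHIVE_AFTER_MS, are assumptions for illustration.

```python
DAY_MS = 24 * 60 * 60 * 1000
STALE_AFTER_MS = 30 * DAY_MS
# Assumption: "90 more days" is counted from the stale threshold
ARCHIVE_AFTER_MS = STALE_AFTER_MS + 90 * DAY_MS


def next_status(status: str, idle_ms: int) -> str:
    """Return the lifecycle state for a skill that has been idle for idle_ms.
    Applies at most one transition per call, like one daily tick."""
    if status == "active" and idle_ms > STALE_AFTER_MS:
        return "stale"
    if status == "stale" and idle_ms > ARCHIVE_AFTER_MS:
        return "archived"
    return status


fresh = next_status("active", 10 * DAY_MS)
gone_stale = next_status("active", 31 * DAY_MS)
still_stale = next_status("stale", 91 * DAY_MS)
archived = next_status("stale", 121 * DAY_MS)
one_step = next_status("active", 200 * DAY_MS)  # needs a second tick to archive
```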


Memory: Three Tiers, One Actor

MemoryActor manages three memory tiers with different durability and retrieval characteristics. The Hermes reference implementation stores facts in flat files; MiniHermes uses KV + TupleSpace + BlobStorage, with each tier mapped to a storage layer.

@actor
class MemoryActor:
    memory_count: int = state(default=0)

    @handler("store_memory")
    def store_memory(self, key: str = "", value: str = "",
                     scope: str = "global", tier: str = "reachable",
                     agent_id: str = "", session_id: str = "") -> dict:
        if not key:
            return {"error": "key required"}
        scoped_key = self._scoped_key(scope, agent_id, session_id, key)

        if tier == "deep":
            # BlobStorage: large, rarely needed, not scanned by default
            host.blob.upload(f"deep_memory_{scoped_key}", value.encode())
        else:
            # KV: durable point lookup
            host.kv_put(scoped_key, str(value))

        # TupleSpace index: queryable by scope and tier regardless of storage layer
        host.ts.write(["memory", scope, tier, key, str(value)[:64]])
        self.memory_count += 1
        return {"status": "ok", "key": key, "scope": scope, "tier": tier}

    @handler("recall_memory")
    def recall_memory(self, key: str = "", scope: str = "global",
                      agent_id: str = "", session_id: str = "") -> dict:
        scoped_key = self._scoped_key(scope, agent_id, session_id, key)
        value = host.kv_get(scoped_key)
        if not value:
            # Try deep tier
            try:
                value = host.blob.download(f"deep_memory_{scoped_key}").decode()
            except Exception:
                pass
        return {"status": "ok", "key": key, "value": value, "found": bool(value)}

    @handler("list_memories")
    def list_memories(self, scope: str = "global", tier: str = None) -> dict:
        pattern = ["memory", scope, tier or None, None, None]
        tuples  = host.ts.read_all(pattern)
        memories = [{"key": t[3], "value": t[4], "tier": t[2]}
                    for t in tuples if len(t) >= 5]
        return {"status": "ok", "memories": memories, "count": len(memories)}

    def _scoped_key(self, scope: str, agent_id: str, session_id: str, key: str) -> str:
        if scope == "agent"   and agent_id:   return f"mem:agent:{agent_id}:{key}"
        if scope == "session" and session_id: return f"mem:session:{session_id}:{key}"
        return f"mem:global:{key}"

The three scopes (global, agent, session) determine which facts survive which boundaries: session memories disappear with the session, agent memories persist across sessions, global memories are shared across all agents.
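To see how a scope resolves to a storage key, here is a standalone copy of the `_scoped_key` logic that can run outside the actor. The function name `scoped_key` and the sample ids (`s-42`, `a-1`) are illustrative assumptions; the key formats are taken from the listing above.

```python
def scoped_key(scope: str, key: str, agent_id: str = "", session_id: str = "") -> str:
    """Standalone copy of MemoryActor._scoped_key, for experimentation only."""
    if scope == "agent" and agent_id:
        return f"mem:agent:{agent_id}:{key}"
    if scope == "session" and session_id:
        return f"mem:session:{session_id}:{key}"
    # An unknown scope, or a scope without its id, falls back to the global namespace
    return f"mem:global:{key}"


if __name__ == "__main__":
    print(scoped_key("session", "timezone", session_id="s-42"))  # mem:session:s-42:timezone
    print(scoped_key("agent", "timezone", agent_id="a-1"))       # mem:agent:a-1:timezone
    print(scoped_key("agent", "timezone"))                       # mem:global:timezone
```

The last call shows a side effect worth knowing about: a caller that asks for the agent scope but omits `agent_id` writes into the global namespace. A stricter variant might return an error instead.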


Distributed Cron: Recurring Tasks That Survive Node Failures

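“Summarize my tasks every morning” is a natural request. Making it work reliably across a cluster requires solving three problems at once: who fires the job when there are three nodes, what happens if the firing node crashes mid-delivery, and how do you prevent duplicate execution? MiniHermes solves all three with two primitives: a distributed lock, so that only one node fires jobs on a given tick, and an at-least-once channel, so that a job is redelivered if the agent crashes before acknowledging it: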

// Go — CronSchedulerActor
func (a *CronSchedulerActor) tick() {
    // TryAcquire returns false immediately if another node holds the lock.
    // TTL of 90s exceeds the 60s tick interval, preventing leader gaps.
    acquired, _ := host.Lock().TryAcquire("minihermes", "cron_leader", 90000)
    if !acquired {
        return // another node leads this cycle — nothing to do
    }

    now := host.NowMs()
    for _, jobID := range a.JobIDs {
        job := a.loadJob(jobID)
        if now-job.LastRunAt >= job.IntervalMs {
            payload := map[string]interface{}{
                "job_id": job.JobID, "prompt": job.Prompt, "session_id": job.SessionID,
            }
            // Channel: at-least-once. If agent crashes before ack, job redelivers.
            host.Ch().Send("", "cron:pending", "cron_job", payload)
            job.LastRunAt = now
            a.saveJob(job)
        }
    }
}
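The interplay between the 90-second TTL and the 60-second tick is easier to see in a small simulation. The sketch below is an in-memory stand-in, not the PlexSpaces lock API: the `TtlLockTable` class, its `try_acquire` signature, the explicit `holder` argument, and the assumption that the current holder may renew its own lease are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TtlLockTable:
    """In-memory stand-in for a distributed lock service (hypothetical)."""
    # lock name -> (holder, expiry timestamp in ms)
    leases: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def try_acquire(self, name: str, holder: str, now_ms: int, ttl_ms: int) -> bool:
        current = self.leases.get(name)
        if current is not None:
            owner, expires_at = current
            # A live lease held by someone else blocks the caller immediately
            if owner != holder and now_ms < expires_at:
                return False
        # Free, expired, or already ours: take or renew the lease
        self.leases[name] = (holder, now_ms + ttl_ms)
        return True


TICK_MS, TTL_MS = 60_000, 90_000
locks = TtlLockTable()

# t=0: node A wins, node B backs off
assert locks.try_acquire("cron_leader", "node-a", 0, TTL_MS)
assert not locks.try_acquire("cron_leader", "node-b", 0, TTL_MS)

# t=60s: A renews before its 90s lease runs out, so B still cannot lead
assert locks.try_acquire("cron_leader", "node-a", TICK_MS, TTL_MS)
assert not locks.try_acquire("cron_leader", "node-b", TICK_MS, TTL_MS)

# A crashes after t=60s. Its lease expires at t=150s, so B is refused at
# t=120s and takes over on its t=180s tick.
assert not locks.try_acquire("cron_leader", "node-b", 2 * TICK_MS, TTL_MS)
assert locks.try_acquire("cron_leader", "node-b", 3 * TICK_MS, TTL_MS)
```

Under these assumptions, a TTL longer than the tick keeps leadership stable while the leader is healthy, at the cost of a failover delay of up to one TTL after a crash.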

The agent runs each cron job in an isolated session context so the job never bleeds into the user’s live conversation:

@handler("process_cron_job", "cast")
def process_cron_job(self, job_id: str = "", prompt: str = "",
                     session_id: str = "") -> None:
    cron_session = f"cron:{session_id}"

    # Stash the current interactive conversation
    saved_messages = self.messages[:]

    try:
        # Load the cron session's own history, completely separate from user sessions
        raw = host.kv_get(f"session_history:{cron_session}")
        self.messages = json.loads(raw) if raw else []

        self._run_agent_loop(prompt, tools=[])

        host.kv_put(f"session_history:{cron_session}", json.dumps(self.messages))
    finally:
        # Restore the user conversation even if the cron run raises
        self.messages = saved_messages

    host.send(audit_id, "log_event",
              {"event_type": "cron_executed", "detail": f"job_id={job_id}"})

Creating a recurring task takes one API call:

curl -X POST http://localhost:8091/api/v1/actors/cron_scheduler/create_job \
  -d '{
    "job_id":     "daily-digest",
    "prompt":     "Summarize today'\''s tasks and send a digest email",
    "schedule":   "every_24h",
    "session_id": "cron-digest"
  }'

Context Compression: Long Conversations Without Truncation

Every LLM agent eventually exceeds the model’s context window. The reference Hermes implementation truncates: it drops the oldest messages and loses their context. MiniHermes compresses instead. ContextCompressorActor summarizes the middle of the conversation (everything between the system prompt and the recent tail), keeps the recent tail intact, and archives the full original.

@handler("check_and_compress")
def check_and_compress(self, messages: list = None, token_budget: int = 4096) -> dict:
    messages        = messages or []
    estimated_tokens = sum(len(str(m)) // 4 for m in messages)

    if estimated_tokens < token_budget * 0.75:
        return {"compressed": False, "messages": messages}

    system_msgs  = [m for m in messages if m.get("role") == "system"]
    other_msgs   = [m for m in messages if m.get("role") != "system"]
    recent_count = max(4, len(other_msgs) // 3)
    middle       = other_msgs[:-recent_count]
    recent       = other_msgs[-recent_count:]

    if len(middle) < 2:
        return {"compressed": False, "messages": messages}

    # Archive the full original before compression — preserves audit trail
    if self.session_id:
        host.kv_put(f"full_history_archive:{self.session_id}", json.dumps(messages))

    llm_id, _ = pg_first("svc:llm_gateway")
    summary_resp = ask(llm_id, "completion", {
        "messages": [
            {"role": "system",
             "content": "Summarize this conversation history concisely. "
                        "Preserve key facts, tool results, and decisions."},
            {"role": "user", "content": json.dumps(middle)}
        ],
        "tools": []
    })

    summary_text = summary_resp.get("response", {}).get("content", "")
    summary_msg  = {"role": "assistant",
                    "content": f"[Conversation summary: {summary_text}]",
                    "is_summary": True}

    compressed = system_msgs + [summary_msg] + recent
    host.incr_counter("context_compressions_total", 1)
    return {"compressed": True, "messages": compressed,
            "original_count": len(messages), "compressed_count": len(compressed)}

Design tradeoff. LLM-based summarization costs tokens and adds latency to the one turn that triggers it. In exchange, the compressed context is semantically richer than simple truncation: the model retains the meaning of earlier turns, not just the most recent N messages. For a task-focused agent this matters, because a calculation result from turn 3 can still be relevant at turn 50.
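The partitioning arithmetic in check_and_compress can be tried without an LLM. The helper below, `plan_compression`, is a hypothetical name; it reuses the same token estimate (characters divided by 4), the 75% threshold, and the "keep at least 4, otherwise the last third" rule from the listing, and only reports what would happen.

```python
from typing import Dict, List, Union


def plan_compression(messages: List[dict],
                     token_budget: int = 4096) -> Dict[str, Union[int, bool]]:
    """Report how a history would be partitioned; performs no LLM call."""
    estimated_tokens = sum(len(str(m)) // 4 for m in messages)
    other = [m for m in messages if m.get("role") != "system"]
    recent_count = max(4, len(other) // 3)
    middle = other[:-recent_count]
    # Compress only when over 75% of the budget and there is something to summarize
    would_compress = estimated_tokens >= token_budget * 0.75 and len(middle) >= 2
    return {
        "estimated_tokens": estimated_tokens,
        "would_compress": would_compress,
        "summarized": len(middle) if would_compress else 0,
        "kept_verbatim": len(other) - len(middle) if would_compress else len(other),
    }


if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a helpful agent."}]
    history += [{"role": "user", "content": "x" * 500} for _ in range(30)]
    plan = plan_compression(history)
    print(plan["would_compress"], plan["summarized"], plan["kept_verbatim"])
```

With 30 non-system messages of about 500 characters each, the estimate is well over 75% of a 4096-token budget, so the oldest 20 messages would be summarized and the latest 10 kept verbatim.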


Guardrails: Per-Tool Policy Enforcement Without Redeployment

GuardrailsGateActor implements a GenFSM that sits between every tool call and its execution, so no call bypasses it. Policies update at runtime via a single message, with no redeploy and no restart.

@fsm_actor(states=["allow", "review", "approved", "denied"], initial="allow")
class GuardrailsGateActor:
    # tool_name -> "allow" | "deny" | "review"
    policies: dict = state(default_factory=dict)
    deny_count: int = state(default=0)

    @handler("check_tool")
    def check_tool(self, tool_name: str = "", input: dict = None) -> dict:
        policy = self.policies.get(tool_name, "allow")

        if policy == "deny":
            self.deny_count += 1
            host.incr_counter("tool_denials_total", 1)
            host.send(audit_id, "log_event",
                      {"event_type": "tool_denied", "detail": f"tool={tool_name}"})
            return {"decision": "deny", "reason": f"{tool_name} is blocked by policy"}

        if policy == "review":
            # FSM transitions to review — observable by operators via get_state
            self.fsm_state = "review"
            host.send(audit_id, "log_event",
                      {"event_type": "tool_review", "detail": f"tool={tool_name}"})
            # Production: pause here and await human approval via Channel
            self.fsm_state = "approved"
            return {"decision": "allow", "reviewed": True}

        return {"decision": "allow"}

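    @handler("set_policy")
    def set_policy(self, tool_name: str = "", decision: str = "allow") -> dict:
        # Reject unknown decisions: check_tool treats anything unrecognized as "allow"
        if decision not in ("allow", "deny", "review"):
            return {"error": f"invalid decision: {decision}"}
        self.policies[tool_name] = decision
        host.send(audit_id, "log_event",
                  {"event_type": "policy_set",
                   "detail": f"tool={tool_name} decision={decision}"})
        return {"status": "ok", "tool_name": tool_name, "decision": decision}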

    @handler("get_state")
    def get_state(self) -> dict:
        return {"fsm_state": self.fsm_state, "policies": self.policies,
                "deny_count": self.deny_count}

# Block a dangerous tool: takes effect on the next check_tool call, including calls from sessions already in progress
curl -X POST http://localhost:8091/api/v1/actors/guardrails/set_policy \
  -d '{"tool_name":"delete_file","decision":"deny"}'

# Route a sensitive tool through human review
curl -X POST http://localhost:8091/api/v1/actors/guardrails/set_policy \
  -d '{"tool_name":"send_email","decision":"review"}'

The GenFSM behavior validates every transition at runtime. Attempting allow --> approved without going through review first is rejected by the framework so that bugs in the policy logic cannot produce invalid states.
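Transition validation of this kind can be pictured as a lookup in a table of permitted moves. The sketch below is not the GenFSM implementation: the `ALLOWED_TRANSITIONS` table, the `InvalidTransition` exception, and the `transition` function are hypothetical, and the only facts taken from the text are the four state names and the rule that approved is reachable only through review.

```python
# Assumed transition table for the four states named in @fsm_actor
ALLOWED_TRANSITIONS = {
    "allow": {"review"},
    "review": {"approved", "denied"},
    "approved": {"allow"},
    "denied": {"allow"},
}


class InvalidTransition(Exception):
    """Raised when a move is not listed in ALLOWED_TRANSITIONS."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not permitted")
    return target


if __name__ == "__main__":
    state = transition("allow", "review")     # permitted
    state = transition(state, "approved")     # permitted
    try:
        transition("allow", "approved")       # skips review
    except InvalidTransition as exc:
        print(f"rejected: {exc}")
```

Keeping the table as data rather than scattered if-statements means a reviewer can audit every legal move in one place.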


Tools: Runtime Registration and HTTPFetch Execution

Tools are not compiled in. Any HTTP endpoint can become a tool at runtime without redeploying the binary:

# Register a weather API as a tool — takes effect immediately
curl -X POST http://localhost:8091/api/v1/actors/tool_executor/register_tool \
  -d '{
    "name":        "weather",
    "description": "Get current weather for a city",
    "input_schema": {"type":"object","properties":{"city":{"type":"string"}}},
    "handler_type": "service_link",
    "handler_config": {"link_name":"openweather","path":"/data/2.5/weather","method":"GET"}
  }'

ToolExecutorActor dispatches registered tools via host.http_fetch(), the only way to make outbound network calls from within the WASM sandbox:

@actor
class ToolExecutorActor:
    tools: dict     = state(default_factory=dict)   # name -> spec
    exec_count: int = state(default=0)

    @init_handler
    def on_init(self, config: dict) -> None:
        self.tools = {t["name"]: t for t in _BUILTIN_TOOLS}
        host.process_groups.join("svc:tool_executor")

    @handler("register_tool")
    def register_tool(self, name: str = "", description: str = "",
                      input_schema: dict = None, handler_type: str = "builtin",
                      handler_config: dict = None) -> dict:
        self.tools[name] = {
            "name": name, "description": description,
            "input_schema": input_schema or {},
            "handler_type": handler_type,
            "handler_config": handler_config or {}
        }
        return {"status": "ok", "name": name}

    @handler("execute")
    def execute(self, name: str = "", input: dict = None) -> dict:
        input = input or {}
        if name not in self.tools:
            return {"error": f"unknown tool: {name}"}
        self.exec_count += 1
        host.incr_counter(f"tool_{name}_total", 1)

        spec = self.tools[name]
        if spec.get("handler_type") == "service_link":
            cfg  = spec.get("handler_config", {})
            resp = host.http_fetch(cfg["link_name"], cfg.get("method","GET"),
                                   cfg["path"], input)
            return {"result": resp}

        # Built-in handlers
        if name == "calculator":
            expr = input.get("expression", "0")
            try:
                result = eval(expr, {"__builtins__": {}})  # demo only — see gaps section
                return {"result": str(result)}
            except Exception as e:
                return {"error": str(e)}
        if name == "memory_store":
            mem_id, _ = pg_first("svc:memory")
            if mem_id:
                return ask(mem_id, "store_memory", input) or {}
        if name == "memory_recall":
            mem_id, _ = pg_first("svc:memory")
            if mem_id:
                return ask(mem_id, "recall_memory", input) or {}

        return {"result": f"[simulated] {name} executed"}
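The eval() call in the calculator branch is marked as demo-only. Emptying `__builtins__` does not make eval() safe, because attribute access on a literal can still reach arbitrary classes. One possible replacement, sketched here under the assumption that the tool only needs basic arithmetic, walks the parsed AST and accepts nothing but numeric literals and a fixed set of operators. The name `safe_calc` is hypothetical.

```python
import ast
import operator

# Whitelisted operators; exponentiation is left out to avoid huge results
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calc(expression: str):
    """Evaluate +, -, *, /, % over numeric literals; reject everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # type() check (not isinstance) so that True/False are rejected
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(safe_calc("42 * 17"))  # 714
    try:
        safe_calc("().__class__")
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Names, calls, attribute access, and subscripts all raise ValueError, while malformed input raises SyntaxError and division by zero raises ZeroDivisionError, so the existing `except Exception` wrapper in the handler would still turn every failure into an error response.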

Service Discovery: Process Groups vs. Object Registry

MiniHermes demonstrates both discovery patterns side by side.

Process Groups — simple, built-in, zero configuration:

# Every actor announces itself on startup
host.process_groups.join("svc:agent")

# Callers find the first available member — location-transparent
agent_id, err = pg_first("svc:agent")
result = ask(agent_id, "chat", {"message": "Hello"})

// Go version, same pattern
agentID, err := host.PG().First("svc:agent")

Object Registry — richer, capability-aware, preferred for production:

# On startup — declare what this actor can do
host.registry.register(ctx="", object_type="actor",
                        object_id=self.actor_id,
                        object_category="skill_store",
                        capabilities=["match_skills", "propose_skill", "lifecycle"])

# Caller — find an actor that specifically supports skill matching
actors = host.registry.discover(ctx="", object_type="actor",
                                 object_category="skill_store",
                                 required_capability="match_skills")
skill_id = actors[0]["object_id"] if actors else None

// Go: capability-aware lookup
agentID, err := registryFirst("agent", "svc:agent", "tool_use")

Process groups answer “is there anyone in this group?” Registry answers “is there anyone in this group who can do this?” The registry is the better choice when multiple actor versions may be deployed simultaneously, or when different instances offer different capabilities.
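The difference shows up as soon as two versions of an actor sit in the same group. The toy model below is an in-memory illustration only; the `groups` and `capabilities` dictionaries, the actor ids, and the helper names `group_first` and `discover_first` are hypothetical and do not mirror the PlexSpaces API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical deployment: an old and a new skill store share one group
groups: Dict[str, List[str]] = {"svc:skill_store": ["skills-v1", "skills-v2"]}
capabilities: Dict[str, Set[str]] = {
    "skills-v1": {"propose_skill"},
    "skills-v2": {"propose_skill", "match_skills"},
}


def group_first(group: str) -> Optional[str]:
    """Process-group style lookup: any member will do."""
    members = groups.get(group, [])
    return members[0] if members else None


def discover_first(group: str, required: str) -> Optional[str]:
    """Registry style lookup: first member that declares the capability."""
    for actor_id in groups.get(group, []):
        if required in capabilities.get(actor_id, set()):
            return actor_id
    return None


if __name__ == "__main__":
    print(group_first("svc:skill_store"))                     # skills-v1
    print(discover_first("svc:skill_store", "match_skills"))  # skills-v2
```

The group lookup returns the older instance, which cannot match skills, while the capability-aware lookup skips it.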


Audit Trail and Health Monitoring

Non-Blocking Audit with GenEvent

AuditEventActor uses the GenEvent behavior. Senders call host.send(), which is fire-and-forget, so audit logging never adds latency to the critical path:

@event_actor
class AuditEventActor:
    event_count: int = state(default=0)

    @init_handler
    def on_init(self, config: dict) -> None:
        host.process_groups.join("svc:audit")

    @handler("log_event", "cast")  # "cast" = fire-and-forget, no reply
    def log_event(self, event_type: str = "", detail: str = "",
                  timestamp: int = 0) -> None:
        ts = timestamp or host.now_ms()
        host.ts.write(["audit", event_type, ts, detail])
        self.event_count += 1

    @handler("query_events")
    def query_events(self, event_type: str = None) -> dict:
        pattern = ["audit", event_type or None, None, None]
        events  = host.ts.read_all(pattern)
        return {"status": "ok", "events": events, "count": len(events)}

The TupleSpace audit log is append-only by construction: there is no ts.delete() in the sandbox. Every tool call, policy change, skill creation, cron execution, and circuit event lands here and stays queryable by event type.
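The query patterns used by query_events and list_memories rely on None acting as a wildcard. A minimal in-memory model of that matching is sketched below. The `matches` and `read_all` helpers are hypothetical, and the rule that pattern and tuple must have the same length is an assumption about TupleSpace semantics rather than something the listings confirm.

```python
from typing import Any, List, Optional, Sequence


def matches(pattern: Sequence[Optional[Any]], tup: Sequence[Any]) -> bool:
    """None matches any value; every other field must be equal."""
    return len(pattern) == len(tup) and all(
        p is None or p == v for p, v in zip(pattern, tup)
    )


def read_all(space: List[list], pattern: Sequence[Optional[Any]]) -> List[list]:
    """Return every tuple in the space that matches the pattern."""
    return [t for t in space if matches(pattern, t)]


space = [
    ["audit", "tool_denied", 1000, "tool=delete_file"],
    ["audit", "policy_set", 1001, "tool=send_email decision=review"],
    ["health_snapshot", 1002, "{}"],
]

if __name__ == "__main__":
    print(len(read_all(space, ["audit", None, None, None])))           # 2
    print(len(read_all(space, ["audit", "tool_denied", None, None])))  # 1
```

Passing `event_type or None` in the handler therefore turns an omitted filter into "match every event type".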

Health Monitor with SendAfter Polling

HealthMonitorActor never subscribes to membership change events. It polls every service group on a fixed interval and writes a snapshot to TupleSpace:

_SERVICE_GROUPS = [
    "svc:llm_gateway", "svc:tool_executor", "svc:agent",
    "svc:skill_store", "svc:guardrails", "svc:audit",
    "svc:cron_scheduler", "svc:session_manager", "svc:memory",
    "svc:context_compressor", "svc:health_monitor",
]

@actor
class HealthMonitorActor:
    poll_count:      int  = state(default=0)
    last_poll_ms:    int  = state(default=0)
    group_health:    dict = state(default_factory=dict)
    poll_interval_ms: int = state(default=5000)

    @init_handler
    def on_init(self, config: dict) -> None:
        host.process_groups.join("svc:health_monitor")
        host.send_after(self.poll_interval_ms, "poll_tick", {"op": "poll_tick"})

    @handler("poll_tick", "cast")
    def poll_tick(self) -> None:
        health = {}
        for grp in _SERVICE_GROUPS:
            try:
                members      = host.process_groups.members(grp)
                health[grp]  = len(members)
            except Exception:
                health[grp] = 0

        self.group_health  = health
        self.poll_count   += 1
        self.last_poll_ms  = host.now_ms()

        host.ts.write(["health_snapshot", self.last_poll_ms, json.dumps(health)])
        # Each tick reschedules the next — no external scheduler
        host.send_after(self.poll_interval_ms, "poll_tick", {"op": "poll_tick"})

    @handler("get_health")
    def get_health(self) -> dict:
        degraded = [g for g, c in self.group_health.items() if c == 0]
        return {
            "status":       "ok" if not degraded else "degraded",
            "group_health": self.group_health,
            "healthy":      len(self.group_health) - len(degraded),
            "degraded":     degraded,
        }

Polling converges to the true state on every tick regardless of event ordering. The snapshot is eventually consistent and never stale for more than one poll interval.


Primitives Scorecard

MiniHermes uses 16 distinct PlexSpaces primitives across 12 actors:

| Primitive | Where Used | What It Enables |
|---|---|---|
| KV.Get/Put | All actors | Session history, skill metadata, cron jobs, provider config |
| TupleSpace.Write/ReadAll | Skills, Memory, Audit, Health | Tag index, memory tiers, audit log, health snapshots |
| BlobStorage.Upload/Download | Skills, Memory | Skill procedures, deep memory archives |
| Channel.Send/Receive/Ack | Cron | At-least-once job delivery; redelivers on crash |
| DistributedLock.TryAcquire | Cron | Single scheduler leader per cluster |
| ProcessGroups.Join/First | All actors | Location-transparent svc:* discovery |
| ObjectRegistry.Register/Discover | Agent, Skills, Session, Health | Capability-aware routing |
| SendAfter | LLM, Cron, Health, Skills | Self-scheduling tick loops; replaces external cron |
| HTTPFetch | LLM, Tools | Outbound calls to Ollama, OpenAI, Anthropic, tool APIs |
| Ask | Agent, Tools, Compressor | Request-reply across actor boundaries |
| Send | Agent, Cron, Audit | Fire-and-forget: audit events, async skill learning |
| IncrCounter | All actors | Metrics on every key operation |
| Workflow (run/signal/query) | SkillWorkflow | Durable parallel skill extraction with cancel/query |
| Durability (checkpoint_interval) | All stateful actors | State persistence across crashes and restarts |
| GenFSM | Guardrails | Validated state machine; invalid transitions rejected |
| GenEvent | Audit | Non-blocking event delivery; callers never wait |

Known Gaps

MiniHermes is a proof of concept, not a production system. The same disclaimer applies here as in the MiniClaw post: the point is to demonstrate what the architecture can support, not to ship something you should run in production today.

  • Skill quality and safety. The extraction workflow uses LLM reflection without any validation layer. Extracted skills can be incorrect, subtly wrong, or even harmful if the original task involved a bad assumption. A production system needs automated skill evaluation, human review for high-impact skills, and version history with rollback.
  • Calculator eval. The built-in calculator tool uses Python’s eval() with empty builtins. This is a demo shortcut. In production, replace it with an AST-based evaluator or a sandboxed tool actor in its own WASM module with no outbound capabilities at all.
  • Skill matching at scale. TupleSpace keyword matching works well up to thousands of skills. For a large skill store, keyword overlap produces too many false positives. The fix is an embedding-based vector index for semantic similarity but that requires an embedding model and an external vector store.
  • Context compression quality. The compressor summarizes the middle of the conversation with a generic prompt. It does not distinguish between a casual exchange and a chain of tool results that the later part of the conversation depends on. Poor summarization can cause the agent to “forget” a result it needs. Production compression needs to identify load-bearing context and exclude it from summarization.
  • No per-session actor instances. AgentActor stores self.messages as actor state, which all chat calls within one actor share. This is safe when there is one actor per session, but the POC maps many sessions to one actor instance. A production deployment should either run one actor per session or explicitly key all state by session_id.
  • No prompt injection defense. Tool results flow back into the conversation without any sanitization. A malicious tool response could attempt to override the system prompt. Production systems need input/output validation and possibly an LLM-as-judge layer between tool results and the next LLM call.
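  • Circuit breaker threshold is fixed. Three consecutive failures open the circuit. A slow provider that times out 20% of the time would rarely trip the breaker: if failures were independent, any given run of three calls would fail entirely with a probability of only about 0.8% (0.2 × 0.2 × 0.2). Production needs adaptive thresholds based on error rate windows, not just consecutive failure counts.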
  • No credential management. The LLM gateway reads provider API keys from service link configuration, which in this POC are stored in app-config.toml. A production system needs the phantom-token pattern from MiniClaw: the gateway resolves a real key from actor-private KV and never echoes it in any response or log.
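For the circuit-breaker gap in the list above, an error-rate window is one way to catch a provider that fails often but never three times in a row. The class below is a minimal sketch, not the MiniHermes breaker: `ErrorRateBreaker`, its default window of 20 calls, the 15% threshold, and the 10-call minimum are all illustrative assumptions, and recovery (half-open probing) is left out.

```python
from collections import deque


class ErrorRateBreaker:
    """Opens when the failure rate over the last `window` calls reaches `threshold`."""

    def __init__(self, window: int = 20, threshold: float = 0.15, min_calls: int = 10):
        self.outcomes = deque(maxlen=window)  # True = failure; old entries drop off
        self.threshold = threshold
        self.min_calls = min_calls

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    @property
    def is_open(self) -> bool:
        if len(self.outcomes) < self.min_calls:
            return False  # too little data to judge
        return sum(self.outcomes) / len(self.outcomes) >= self.threshold


if __name__ == "__main__":
    breaker = ErrorRateBreaker()
    streak = max_streak = 0
    for i in range(20):
        failed = (i % 5 == 4)  # every fifth call times out: a 20% failure rate
        breaker.record(failed)
        streak = streak + 1 if failed else 0
        max_streak = max(max_streak, streak)
    print(max_streak, breaker.is_open)  # 1 True
```

In this run the longest failure streak is 1, so a consecutive-failure counter with a threshold of three stays closed, while the windowed breaker opens.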

MiniHermes vs. MiniClaw: Complementary, Not Competing

| Dimension | MiniClaw | MiniHermes |
|---|---|---|
| Primary focus | Security and multi-tenant isolation | Self-improvement and operational resilience |
| Agent topology | Multi-agent orchestration with sub-tasks | Single self-improving long-lived agent |
| Session model | Ephemeral per-request | Long-lived with LLM-based compression |
| Skill learning | None (static tool catalog) | Automatic from conversation, durable workflow |
| Scheduling | None | Distributed cron with DistLock + Channel |
| LLM integration | Simulated only | Real Ollama + OpenAI + Anthropic, hot-swap |
| Provider management | None | Hot-swap + gradual circuit breaker |
| Memory tiers | Single KV scope | Core / Reachable / Deep across three storage layers |
| Guardrails | WASM + actor isolation (structural) | GenFSM gate with per-tool runtime policies |
| Credential handling | Phantom token in actor-private KV | Service link config (see gaps) |
| Observability | TupleSpace audit, health polling | Same, plus IncrCounter metrics on every operation |

MiniClaw establishes the security foundation with WASM isolation, tenant enforcement, credential proxying, and blast-radius containment. MiniHermes builds on that same foundation to add learning, resilience, and operational flexibility. A production system would combine both.


Building and Running

Prerequisites

Go implementation:

brew tap tinygo-org/tools && brew install tinygo
cargo install wasm-tools
npm install -g @bytecodealliance/jco

Python implementation:

pip install -e path/to/sdks/python

Ollama (optional — falls back to simulated LLM):

brew install ollama
ollama run llama3.2   # pulls ~2GB on first run

All tests pass without any LLM running. When Ollama is available, LLMGatewayActor switches automatically from the simulated fallback to real inference.

Build and Test

# Python
cd examples/python/apps/minihermes
./build.sh                       # componentize-py -> WASM Component Model binary
pytest test_minihermes.py -v     # unit tests, no live node required

# Go
cd examples/go/apps/minihermes
./build.sh                       # TinyGo -> wasm-tools -> component binary
go test ./... -v                 # unit tests, no live node required

Integration Tests Against a Live Node

# Start a PlexSpaces node first — see docs/getting-started.md
cd examples/go/apps/minihermes
./test.sh 8091                   # 21 steps, roughly 2 minutes

The test script covers the full actor tree:

# Basic agent chat
ask "agent" '{"op":"chat","message":"Hello","session_id":"test-1"}'

# Tool use — triggers guardrail check before execution
ask "agent" '{"op":"chat","message":"Calculate 42 * 17","session_id":"test-1"}'

# Hot-swap LLM provider
ask "llm_gateway" '{"op":"switch_provider","provider":"anthropic","model":"claude-opus-4-8"}'

# Register a new tool at runtime
ask "tool_executor" '{
  "op":"register_tool","name":"weather",
  "description":"Get weather for a city",
  "input_schema":{"type":"object","properties":{"city":{"type":"string"}}},
  "handler_type":"service_link",
  "handler_config":{"link_name":"openweather","path":"/data/2.5/weather","method":"GET"}
}'

# Create a cron job
ask "cron_scheduler" '{
  "op":"create_job","job_id":"morning-digest",
  "prompt":"Summarize pending tasks","schedule":"every_24h","session_id":"cron-main"
}'

# Block a tool via guardrails
ask "guardrails" '{"op":"set_policy","tool_name":"delete_file","decision":"deny"}'

# Query health across all service groups
ask "health_monitor" '{"op":"get_health"}'

# Query audit trail for tool executions
ask "audit_event" '{"op":"query_events","event_type":"tool_executed"}'

Future Enhancements

These patterns extend naturally once the actor foundation is in place:

  • Vector memory. Replace TupleSpace keyword matching in match_skills with embedding-based similarity search. The interface stays the same and SkillStoreActor still answers match_skills messages.
  • Multi-agent skill sharing. When one agent extracts a skill, broadcast the skill ID via TupleSpace to all other agent instances. Each agent loads the skill on the next match. The fleet improves together.
  • Streaming responses. Replace ask() for LLM completions with chunked Channel delivery. The agent sends each token back to the user as it arrives instead of buffering the entire response.
  • Skill versioning. Store each procedure update as a new BlobStorage object with a version suffix. SkillStoreActor tracks the current version in KV and can roll back if a skill causes regressions.
  • TypeScript implementation. The same 12-actor pattern compiled from TypeScript to WASM, useful for teams already working in the Node ecosystem.
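The skill versioning idea in the list above can be prototyped with two dictionaries before touching real storage. In this sketch `blobs` stands in for BlobStorage and `current` and `latest` stand in for KV entries. The class name, the `skill_proc_{id}_v{n}` object naming, and the one-step rollback policy are all hypothetical.

```python
from typing import Dict


class VersionedSkillStore:
    """In-memory stand-in: `blobs` plays BlobStorage, `current`/`latest` play KV."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.current: Dict[str, int] = {}  # version served to the agent
        self.latest: Dict[str, int] = {}   # highest version ever written

    def publish(self, skill_id: str, procedure: str) -> int:
        version = self.latest.get(skill_id, 0) + 1
        # Each update is a new object; older versions are never overwritten
        self.blobs[f"skill_proc_{skill_id}_v{version}"] = procedure.encode()
        self.latest[skill_id] = version
        self.current[skill_id] = version
        return version

    def rollback(self, skill_id: str) -> int:
        version = self.current.get(skill_id, 0)
        if version <= 1:
            raise ValueError(f"no earlier version of {skill_id} to roll back to")
        self.current[skill_id] = version - 1
        return version - 1

    def load(self, skill_id: str) -> str:
        version = self.current[skill_id]
        return self.blobs[f"skill_proc_{skill_id}_v{version}"].decode()


if __name__ == "__main__":
    store = VersionedSkillStore()
    store.publish("deploy", "1. build 2. ship")
    store.publish("deploy", "1. build 2. test 3. ship")
    store.rollback("deploy")
    print(store.load("deploy"))  # 1. build 2. ship
```

Tracking `latest` separately from `current` means a publish after a rollback gets a fresh version number instead of overwriting the version that was rolled back.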

Conclusion

MiniHermes is a proof of concept, not a production agent platform. It demonstrates a way of thinking about agent systems that differs from the standard monolith approach. The Hermes Agent design from Nous Research gives us three powerful ideas: prompt discipline, multi-step tool loops, and skill accumulation. Those ideas work whether the agent runs in one Python process or across 12 actors. What changes is everything else: what happens when a component crashes, how you update a policy without restarting, how you prevent one tenant’s data from touching another’s, and how you keep conversations going past the model’s context limit.

The actor model with PlexSpaces provides a set of primitives (KV, TupleSpace, BlobStorage, Channel, DistributedLock, SendAfter, GenFSM, GenEvent, Workflow) that map directly onto the operational problems an agent system faces. State durability, fault isolation, leader election, non-blocking audit, validated state machines, durable workflows: each is one primitive. The full source for both Python and Go implementations lives at github.com/bhatti/PlexSpaces. The architecture is meant to be a starting point, not a finished product.

