mumo MCP Server

Run structured multi-model deliberations directly from MCP-compatible clients such as Claude Code, Cursor, VS Code, Codex, Windsurf, Claude Desktop, and others.

The MCP server uses the same auth and session artifacts as the REST API, but it is an opinionated agent-moderated surface: your agent starts a panel round, waits for the result, reads the responses and claim map, then decides whether to append another round. Use MCP if you want deliberation as tool calls; use the REST API if you're integrating from your own backend.

New to mumo MCP? Start with the generic install guide. It walks you through adding the server config, generating your API key, and verifying the tools. This page goes deeper on client-specific examples, tool behavior, and response fields.

Setup#

Claude Code#

The mumo plugin ships through Anthropic's official marketplace and bundles the MCP server config plus an auto-triggering skill. Your API key is stored in your system keychain (no env-var export required). Full walkthrough at mumo.chat/install/claude-code.

Codex#

The mumo Codex plugin bundles the MCP server config and an auto-triggering skill. Register the marketplace, then install from the Codex desktop app (Plugins → publisher dropdown → mumo → +Install mumo). Auth via MUMO_API_KEY env var (launchctl setenv on macOS for GUI launches). Full walkthrough at mumo.chat/install/codex.

Cursor#

The mumo Cursor plugin bundles the MCP server, an auto-triggering skill, and a Cursor rule. Auth via MUMO_API_KEY env var (launchctl setenv on macOS for GUI launches). Full walkthrough at mumo.chat/install/cursor.

Invocation. Cursor's rule system treats plugin rules and skills as soft priors — auto-trigger on contested decisions is best-effort. For reliable routing, name mumo explicitly in your prompt: "ask mumo about…", "run this by a mumo panel", "get me a second opinion from mumo."

Hermes Agent#

The mumo Hermes skill clones into ~/.hermes/skills/<category>/mumo/ and ships with the canonical SKILL.md, four cognitive-shape playbooks, and reference docs. Merge the bundled config/mumo.yaml into ~/.hermes/config.yaml under mcp_servers, restart Hermes (the /reload-mcp slash command is unreliable across versions), and the seven mumo tools surface as mcp_mumo_*. Full walkthrough at mumo.chat/install/hermes.

OpenClaw#

The mumo OpenClaw skill installs via ClawHub (openclaw skills install mumo — published listing at clawhub.ai/ericatmumo/mumo) or by cloning the source repo into ~/.openclaw/skills/mumo/. Either path lands the canonical SKILL.md, four cognitive-shape playbooks, and reference docs. Register the MCP server with openclaw mcp set mumo '<json>' (Streamable HTTP, literal Bearer in headers — no env-var pointer), restart OpenClaw, and the seven mumo tools surface as mumo__create_deliberation, mumo__wait_for_round, etc. (double-underscore prefix). Full walkthrough at mumo.chat/install/openclaw.

VS Code (GitHub Copilot)#

The mumo extension installs from the Visual Studio Marketplace and stores your key in SecretStorage (macOS Keychain / Windows Credential Manager / Linux libsecret). A Get Started walkthrough opens on first install. Full walkthrough at mumo.chat/install/vs-code.

Invocation. VS Code's Copilot doesn't have a SKILL.md discovery mechanism the way Claude Code and Cursor do, so the auto-triggering skill from those plugins doesn't apply. Invoke mumo explicitly in Agent chat: "ask mumo about…", "run this by a mumo panel", "get me a second opinion from mumo."

Other clients (Windsurf, Claude Desktop, Cline, Zed, …)#

Any MCP-compatible client that supports HTTP transport with a custom Authorization header will work.

  1. Add a remote MCP server named mumo in your client.
  2. Point it at https://mumo.chat/api/mcp with an Authorization header placeholder.
  3. Generate a platform API key at mumo.chat/settings/api-keys. Keys begin with mmo_live_.
  4. Replace the placeholder with your real key, reload your client, and verify the mumo tools appear.
  5. Ask for your first deliberation.

Most clients accept a generic mcp.json shape:

{
  "mcpServers": {
    "mumo": {
      "url": "https://mumo.chat/api/mcp",
      "headers": {
        "Authorization": "Bearer mmo_live_YOUR_KEY_HERE"
      }
    }
  }
}

Some clients use different keys (servers instead of mcpServers), TOML instead of JSON, or nested header settings — check your client's docs. For Claude Desktop, this file lives at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS.
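As a rough illustration, the generic shape above can be generated instead of hand-edited. The sketch below is Python; the helper name build_mcp_config is hypothetical, and it only emits the mcpServers variant, so clients that expect servers, TOML, or nested header settings would need a different writer.

```python
import json

MCP_URL = "https://mumo.chat/api/mcp"
KEY_PREFIX = "mmo_live_"  # platform API keys begin with this prefix


def build_mcp_config(api_key: str) -> dict:
    """Return the generic mcp.json shape for a remote server named mumo.

    build_mcp_config is a hypothetical helper, not part of any mumo tooling.
    """
    if not api_key.startswith(KEY_PREFIX):
        raise ValueError(f"expected a key starting with {KEY_PREFIX!r}")
    return {
        "mcpServers": {
            "mumo": {
                "url": MCP_URL,
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # Placeholder key for illustration only; substitute your real key.
    print(json.dumps(build_mcp_config("mmo_live_YOUR_KEY_HERE"), indent=2))
```

The prefix check is a cheap guard against pasting the wrong secret; it does not confirm that the key is valid.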


Your first call#

The simplest way to verify everything is wired up is to fire off a round. From your agent:

Create a deliberation: "Should we use Postgres or MongoDB for an event store?"

The agent invokes create_deliberation. The call returns in under a second with a session id, round id, display-friendly round number, and progress URL — but the models are still running. Ask the agent to call wait_for_round with the returned session_id and round_id (typically 15–120s depending on model choice). Then:

Session id abc-123, status ready.
3 models responded. claim_map.claims[] has 4 entries with cross-model positions.

The loop: tool call (fast ack) → wait_for_round → responses + claim map → agent reads → append or stop.
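The loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: call_tool is a hypothetical stand-in for however your MCP client invokes a tool and hands back its structured result as a dict, and max_waits is an illustrative cap rather than a mumo parameter.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> structured result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_first_round(call_tool: ToolCaller, prompt: str, max_waits: int = 10) -> Dict[str, Any]:
    """Create a deliberation, then wait until its first round is terminal."""
    ack = call_tool("create_deliberation", {"prompt": prompt})  # fast ack
    wait_args = {"session_id": ack["session_id"], "round_id": ack["round_id"]}
    for _ in range(max_waits):
        result = call_tool("wait_for_round", wait_args)
        if result.get("is_terminal"):
            # The agent now reads responses and the claim map,
            # then decides whether to append another round or stop.
            return result
    raise TimeoutError("round still running after max_waits wait_for_round calls")
```

Passing the tool caller in as an argument keeps the sketch independent of any particular MCP client library.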


Tools#

Seven tools. MCP keeps canonical API field names where the same concept crosses the boundary, but omits REST-only controls that do not belong in agent-moderated sessions.

| Tool | Purpose |
| --- | --- |
| create_deliberation | Start a new deliberation. |
| append_round | Add a follow-up round with steering snippets. |
| wait_for_round | Wait for a round to finish, then return the full session. |
| get_session | Fetch full session state. |
| list_sessions | List your sessions, optionally filtered. |
| list_models | See models available to your tier. |
| get_credit | Check current credit wallet balance. |

create_deliberation#

Start a new agent-moderated session. We'll return an ack immediately and run the models in the background. Call wait_for_round to read the round's artifacts once it reports complete. You'll receive raw responses + a cross-model claim map. If one round gives you everything you need, you're done. If you want more, call append_round with the existing session_id. Participating models will automatically get full context and you'll get the benefit of input caching when available on subsequent rounds.

Inputs:

  • prompt (required) — the question or topic.
  • reference — optional doc, spec, or design. Injected as shared context for all models.
  • models — array of 2–3 model IDs. Defaults to platform selection. Call list_models to enumerate.
  • moderator_name — display name for the steering identity (e.g., your agent's name). Surfaces in the published transcript. Also important for optimizing participating model selection when you don't specify models.
  • application — display name of the client (e.g., "Claude Code"). Surfaces in the session info panel.
  • improvement_consent — optional per-session platform-improvement consent request. Paid users may pass true or false; omitted uses the account default. Free-tier sessions are consent-inclusive under accepted terms, so false is overridden to included and the ack/session text carries a disclosure.
  • recap_round — optional boolean (default false). When true, generates a structured per-round summary (round_recap) when round 0 completes. See Recap artifacts for the shape and pricing. Note: recap_session is intentionally NOT accepted on create_deliberation — synthesis requires ≥ 2 rounds, so it would degenerate to a round-recap-only behavior here. Set recap_session=true on a later append_round call when you want session-level synthesis.

Returns: an ack containing session_id, round_id, round_index (0-based API value; display it as round number round_index + 1), progress_url, and polling metadata:

{
  "session_id": "abc-123",
  "round_id": "round-456",
  "round_index": 0,
  "status": "processing",
  "progress_url": "/api/sessions/abc-123/progress",
  "poll_after_ms": 5000,
  "progress_version": 0
}

Actual debits settle as models complete, reflected in the round's debits[] on get_session. Use get_credit when you need a fresh wallet read. See Credit wallet.

Models run in the background; read the session (rounds, claim maps) via wait_for_round once the round is complete.
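A small sketch of how an agent might hold on to the ack, assuming the ack arrives as a dict shaped like the sample above. The Ack class and parse_ack helper are hypothetical names, and the fallback polling interval is an assumption for acks that omit poll_after_ms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ack:
    session_id: str
    round_id: str
    round_index: int  # 0-based API value
    status: str
    poll_after_ms: int

    @property
    def display_round(self) -> int:
        """Human-facing round number (1-based)."""
        return self.round_index + 1


def parse_ack(payload: dict) -> Ack:
    """Keep the fields a follow-up wait_for_round call needs."""
    return Ack(
        session_id=payload["session_id"],
        round_id=payload["round_id"],
        round_index=int(payload["round_index"]),
        status=payload["status"],
        # Assumption: fall back to 1000 ms if the ack carries no hint.
        poll_after_ms=int(payload.get("poll_after_ms", 1000)),
    )
```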

append_round#

Add a follow-up round. Use after reading wait_for_round or get_session to steer the next round.

Inputs:

  • session_id (required) — session ID from a prior create_deliberation.
  • prompt (required) — the steering prompt for this round.
  • snippets — array of typed cross-model forwards. Each:
    • type — KEEP | EXPLORE | CHALLENGE | CORE | SHIFT
    • quote — verbatim quote from a prior round's response
    • quoted_model — model ID that said it
    • comment — optional commentary explaining why you're forwarding it
  • moderator_name — supply only when the steering identity changes mid-session.
  • recap_round — optional boolean (default false). When true, generates a round_recap for this round.
  • recap_session — optional boolean (default false). When true, generates session_synthesis over the in-flight round-recap set when this round completes. Cascade behavior: triggers round_recap generation for any prior rounds that don't already have one. The trigger round's recap is included automatically; you don't also need to set recap_round=true. See Recap artifacts for the cascade semantics and pricing.

Snippets are the highest-signal way to direct attention. Models see them as curated forwards from the moderator, with the snippet type shaping how they respond.

Returns: same ack shape as create_deliberation. Call wait_for_round with the returned session_id and round_id for the new round's artifacts and its debits[].
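For illustration, the append_round arguments could be assembled like this. The helpers make_snippet and build_append_args are hypothetical, the client-side checks are a convenience rather than documented server behavior, and any model ID you pass should come from list_models.

```python
from typing import Iterable, Optional

SNIPPET_TYPES = frozenset({"KEEP", "EXPLORE", "CHALLENGE", "CORE", "SHIFT"})


def make_snippet(type_: str, quote: str, quoted_model: str,
                 comment: Optional[str] = None) -> dict:
    """Build one typed cross-model forward."""
    if type_ not in SNIPPET_TYPES:
        raise ValueError(f"snippet type must be one of {sorted(SNIPPET_TYPES)}")
    if not quote.strip():
        raise ValueError("quote must be a verbatim, non-empty excerpt")
    snippet = {"type": type_, "quote": quote, "quoted_model": quoted_model}
    if comment is not None:
        snippet["comment"] = comment  # optional commentary
    return snippet


def build_append_args(session_id: str, prompt: str,
                      snippets: Iterable[dict] = ()) -> dict:
    """Assemble the arguments for an append_round call."""
    return {"session_id": session_id, "prompt": prompt, "snippets": list(snippets)}
```

Rejecting lowercase types locally mirrors the convention that snippet types are always uppercase.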

Errors:

  • The session must be in status: "ready". If it's streaming or processing, wait. If failed, you can't continue.
  • Duplicate appends would corrupt deliberation history, so the MCP server automatically passes an idempotency key derived from the call args.

wait_for_round#

Wait for a round to finish, then return the full session. This is the intended follow-up to create_deliberation and append_round.

Under the hood, the tool polls the lightweight progress endpoint and fetches the full session exactly once after the target round reaches terminal state. Use this instead of repeatedly calling get_session while models are still running.

Inputs:

  • session_id (required) — session ID from create_deliberation or append_round.
  • round_id — preferred target round selector from the ack.
  • round_index — optional 0-based fallback if you do not have round_id.
  • timeout_seconds — max wait before returning a timeout response (default 40; max 120). On timeout, the response carries recommended_client_action: "poll_again" — re-call to continue waiting.
  • poll_after_ms — initial polling interval (default 1000; backs off up to 5000).

On timeout, the round keeps running. Call wait_for_round again, or call get_session if you specifically need a partial read.

Structured output. wait_for_round returns both human-readable text (content[0].text — full session markdown on terminal, short prose summary on timeout) and protocol-native structured data (structuredContent) that lets agents branch on round state without parsing prose. Clients that don't support structuredContent still see the existing text channel unchanged.

{
  "session_id": "ses_...",
  "round_id": "rnd_...",
  "round_index": 1,
  "round_status": "partial_failure",
  "is_terminal": true,
  "is_usable": true,
  "completed_models": [
    { "model": "claude-opus-4-7", "tokens_in": 4120, "tokens_out": 891, "finish_reason": "end_turn", "is_partial": false }
  ],
  "in_progress_models": [],
  "failed_models": [
    { "model": "qwen-3.6-plus", "error": "stream ended without done event", "message": "...", "error_code": "stream_ended_without_final_marker", "partial_text": "...", "partial_text_length": 1804 }
  ],
  "recommended_client_action": "proceed_with_partial_result"
}

Handling partial rounds

wait_for_round always returns HTTP 200 with explicit semantic state. Branch on round_status and recommended_client_action rather than parsing prose or pattern-matching on error strings.

round_status values:

  • complete — every target model produced a final response. Call get_session for full content.
  • partial_failure — at least one model produced a final response, at least one failed. is_usable: true. Read successful answers from completed_models[]; read attribution from failed_models[].
  • in_progress — round is still running. Call wait_for_round again with the same arguments; recommended_client_action: "poll_again".
  • failed — every target model failed and the round produced no usable output. Inspect recommended_client_action: "retry" when at least one failure's error_code is in the transient set (rate_limit, provider_error, internal_deadline_reached, deadline_expired, stream_interrupted, stream_ended_without_final_marker), "abandon" otherwise. Codes that look like failures but aren't safely retryable — pre_stream_provider_error (covers auth/malformed 4xx as well as 5xx), provider_auth_failure, pre_stream_failure, max_retries_exceeded — map to abandon. Each entry in failed_models[] carries a canonical error_code (or null on legacy rows) alongside the free-text error / message; switch on error_code rather than parsing prose.
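The retry-versus-abandon rule for a failed round can be restated as a short Python sketch. It only mirrors the rule as described in the list above; the server-provided recommended_client_action remains the source of guidance, and action_for_failed_round is a hypothetical helper name.

```python
# Error codes the documentation lists as transient.
TRANSIENT_CODES = frozenset({
    "rate_limit",
    "provider_error",
    "internal_deadline_reached",
    "deadline_expired",
    "stream_interrupted",
    "stream_ended_without_final_marker",
})


def action_for_failed_round(failed_models: list) -> str:
    """Return 'retry' if any failure is transient, else 'abandon'."""
    # error_code may be None on legacy rows; None is never transient.
    codes = {m.get("error_code") for m in failed_models}
    return "retry" if codes & TRANSIENT_CODES else "abandon"
```

Codes such as pre_stream_provider_error or max_retries_exceeded are absent from the transient set, so they fall through to abandon.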

recommended_client_action contract:

| Action | Meaning |
| --- | --- |
| proceed_with_complete_result | Round is complete; read responses from the full session. |
| proceed_with_partial_result | At least one model succeeded; partial result is usable. |
| poll_again | Round still in flight; call wait_for_round again. |
| retry | Round failed but the failure looks transient; safe to call append_round (or a fresh create_deliberation) again. |
| abandon | Round failed non-transiently (auth, malformed input, etc.); retrying is unlikely to help. |

The action is server-derived guidance, not authority. Sophisticated agents can apply their own logic over the raw fields (round_status, is_terminal, is_usable, the per-model arrays).
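One way an agent might branch on the structured output, sketched in Python. The return labels ("wait", "read", "retry", "stop") are hypothetical names for the agent's own next step, and the fallback for an unrecognized action is an assumption built on the raw fields.

```python
def next_step(structured: dict) -> str:
    """Map wait_for_round structured output to the agent's next move."""
    action = structured.get("recommended_client_action")
    if action == "poll_again":
        return "wait"   # call wait_for_round again with the same arguments
    if action in ("proceed_with_complete_result", "proceed_with_partial_result"):
        return "read"   # read responses and the claim map
    if action == "retry":
        return "retry"  # append_round or a fresh create_deliberation
    if action == "abandon":
        return "stop"
    # Unknown or missing action: fall back to the raw state fields.
    if not structured.get("is_terminal", False):
        return "wait"
    return "read" if structured.get("is_usable") else "stop"
```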

Polling guidance. wait_for_round blocks server-side with internal polling. Default timeout is 40s, max 120s — the upper bound matches the empirical MCP transport ceiling so the schema enforces the safe range; the 40s default leaves headroom against shorter host-side request ceilings observed in some clients. For rounds longer than the timeout, the response carries recommended_client_action: "poll_again" and the agent re-calls. The /api/sessions/{id}/progress endpoint supports If-None-Match/ETag for cheap no-change polls if you'd rather poll directly.
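If you do poll the progress endpoint directly, conditional requests might look roughly like the Python sketch below. The fetch callable is a hypothetical transport you supply (it sends the given headers and returns status, response headers, and parsed body); the base URL, auth header, and response body shape are not specified here, so they are left to that callable.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: fetch(request_headers) -> (status, response_headers, body)
Fetch = Callable[[Dict[str, str]], Tuple[int, Dict[str, str], Optional[dict]]]


def poll_once(fetch: Fetch, etag: Optional[str]) -> Tuple[Optional[str], Optional[dict]]:
    """Do one conditional poll; return (latest_etag, body or None if unchanged)."""
    headers = {"If-None-Match": etag} if etag else {}
    status, resp_headers, body = fetch(headers)
    if status == 304:
        return etag, None  # nothing changed since the last poll
    # Assumes the transport exposes the header under the exact key "ETag".
    return resp_headers.get("ETag", etag), body
```

A 304 response carries no body, which is what makes no-change polls cheap.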

Partial-text handling. Partial text appears in two distinct places depending on outcome:

  • Successful-but-truncated responses stay in completed_models[] (and rounds[].responses[] on get_session) with is_partial: true and a populated finish_reason. The model produced output and the call reached done, but the provider signaled truncation. Common cause: finish_reason: "max_tokens" (or "length", "MAX_TOKENS" per provider vocabulary) — the output cap was hit. Treat as a partial answer; consider asking the user whether to extend.
  • Error rows that managed to produce text appear in failed_models[] with partial_text and partial_text_length populated, NOT in completed_models[]/responses[]. Those collections are reserved for successful answers. Common causes: provider_error (post-first-byte SDK failure), stream_ended_without_final_marker (SDK closed cleanly without a done event despite emitting bytes), internal_deadline_reached (the 150s in-band deadline fired while the model was still producing output).

In both surfaces, the text is "what the model managed to produce before it stopped" — useful for diagnostic recovery, sometimes usable as a partial answer.
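A sketch of gathering both kinds of partial output from the structured result, assuming the field names shown in the sample payload earlier. The helper name and the "source" labels are hypothetical.

```python
def collect_partials(structured: dict) -> list:
    """List truncated successes and failed rows that still produced text."""
    partials = []
    for m in structured.get("completed_models", []):
        if m.get("is_partial"):
            # Successful but truncated: the text lives in the session responses.
            partials.append({"model": m["model"], "source": "truncated",
                             "reason": m.get("finish_reason")})
    for m in structured.get("failed_models", []):
        if m.get("partial_text"):
            # Error row that emitted text before stopping.
            partials.append({"model": m["model"], "source": "failed",
                             "reason": m.get("error_code"),
                             "text": m["partial_text"]})
    return partials
```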

get_session#

Fetch the full state of a session — all rounds, responses, snippets, claim maps, and metadata. Use this to read or re-read completed state; while a round is running, prefer wait_for_round.

The most useful field for downstream decisions: rounds[].claim_map.claims[]. Each claim has:

  • quote — the verbatim claim
  • originator — model that said it
  • positions[] — the cross-model reactions, each with model, type (KEEP/CHALLENGE/etc), and comment
  • reaction_count — how many models reacted

This is the highest-signal view of where the panel agrees and where they're stuck.
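As an example of reading the claim map, the Python sketch below pulls out claims that drew at least one CHALLENGE position. Treating "has a CHALLENGE" as "contested" is a heuristic assumed for this example, not a definition from mumo, and contested_claims is a hypothetical helper.

```python
def contested_claims(session: dict) -> list:
    """Return claims with at least one CHALLENGE position, across all rounds."""
    contested = []
    for rnd in session.get("rounds", []):
        claim_map = rnd.get("claim_map") or {}  # a round may have no claim map yet
        for claim in claim_map.get("claims", []):
            types = [p.get("type") for p in claim.get("positions", [])]
            if "CHALLENGE" in types:
                contested.append(claim)
    return contested
```

An agent could forward the quote of each contested claim as an EXPLORE or CHALLENGE snippet in the next round.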

After a round completes, the session response carries settled cost rollups so agents can self-evaluate "was this worth it?":

  • total_cost_usd — top-level ground-truth cost for the active session, sourced from the v_llm_spend ledger. Covers every billable bucket (deliberation + moderator + recap (round_recap + session_synthesis) + snippet extraction + editorial + search) — the same totals the admin dashboard reports. Markup-exclusive (distinct from wallet debits, which are markup-included).
  • rounds[].cost_usd — same source, attributed per completed round. Useful for distinguishing rounds that delivered value from rounds that were convergence noise. During an in-flight round this may be 0 or incomplete; wait_for_round returns it after the target round is terminal.

Each round also carries a debits[] array — one entry per model call that settled to the wallet:

{
  "debits": [
    { "transaction_id": "txn_01h9x2p7k...", "model": "claude-opus-4-6", "amount_usd": 0.11, "settled_at": "2026-04-24T21:15:32Z" }
  ]
}

amount_usd is markup-included (what the user paid). Sum of debits[].amount_usd across all newly completed rounds reconciles exactly with the wallet delta visible through get_credit.
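The reconciliation can be checked with a few lines of Python. This sketch assumes you captured get_credit balances before and after the rounds in question; it uses Decimal so that cent-level sums compare exactly, and the helper names are hypothetical.

```python
from decimal import Decimal


def total_debits(rounds: list) -> Decimal:
    """Sum debits[].amount_usd across the given rounds."""
    total = Decimal("0")
    for rnd in rounds:
        for debit in rnd.get("debits", []):
            # Go through str() so 0.11 becomes Decimal("0.11"), not a binary float.
            total += Decimal(str(debit["amount_usd"]))
    return total


def reconciles(balance_before, balance_after, rounds: list) -> bool:
    """True when the wallet delta equals the summed debits."""
    delta = Decimal(str(balance_before)) - Decimal(str(balance_after))
    return delta == total_debits(rounds)
```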

list_sessions#

List the caller's sessions, optionally filtered.

Inputs:

  • status — ready | streaming
  • limit — 1–200 (default 7)
  • offset — pagination

Returns a lightweight list (no response bodies). Useful for agents managing concurrent sessions — status: "ready" finds sessions awaiting your next round.

list_models#

Returns model id, provider, display name, context window, max output tokens, and pricing. Per-model available + unavailable_reason reflects both tier entitlement and credit-gate state — a model with available: false and unavailable_reason: "credit_exhausted" means your wallet balance is below that model's pricing.minimum_usd. Call before create_deliberation if the user wants specific models.
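A sketch of filtering the list_models result before picking a panel. It assumes each entry is a dict with id, available, and unavailable_reason keys; the exact key for the model ID is an assumption, and both helper names are hypothetical.

```python
def usable_model_ids(models: list) -> list:
    """IDs of models the caller can use right now."""
    return [m["id"] for m in models if m.get("available")]


def blocked_by_credit(models: list) -> list:
    """IDs of models gated only by wallet balance."""
    return [
        m["id"]
        for m in models
        if not m.get("available") and m.get("unavailable_reason") == "credit_exhausted"
    ]
```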

get_credit#

Fetch the caller's full wallet resource. Returns:

{
  "effective_balance_usd": 1.42,
  "buckets": {
    "free": {
      "balance_usd": 1.42,
      "monthly_grant_usd": 1.50,
      "resets_at": "2026-05-01T00:00:00Z"
    },
    "subscription": {
      "balance_usd": 0,
      "rollover_cap_usd": 30.00,
      "subscription_status": null
    },
    "refill": {
      "balance_usd": 0,
      "auto_refill_enabled": false
    }
  },
  "per_model_minimum_usd_default": 0.05,
  "debit_order": ["free", "subscription", "refill"]
}

Field semantics:

  • effective_balance_usd — sum across buckets; markup-included.
  • buckets.free — monthly grant + next reset boundary.
  • buckets.subscription — paid-tier balance + rollover cap + status ("active" | "past_due" | "cancelled" | "expired" | null).
  • buckets.refill — auto-refill top-up balance + enabled flag. When auto_refill_enabled: true, two additional fields appear: auto_refill_threshold_usd (trigger level) and auto_refill_amount_usd (top-up size).
  • debit_order — FIFO bucket drain sequence.

Use get_credit for wallet state (dashboards, post-refund checks, pre-session reads outside the write-op flow). Write operations still perform their own affordability preflight and return credit_exhausted if the wallet cannot cover the requested models.
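To make debit_order concrete, here is a Python sketch that plans how a charge would drain the buckets. It assumes each bucket is emptied fully before the next one is touched, which is one reading of the drain sequence and not a statement of the platform's exact settlement logic; plan_debit is a hypothetical helper.

```python
from decimal import Decimal


def plan_debit(wallet: dict, amount_usd: str) -> dict:
    """Split a charge across buckets in debit_order; raise if it cannot be covered."""
    remaining = Decimal(amount_usd)
    plan = {}
    for name in wallet["debit_order"]:
        balance = Decimal(str(wallet["buckets"][name]["balance_usd"]))
        take = min(balance, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    if remaining > 0:
        raise ValueError("wallet cannot cover the requested amount")
    return plan
```

This is a client-side estimate only; write operations still run their own affordability preflight.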


Credit wallet#

Every billable LLM call debits a dollar-denominated wallet. A call that would put the caller below the per-model minimum is rejected pre-flight with credit_exhausted — see Errors.

All USD amounts in MCP responses are markup-included (what the user actually paid). Raw provider cost is a platform-internal dimension and is not part of the consumer contract.

Wallet state surfaces in the following places:

  • get_credit — standalone wallet read with effective balance, bucket breakdown, and per-model minimum.
  • get_session round debits[] — per-model transaction IDs + billed amounts, one entry per settled model call. Balance deltas reconcile to Σ debits[].amount_usd.

Balance is not embedded on get_session top-level — that call is high-volume during polling, and wallet state on a read-heavy path creates cache-coherency and semantic-coupling problems. Use get_credit when you need a fresh balance outside the write-op flow.


Snippet types#

The five buckets — KEEP / EXPLORE / CHALLENGE / CORE / SHIFT — are the steering primitive. Each carries a different framing into the next round's prompt:

| Bucket | Framing |
| --- | --- |
| KEEP | "This resonates with me." |
| EXPLORE | "Let's go deeper on this." |
| CHALLENGE | "I'm not sold on this." |
| CORE | "This is what it comes down to." |
| SHIFT | "This shifted my perspective." |

The model receiving a snippet sees the framing — it's not a neutral forward. Use them deliberately. Concrete example:

Round 1: GPT proposes per-seat pricing for the enterprise tier. Claude proposes usage-based. The agent reads round 1, decides per-seat is the weaker option for early pilots.

Round 2 append: prompt "Resolve the pricing model. Pick one." plus two snippets:

  • CHALLENGE on GPT's "per-seat assumes teams of >10," with comment "most pilots start at 3–5"
  • KEEP on Claude's "usage-based aligns incentives"

Models read those framings and converge on usage-based with a per-seat fallback for >25.


Recap artifacts#

Two optional booleans opt rounds in to recap generation. The flags are accepted asymmetrically across the two tools — a session synthesis only carries information beyond a round recap when there are ≥ 2 rounds to synthesize over. On a single-round session, "session synthesis" and "round recap" produce the same artifact in two coats of paint. So:

  • recap_round (default false) — accepted on create_deliberation AND append_round. Generates a per-round summary (round_recap) when the round completes. Structured shape with title, tldr, agenda, and sections. Surfaces on get_session once written.
  • recap_session (default false) — accepted on append_round ONLY (rejected on create_deliberation because round 0 alone can't produce a meaningful synthesis). Generates the session-level synthesis (session_synthesis: title, tldr, origin, arcs) over the in-flight round-recap set when this round completes. Triggers a cascade that backfills round_recap for any prior rounds that don't already have one. Setting recap_session=true implicitly covers recap_round for the trigger round.

Pricing. Recap and synthesis bill via the standard credit wallet but with 0 bps markup (at-cost passthrough). A typical 3-round cascade lands around $0.04 in Kimi inference cost. Setting recap_session=true without realizing the cascade can be surprising; the docs flag this so agents that flip the bit are deliberate about it.
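The flag asymmetry and the cascade can be sketched in Python as follows. Both helpers are hypothetical and only restate the rules described in this section; already_recapped stands for the set of 0-based round indexes that already have a round_recap.

```python
def validate_recap_flags(tool: str, recap_round: bool = False,
                         recap_session: bool = False) -> None:
    """Reject the one combination the tools do not accept."""
    if tool == "create_deliberation" and recap_session:
        raise ValueError("recap_session is only accepted on append_round")


def rounds_to_recap(already_recapped: set, trigger_index: int,
                    recap_round: bool = False, recap_session: bool = False) -> list:
    """Round indexes that will get a round_recap when the trigger round completes."""
    if recap_session:
        # Cascade: backfill every earlier round lacking a recap, plus the trigger round.
        return [i for i in range(trigger_index + 1) if i not in already_recapped]
    return [trigger_index] if recap_round else []
```

Counting the indexes returned by rounds_to_recap gives a quick sense of how many recap generations a recap_session=true call would bill.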

Reading the artifacts. On get_session / wait_for_round:

  • rounds[].round_recap — populated for any round whose recap generation has completed.
  • session_synthesis — populated once the cascade has produced a session-level synthesis. Absent until then.

Legacy distill. Pre-cutover sessions may carry a distill field on rounds; new sessions do not. Use recap_round / recap_session instead — legacy distill is disabled platform-wide.


Confidence scores#

When models emit self-reported confidence tags in their prose, those scores surface on responses:

  • rounds[].responses[].claim_confidence — per-claim scores
  • rounds[].responses[].snippets[].comment_confidence — per-snippet-comment scores
  • confidence_disclaimer — short advisory string

These are self-reported and not calibrated across models. Surface the disclaimer if you display them to users.


Identity metadata#

Both create_deliberation and append_round accept two optional identity fields:

  • moderator_name — display name of who/what is steering the deliberation. Shown in the session info panel and replaces "You" attribution in the transcript. Also important for optimizing participating model selection when you don't specify models. On append_round, supply only when the steering entity changes (e.g., a human takes over from an agent) — otherwise the existing value is preserved.
  • application — display name of the client driving the session (e.g., "Claude Code", "Cursor"). Shown in the session info panel only. Only meaningful on create_deliberation.

Sessions opened through MCP are tagged source: "mcp" server-side. Neither field is auto-populated — pass whatever your client wants to display.

MCP follows the same session-level consent model as the REST API, with one agent-friendly difference: if a free-tier MCP caller sends improvement_consent: false, mumo still creates the session as included and returns this disclosure:

Free-tier sessions are included in platform improvement per accepted terms. Exclusion request was not applied.

Paid users can use improvement_consent: false to exclude a new session. Consent is resolved at session creation and returned on get_session as:

"improvement_consent": {
  "enabled": true,
  "reason": "free_tier_terms",
  "requested": null,
  "disclosure": null
}

Errors#

The MCP server returns errors as text content with a structured prefix. Common cases:

  • credit_exhausted — wallet balance is below the request's per-model minimum. Body carries effective_balance_usd, free_usd, subscription_usd, refill_usd, per_model_minimum_usd, next_reset_at. Not retryable — top up (when paid tier ships) or wait for the 1st-of-month free-tier reset.
  • unknown_models — one or more requested model IDs aren't in the registry. Body carries unknown_models: string[] and models_requested: string[]. Call list_models to enumerate valid IDs. Preflight check; no credit debited.
  • ineligible_models — one or more requested model IDs were disabled by the account in /settings/models. Body carries ineligible_models: string[]. Re-enable them in the dashboard or omit them from your request.
  • insufficient_active_models — caller omitted models and the curated default panel couldn't produce ≥2 picks against the account's enabled set. Body carries collapsed_buckets: number[]. Enable more models at /settings/models.
  • session_busy — a round is in flight. Wait, then retry.
  • daily_limit_reached — round budget exhausted. Returns resets_at.
  • not_found — session ID doesn't exist or isn't yours.

The full REST error reference is in the API docs.


Naming philosophy#

MCP uses canonical API field names where the same concept crosses the boundary (session_id, round_id, claim_map, snippet type, etc.), but it is not a field-for-field mirror of REST. REST-only controls like autonomous moderation are intentionally omitted from MCP so agents follow one clear loop: create, wait, read, append or stop.

One convention to know: snippet types (type field on append_round snippets and claim_map.claims[].positions[]) are always UPPERCASE: KEEP, EXPLORE, CHALLENGE, CORE, SHIFT.

Full internal mapping is in docs/CONVENTIONS.md — useful if you're contributing or auditing.


See also#