mumo MCP Server
Run structured multi-model deliberations directly from MCP-compatible clients such as Claude Code, Cursor, VS Code, Codex, Windsurf, Claude Desktop, and others.
The MCP server uses the same auth and session artifacts as the REST API, but it is an opinionated agent-moderated surface: your agent starts a panel round, waits for the result, reads the responses and claim map, then decides whether to append another round. Use MCP if you want deliberation as tool calls; use the REST API if you're integrating from your own backend.
New to mumo MCP? Start with the generic install guide. It walks you through adding the server config, generating your API key, and verifying the tools. This page goes deeper on client-specific examples, tool behavior, and response fields.
Setup#
Claude Code#
The mumo plugin ships through Anthropic's official marketplace and bundles the MCP server config plus an auto-triggering skill. Your API key is stored in your system keychain (no env-var export required). Full walkthrough at mumo.chat/install/claude-code.
Codex#
The mumo Codex plugin bundles the MCP server config and an auto-triggering skill. Register the marketplace, then install from the Codex desktop app (Plugins → publisher dropdown → mumo → + → Install mumo). Auth via MUMO_API_KEY env var (launchctl setenv on macOS for GUI launches). Full walkthrough at mumo.chat/install/codex.
Cursor#
The mumo Cursor plugin bundles the MCP server, an auto-triggering skill, and a Cursor rule. Auth via MUMO_API_KEY env var (launchctl setenv on macOS for GUI launches). Full walkthrough at mumo.chat/install/cursor.
Invocation. Cursor's rule system treats plugin rules and skills as soft priors, so auto-triggering on contested decisions is best-effort. For reliable routing, name mumo explicitly in your prompt: "ask mumo about…", "run this by a mumo panel", "get me a second opinion from mumo."
Hermes Agent#
The mumo Hermes skill clones into ~/.hermes/skills/<category>/mumo/ and ships with the canonical SKILL.md, four cognitive-shape playbooks, and reference docs. Merge the bundled config/mumo.yaml into ~/.hermes/config.yaml under mcp_servers, restart Hermes (the /reload-mcp slash command is unreliable across versions), and the seven mumo tools surface as mcp_mumo_*. Full walkthrough at mumo.chat/install/hermes.
OpenClaw#
The mumo OpenClaw skill installs via ClawHub (openclaw skills install mumo — published listing at clawhub.ai/ericatmumo/mumo) or by cloning the source repo into ~/.openclaw/skills/mumo/. Either path lands the canonical SKILL.md, four cognitive-shape playbooks, and reference docs. Register the MCP server with openclaw mcp set mumo '<json>' (Streamable HTTP, literal Bearer in headers — no env-var pointer), restart OpenClaw, and the seven mumo tools surface as mumo__create_deliberation, mumo__wait_for_round, etc. (double-underscore prefix). Full walkthrough at mumo.chat/install/openclaw.
VS Code (GitHub Copilot)#
The mumo extension installs from the Visual Studio Marketplace and stores your key in SecretStorage (macOS Keychain / Windows Credential Manager / Linux libsecret). A Get Started walkthrough opens on first install. Full walkthrough at mumo.chat/install/vs-code.
Invocation. VS Code's Copilot doesn't have a SKILL.md discovery mechanism the way Claude Code and Cursor do, so the auto-triggering skill from those plugins doesn't apply. Invoke mumo explicitly in Agent chat: "ask mumo about…", "run this by a mumo panel", "get me a second opinion from mumo."
Other clients (Windsurf, Claude Desktop, Cline, Zed, …)#
Any MCP-compatible client that supports HTTP transport with a custom Authorization header will work.
- Add a remote MCP server named mumo in your client.
- Point it at https://mumo.chat/api/mcp with an Authorization header placeholder.
- Generate a platform API key at mumo.chat/settings/api-keys. Keys begin with mmo_live_.
- Replace the placeholder with your real key, reload your client, and verify the mumo tools appear.
- Ask for your first deliberation.
Most clients accept a generic mcp.json shape:
{
"mcpServers": {
"mumo": {
"url": "https://mumo.chat/api/mcp",
"headers": {
"Authorization": "Bearer mmo_live_YOUR_KEY_HERE"
}
}
}
}
Some clients use different keys (servers instead of mcpServers), TOML instead of JSON, or nested header settings — check your client's docs. For Claude Desktop, this file lives at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS.
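If you script your client setup, the generic shape above can be generated rather than hand-edited. The following is a minimal sketch, not an official installer: the helper name build_mcp_config is hypothetical, and it only reproduces the mcpServers shape and the mmo_live_ key prefix described on this page.

```python
import json


def build_mcp_config(api_key: str) -> dict:
    """Return the generic mcp.json shape for a given platform API key."""
    # Keys are documented to begin with "mmo_live_"; fail early on anything else.
    if not api_key.startswith("mmo_live_"):
        raise ValueError("mumo platform keys begin with 'mmo_live_'")
    return {
        "mcpServers": {
            "mumo": {
                "url": "https://mumo.chat/api/mcp",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # Prints the config with the placeholder key; swap in your real key.
    print(json.dumps(build_mcp_config("mmo_live_YOUR_KEY_HERE"), indent=2))
```

Clients that expect servers instead of mcpServers, or TOML, would need the outer shape adjusted accordingly.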
Your first call#
The simplest way to verify everything is wired up is to fire off a round. From your agent:
Create a deliberation: "Should we use Postgres or MongoDB for an event store?"
The agent invokes create_deliberation. The call returns in under a second with a session id, round id, display-friendly round number, and progress URL — but the models are still running. Ask the agent to call wait_for_round with the returned session_id and round_id (typically 15–120s depending on model choice). Then:
Session id abc-123, status ready.
3 models responded. claim_map.claims[] has 4 entries with cross-model positions.
The loop: tool call (fast ack) → wait_for_round → responses + claim map → agent reads → append or stop.
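For agents written in code rather than driven by chat, that loop can be sketched as below. This is an illustration under stated assumptions: call_tool is a hypothetical callable supplied by your MCP client that takes a tool name and arguments and returns the tool's structured result as a dict; it is not part of mumo.

```python
from typing import Callable

# Hypothetical transport: (tool name, arguments) -> structured result dict.
ToolCaller = Callable[[str, dict], dict]


def run_first_round(call_tool: ToolCaller, prompt: str, max_waits: int = 10) -> dict:
    """Create a deliberation, then wait until the round stops asking to be polled."""
    ack = call_tool("create_deliberation", {"prompt": prompt})
    target = {"session_id": ack["session_id"], "round_id": ack["round_id"]}
    for _ in range(max_waits):
        result = call_tool("wait_for_round", target)
        # "poll_again" means the wait timed out while models were still running.
        if result.get("recommended_client_action") != "poll_again":
            return result
    raise TimeoutError("round did not reach a terminal state within max_waits calls")
```

The returned dict is whatever wait_for_round reported last; the agent then reads the responses and claim map and decides whether to append another round.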
Tools#
Seven tools. MCP keeps canonical API field names where the same concept crosses the boundary, but omits REST-only controls that do not belong in agent-moderated sessions.
| Tool | Purpose |
|---|---|
| create_deliberation | Start a new deliberation. |
| append_round | Add a follow-up round with steering snippets. |
| wait_for_round | Wait for a round to finish, then return the full session. |
| get_session | Fetch full session state. |
| list_sessions | List your sessions, optionally filtered. |
| list_models | See models available to your tier. |
| get_credit | Check current credit wallet balance. |
create_deliberation#
Start a new agent-moderated session. We'll return an ack immediately and run the models in the background. Call wait_for_round to read the round's artifacts once it reports complete. You'll receive raw responses + a cross-model claim map. If one round gives you everything you need, you're done. If you want more, call append_round with the existing session_id. Participating models will automatically get full context and you'll get the benefit of input caching when available on subsequent rounds.
Inputs:
- prompt (required): the question or topic.
- reference: optional doc, spec, or design. Injected as shared context for all models.
- models: array of 2–3 model IDs. Defaults to platform selection. Call list_models to enumerate.
- moderator_name: display name for the steering identity (e.g., your agent's name). Surfaces in the published transcript. Also important for optimizing participating model selection when you don't specify models.
- application: display name of the client (e.g., "Claude Code"). Surfaces in the session info panel.
- improvement_consent: optional per-session platform-improvement consent request. Paid users may pass true or false; omitted uses the account default. Free-tier sessions are consent-inclusive under accepted terms, so false is coerced to include and the ack/session text includes a disclosure.
- recap_round: optional boolean (default false). When true, generates a structured per-round summary (round_recap) when round 0 completes. See Recap artifacts for the shape and pricing. Note: recap_session is intentionally NOT accepted on create_deliberation. Synthesis requires ≥ 2 rounds, so it would degenerate to a round-recap-only behavior here. Set recap_session=true on a later append_round call when you want session-level synthesis.
Returns: an ack — session_id, round_id, round_index (0-based API value; display as round round_index + 1), progress_url, and polling metadata:
{
"session_id": "abc-123",
"round_id": "round-456",
"round_index": 0,
"status": "processing",
"progress_url": "/api/sessions/abc-123/progress",
"poll_after_ms": 5000,
"progress_version": 0
}
Actual debits settle as models complete, reflected in the round's debits[] on get_session. Use get_credit when you need a fresh wallet read. See Credit wallet.
Models run in the background; read the session (rounds, claim maps) via wait_for_round once the round is complete.
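Because round_index is 0-based while users expect "round 1", it can help to normalize the ack once. A small sketch, assuming the ack fields shown above; the helper name and the returned keys are illustrative only, and treating poll_after_ms as the delay before the first poll is an interpretation, not a documented guarantee.

```python
def describe_ack(ack: dict) -> dict:
    """Derive display and follow-up values from a create/append ack."""
    return {
        # API value is 0-based; humans see round_index + 1.
        "display_round": ack["round_index"] + 1,
        # Arguments for the follow-up wait_for_round call.
        "wait_args": {"session_id": ack["session_id"], "round_id": ack["round_id"]},
        # Suggested delay in seconds (assumed meaning of poll_after_ms).
        "first_poll_delay_s": ack["poll_after_ms"] / 1000,
    }
```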
append_round#
Add a follow-up round. Use after reading wait_for_round or get_session to steer the next round.
Inputs:
- session_id (required): session ID from a prior create_deliberation.
- prompt (required): the steering prompt for this round.
- snippets: array of typed cross-model forwards. Each has:
  - type: KEEP | EXPLORE | CHALLENGE | CORE | SHIFT
  - quote: verbatim quote from a prior round's response
  - quoted_model: model ID that said it
  - comment: optional commentary explaining why you're forwarding it
- moderator_name: supply only when the steering identity changes mid-session.
- recap_round: optional boolean (default false). When true, generates a round_recap for this round.
- recap_session: optional boolean (default false). When true, generates session_synthesis over the in-flight round-recap set when this round completes. Cascade behavior: triggers round_recap generation for any prior rounds that don't already have one. The trigger round's recap is included automatically; you don't also need to set recap_round=true. See Recap artifacts for the cascade semantics and pricing.
Snippets are the highest-signal way to direct attention. Models see them as curated forwards from the moderator, with the snippet type shaping how they respond.
Returns: same ack shape as create_deliberation. Call wait_for_round with the returned session_id and round_id for the new round's artifacts and its debits[].
Errors:
- The session must be in status: "ready". If it's streaming or processing, wait. If failed, you can't continue.
- Round-append duplication corrupts deliberation history, so the MCP server automatically passes an idempotency key derived from the call args.
wait_for_round#
Wait for a round to finish, then return the full session. This is the intended follow-up to create_deliberation and append_round.
Under the hood, the tool polls the lightweight progress endpoint and fetches the full session exactly once after the target round reaches terminal state. Use this instead of repeatedly calling get_session while models are still running.
Inputs:
- session_id (required): session ID from create_deliberation or append_round.
- round_id: preferred target round selector from the ack.
- round_index: optional 0-based fallback if you do not have round_id.
- timeout_seconds: max wait before returning a timeout response (default 40; max 120). On timeout, the response carries recommended_client_action: "poll_again"; re-call to continue waiting.
- poll_after_ms: initial polling interval (default 1000; backs off up to 5000).
On timeout, the round keeps running. Call wait_for_round again, or call get_session if you specifically need a partial read.
Structured output. wait_for_round returns both human-readable text (content[0].text — full session markdown on terminal, short prose summary on timeout) and protocol-native structured data (structuredContent) that lets agents branch on round state without parsing prose. Clients that don't support structuredContent still see the existing text channel unchanged.
{
"session_id": "ses_...",
"round_id": "rnd_...",
"round_index": 1,
"round_status": "partial_failure",
"is_terminal": true,
"is_usable": true,
"completed_models": [
{ "model": "claude-opus-4-7", "tokens_in": 4120, "tokens_out": 891, "finish_reason": "end_turn", "is_partial": false }
],
"in_progress_models": [],
"failed_models": [
{ "model": "qwen-3.6-plus", "error": "stream ended without done event", "message": "...", "error_code": "stream_ended_without_final_marker", "partial_text": "...", "partial_text_length": 1804 }
],
"recommended_client_action": "proceed_with_partial_result"
}
Handling partial rounds
wait_for_round always returns HTTP 200 with explicit semantic state. Branch on round_status and recommended_client_action rather than parsing prose or pattern-matching on error strings.
round_status values:
- complete: every target model produced a final response. Call get_session for full content.
- partial_failure: at least one model produced a final response, at least one failed. is_usable: true. Read successful answers from completed_models[]; read attribution from failed_models[].
- in_progress: round is still running. Call wait_for_round again with the same arguments; recommended_client_action: "poll_again".
- failed: every target model failed and the round produced no usable output. Inspect recommended_client_action: "retry" when at least one failure's error_code is in the transient set (rate_limit, provider_error, internal_deadline_reached, deadline_expired, stream_interrupted, stream_ended_without_final_marker), "abandon" otherwise. Codes that look like failures but aren't safely retryable (pre_stream_provider_error, which covers auth/malformed 4xx as well as 5xx; provider_auth_failure; pre_stream_failure; max_retries_exceeded) map to abandon. Each entry in failed_models[] carries a canonical error_code (or null on legacy rows) alongside the free-text error/message; switch on error_code rather than parsing prose.
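The retry/abandon rule for a failed round can be mirrored client-side when you want to audit the server's recommendation. This is a sketch of the rule as described on this page, not the server's implementation; the function name is hypothetical, and the server-provided recommended_client_action remains the source of truth.

```python
# Transient error codes, copied from the list documented on this page.
TRANSIENT_CODES = frozenset({
    "rate_limit",
    "provider_error",
    "internal_deadline_reached",
    "deadline_expired",
    "stream_interrupted",
    "stream_ended_without_final_marker",
})


def action_for_failed_round(failed_models: list) -> str:
    """Return "retry" if any failure's error_code is transient, else "abandon"."""
    # error_code may be None on legacy rows; None is never in the transient set.
    codes = [entry.get("error_code") for entry in failed_models]
    return "retry" if any(code in TRANSIENT_CODES for code in codes) else "abandon"
```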
recommended_client_action contract:
| Action | Meaning |
|---|---|
| proceed_with_complete_result | Round is complete; read responses from the full session. |
| proceed_with_partial_result | At least one model succeeded; partial result is usable. |
| poll_again | Round still in flight; call wait_for_round again. |
| retry | Round failed but the failure looks transient; safe to call append_round (or a fresh create_deliberation) again. |
| abandon | Round failed non-transiently (auth, malformed input, etc.); retrying is unlikely to help. |
The action is server-derived guidance, not authority. Sophisticated agents can apply their own logic over the raw fields (round_status, is_terminal, is_usable, the per-model arrays).
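One way an agent might branch on the structured output is sketched below. The step names returned (read_full_session and so on) are hypothetical labels for your own agent logic; only the action values and the is_terminal / is_usable fields come from this page.

```python
def next_step(result: dict) -> str:
    """Map a wait_for_round structured result to a hypothetical agent step."""
    steps = {
        "proceed_with_complete_result": "read_full_session",
        "proceed_with_partial_result": "read_partial_result",
        "poll_again": "call_wait_for_round_again",
        "retry": "start_round_again",
        "abandon": "stop_and_report",
    }
    action = result.get("recommended_client_action")
    if action in steps:
        return steps[action]
    # Unknown or missing action: fall back to the raw state fields.
    if not result.get("is_terminal", False):
        return "call_wait_for_round_again"
    return "read_partial_result" if result.get("is_usable") else "stop_and_report"
```

Falling back to the raw fields keeps the agent functional if a client strips or does not recognize the action value.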
Polling guidance. wait_for_round blocks server-side with internal polling. Default timeout is 40s, max 120s — the upper bound matches the empirical MCP transport ceiling so the schema enforces the safe range; the 40s default leaves headroom against shorter host-side request ceilings observed in some clients. For rounds longer than the timeout, the response carries recommended_client_action: "poll_again" and the agent re-calls. The /api/sessions/{id}/progress endpoint supports If-None-Match/ETag for cheap no-change polls if you'd rather poll directly.
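If you do poll the progress endpoint directly, conditional requests follow the usual HTTP pattern: remember the last ETag, send it as If-None-Match, and treat 304 as "nothing changed". The sketch below assumes a hypothetical fetch callable that performs the authenticated GET and returns (status, etag, body); the body shape is not documented here and is treated as opaque.

```python
from typing import Callable, Optional, Tuple

# Hypothetical transport: request headers -> (HTTP status, ETag, parsed body).
Fetch = Callable[[dict], Tuple[int, Optional[str], Optional[dict]]]


class ProgressPoller:
    """Tracks the last ETag so unchanged polls can short-circuit on 304."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._etag: Optional[str] = None
        self.latest: Optional[dict] = None

    def poll(self) -> bool:
        """Poll once; return True if progress changed since the previous poll."""
        headers = {"If-None-Match": self._etag} if self._etag else {}
        status, etag, body = self._fetch(headers)
        if status == 304:
            # Server confirmed our cached copy is current; keep self.latest.
            return False
        self._etag = etag
        self.latest = body
        return True
```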
Partial-text handling. Partial text appears in two distinct places depending on outcome:
- Successful-but-truncated responses stay in completed_models[] (and rounds[].responses[] on get_session) with is_partial: true and a populated finish_reason. The model produced output and the call reached done, but the provider signaled truncation. Common cause: finish_reason: "max_tokens" (or "length", "MAX_TOKENS" per provider vocabulary), meaning the output cap was hit. Treat as a partial answer; consider asking the user whether to extend.
- Error rows that managed to produce text appear in failed_models[] with partial_text and partial_text_length populated, NOT in completed_models[] / responses[]. Those collections are reserved for successful answers. Common causes: provider_error (post-first-byte SDK failure), stream_ended_without_final_marker (SDK closed cleanly without a done event despite emitting bytes), internal_deadline_reached (the 150s in-band deadline fired while the model was still producing output).
In both surfaces, the text is "what the model managed to produce before it stopped" — useful for diagnostic recovery, sometimes usable as a partial answer.
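A sketch of collecting both kinds of partial output from the structured result, using only the fields shown in the example above. The function name and the returned keys are illustrative.

```python
def partial_outputs(result: dict) -> dict:
    """Separate truncated successes from failures that still produced text."""
    truncated = [
        entry["model"]
        for entry in result.get("completed_models", [])
        if entry.get("is_partial")
    ]
    salvaged = {
        entry["model"]: entry["partial_text"]
        for entry in result.get("failed_models", [])
        if entry.get("partial_text")
    }
    return {"truncated_successes": truncated, "failed_with_text": salvaged}
```

The text of a truncated success lives in the session's responses, so this helper reports only which models were flagged; for failed models it returns the salvaged text directly.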
get_session#
Fetch the full state of a session — all rounds, responses, snippets, claim maps, and metadata. Use this to read or re-read completed state; while a round is running, prefer wait_for_round.
The most useful field for downstream decisions: rounds[].claim_map.claims[]. Each claim has:
- quote: the verbatim claim
- originator: model that said it
- positions[]: the cross-model reactions, each with model, type (KEEP/CHALLENGE/etc), and comment
- reaction_count: how many models reacted
This is the highest-signal view of where the panel agrees and where they're stuck.
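As an illustration, an agent could scan the claim map for claims that drew at least one CHALLENGE and use those to seed the next round. This sketch relies only on the claim fields listed above; the function name and output shape are assumptions.

```python
def contested_claims(session: dict) -> list:
    """Return claims that at least one model challenged, across all rounds."""
    contested = []
    for round_ in session.get("rounds", []):
        claim_map = round_.get("claim_map") or {}
        for claim in claim_map.get("claims", []):
            challengers = [
                position["model"]
                for position in claim.get("positions", [])
                if position.get("type") == "CHALLENGE"
            ]
            if challengers:
                contested.append({
                    "quote": claim["quote"],
                    "originator": claim["originator"],
                    "challengers": challengers,
                })
    return contested
```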
After a round completes, the session response carries settled cost rollups so agents can self-evaluate "was this worth it?":
- total_cost_usd: top-level ground-truth cost for the active session, sourced from the v_llm_spend ledger. Covers every billable bucket (deliberation + moderator + recap (round_recap + session_synthesis) + snippet extraction + editorial + search), the same totals the admin dashboard reports. Markup-exclusive (distinct from wallet debits, which are markup-included).
- rounds[].cost_usd: same source, attributed per completed round. Useful for distinguishing rounds that delivered value from rounds that were convergence noise. During an in-flight round this may be 0 or incomplete; wait_for_round returns it after the target round is terminal.
Each round also carries a debits[] array — one entry per model call that settled to the wallet:
{
"debits": [
{ "transaction_id": "txn_01h9x2p7k...", "model": "claude-opus-4-6", "amount_usd": 0.11, "settled_at": "2026-04-24T21:15:32Z" }
]
}
amount_usd is markup-included (what the user paid). Sum of debits[].amount_usd across all newly completed rounds reconciles exactly with the wallet delta visible through get_credit.
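The reconciliation can be checked client-side. A minimal sketch, assuming you captured get_credit's effective_balance_usd before and after the rounds in question; it uses Decimal to avoid binary floating-point drift when summing dollar amounts. The helper names are hypothetical.

```python
from decimal import Decimal


def total_debits(rounds: list) -> Decimal:
    """Sum debits[].amount_usd across the given rounds."""
    return sum(
        (Decimal(str(debit["amount_usd"]))
         for round_ in rounds
         for debit in round_.get("debits", [])),
        Decimal("0"),
    )


def reconciles(balance_before, balance_after, rounds: list) -> bool:
    """True when the wallet delta equals the sum of settled debits."""
    delta = Decimal(str(balance_before)) - Decimal(str(balance_after))
    return delta == total_debits(rounds)
```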
list_sessions#
List the caller's sessions, optionally filtered.
Inputs:
- status: ready | streaming
- limit: 1–200 (default 7)
- offset: pagination
Returns a lightweight list (no response bodies). Useful for agents managing concurrent sessions — status: "ready" finds sessions awaiting your next round.
list_models#
Returns model id, provider, display name, context window, max output tokens, and pricing. Per-model available + unavailable_reason reflects both tier entitlement and credit-gate state — a model with available: false and unavailable_reason: "credit_exhausted" means your wallet balance is below that model's pricing.minimum_usd. Call before create_deliberation if the user wants specific models.
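A sketch of filtering the list before choosing a panel. The available and unavailable_reason fields are described above; the "id" key used for the model identifier is an assumption about the response shape, so adjust it to whatever your client actually receives.

```python
def usable_models(models: list) -> list:
    """Return IDs of models the caller can use right now."""
    # "id" is an assumed key name for the model identifier.
    return [model["id"] for model in models if model.get("available")]


def blocked_models(models: list) -> dict:
    """Map each unavailable model ID to its unavailable_reason."""
    return {
        model["id"]: model.get("unavailable_reason")
        for model in models
        if not model.get("available")
    }
```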
get_credit#
Fetch the caller's full wallet resource. Returns:
{
"effective_balance_usd": 1.42,
"buckets": {
"free": {
"balance_usd": 1.42,
"monthly_grant_usd": 1.50,
"resets_at": "2026-05-01T00:00:00Z"
},
"subscription": {
"balance_usd": 0,
"rollover_cap_usd": 30.00,
"subscription_status": null
},
"refill": {
"balance_usd": 0,
"auto_refill_enabled": false
}
},
"per_model_minimum_usd_default": 0.05,
"debit_order": ["free", "subscription", "refill"]
}
Field semantics:
- effective_balance_usd: sum across buckets; markup-included.
- buckets.free: monthly grant + next reset boundary.
- buckets.subscription: paid-tier balance + rollover cap + status ("active" | "past_due" | "cancelled" | "expired" | null).
- buckets.refill: auto-refill top-up balance + enabled flag. When auto_refill_enabled: true, two additional fields appear: auto_refill_threshold_usd (trigger level) and auto_refill_amount_usd (top-up size).
- debit_order: FIFO bucket drain sequence.
Use get_credit for wallet state (dashboards, post-refund checks, pre-session reads outside the write-op flow). Write operations still perform their own affordability preflight and return credit_exhausted if the wallet cannot cover the requested models.
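For dashboards or sanity checks, the wallet fields can be cross-checked as sketched below. The helper names are hypothetical, and the comparison against per_model_minimum_usd_default is only a rough local signal; the server's own preflight decides whether a call is affordable.

```python
from decimal import Decimal


def bucket_total(wallet: dict) -> Decimal:
    """Sum balance_usd over the buckets named in debit_order."""
    return sum(
        (Decimal(str(wallet["buckets"][name]["balance_usd"]))
         for name in wallet["debit_order"]),
        Decimal("0"),
    )


def buckets_match_effective_balance(wallet: dict) -> bool:
    """effective_balance_usd is documented as the sum across buckets."""
    return bucket_total(wallet) == Decimal(str(wallet["effective_balance_usd"]))


def below_default_minimum(wallet: dict) -> bool:
    """Rough local hint that even a default-priced model may be rejected."""
    balance = Decimal(str(wallet["effective_balance_usd"]))
    return balance < Decimal(str(wallet["per_model_minimum_usd_default"]))
```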
Credit wallet#
Every billable LLM call debits a dollar-denominated wallet. A call that would put the caller below the per-model minimum is rejected pre-flight with credit_exhausted — see Errors.
All USD amounts in MCP responses are markup-included (what the user actually paid). Raw provider cost is a platform-internal dimension and is not part of the consumer contract.
Wallet state surfaces in two places:
- get_credit: standalone wallet read with effective balance, bucket breakdown, and per-model minimum.
- get_session round debits[]: per-model transaction IDs + billed amounts, one entry per settled model call. Balance deltas reconcile to Σ debits[].amount_usd.
Balance is not embedded on get_session top-level — that call is high-volume during polling, and wallet state on a read-heavy path creates cache-coherency and semantic-coupling problems. Use get_credit when you need a fresh balance outside the write-op flow.
Snippet types#
The five buckets — KEEP / EXPLORE / CHALLENGE / CORE / SHIFT — are the steering primitive. Each carries a different framing into the next round's prompt:
| Bucket | Framing |
|---|---|
| KEEP | "This resonates with me." |
| EXPLORE | "Let's go deeper on this." |
| CHALLENGE | "I'm not sold on this." |
| CORE | "This is what it comes down to." |
| SHIFT | "This shifted my perspective." |
The model receiving a snippet sees the framing — it's not a neutral forward. Use them deliberately. Concrete example:
Round 1: GPT proposes per-seat pricing for the enterprise tier. Claude proposes usage-based. The agent reads round 1 and decides per-seat is the weaker option for early pilots.
Round 2 append: prompt "Resolve the pricing model. Pick one." plus two snippets:
- CHALLENGE on GPT's "per-seat assumes teams of >10," with comment "most pilots start at 3–5"
- KEEP on Claude's "usage-based aligns incentives"
Models read those framings and converge on usage-based with a per-seat fallback for >25.
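Expressed as append_round arguments, the round 2 call in that example might look like the sketch below. The snippet helper is hypothetical, and the session ID and the two model IDs are placeholders; real IDs come from your session and from list_models.

```python
from typing import Optional

SNIPPET_TYPES = ("KEEP", "EXPLORE", "CHALLENGE", "CORE", "SHIFT")


def snippet(type_: str, quote: str, quoted_model: str, comment: Optional[str] = None) -> dict:
    """Build one snippet, enforcing the uppercase type vocabulary."""
    if type_ not in SNIPPET_TYPES:
        raise ValueError(f"snippet type must be one of {SNIPPET_TYPES}")
    item = {"type": type_, "quote": quote, "quoted_model": quoted_model}
    if comment is not None:
        item["comment"] = comment  # comment is optional
    return item


# Placeholder session and model IDs, for illustration only.
round_two = {
    "session_id": "abc-123",
    "prompt": "Resolve the pricing model. Pick one.",
    "snippets": [
        snippet("CHALLENGE", "per-seat assumes teams of >10",
                "gpt-model-id", "most pilots start at 3–5"),
        snippet("KEEP", "usage-based aligns incentives", "claude-model-id"),
    ],
}
```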
Recap artifacts#
Two optional booleans opt rounds in to recap generation. The flags are accepted asymmetrically across the two tools — a session synthesis only carries information beyond a round recap when there are ≥ 2 rounds to synthesize over. On a single-round session, "session synthesis" and "round recap" produce the same artifact in two coats of paint. So:
- recap_round (default false): accepted on create_deliberation AND append_round. Generates a per-round summary (round_recap) when the round completes. Structured shape with title, tldr, agenda, and sections. Surfaces on get_session once written.
- recap_session (default false): accepted on append_round ONLY (rejected on create_deliberation because round 0 alone can't produce a meaningful synthesis). Generates the session-level synthesis (session_synthesis: title, tldr, origin, arcs) over the in-flight round-recap set when this round completes. Triggers a cascade that backfills round_recap for any prior rounds that don't already have one. Setting recap_session=true implicitly covers recap_round for the trigger round.
Pricing. Recap and synthesis bill via the standard credit wallet but with 0 bps markup (at-cost passthrough). A typical 3-round cascade lands around $0.04 in Kimi inference cost. Setting recap_session=true without realizing it triggers the cascade can be surprising; the docs flag this so agents that flip the bit are deliberate about it.
Reading the artifacts. On get_session / wait_for_round:
- rounds[].round_recap: populated for any round whose recap generation has completed.
- session_synthesis: populated once the cascade has produced a session-level synthesis. Absent until then.
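Before setting recap_session=true, an agent can estimate how many recaps the cascade would backfill by checking which rounds lack a round_recap. A small sketch over the session shape described above; the function name is illustrative, and the count is an estimate since the trigger round is also recapped.

```python
def rounds_missing_recap(session: dict) -> list:
    """Return 0-based indexes of rounds that have no round_recap yet."""
    return [
        index
        for index, round_ in enumerate(session.get("rounds", []))
        if not round_.get("round_recap")
    ]


def has_synthesis(session: dict) -> bool:
    """session_synthesis is absent until the cascade has produced it."""
    return bool(session.get("session_synthesis"))
```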
Legacy distill. Pre-cutover sessions may carry a distill field on rounds; new sessions do not. Use recap_round / recap_session instead — legacy distill is disabled platform-wide.
Confidence scores#
When models emit self-reported confidence tags in their prose, those scores surface on responses:
- rounds[].responses[].claim_confidence: per-claim scores
- rounds[].responses[].snippets[].comment_confidence: per-snippet-comment scores
- confidence_disclaimer: short advisory string
These are self-reported and not calibrated across models. Surface the disclaimer if you display them to users.
Identity metadata#
Both create_deliberation and append_round accept two optional identity fields:
- moderator_name: display name of who/what is steering the deliberation. Shown in the session info panel and replaces "You" attribution in the transcript. Also important for optimizing participating model selection when you don't specify models. On append_round, supply only when the steering entity changes (e.g., a human takes over from an agent); otherwise the existing value is preserved.
- application: display name of the client driving the session (e.g., "Claude Code", "Cursor"). Shown in the session info panel only. Only meaningful on create_deliberation.
Sessions opened through MCP are tagged source: "mcp" server-side. Neither field is auto-populated — pass whatever your client wants to display.
Platform Improvement Consent#
MCP follows the same session-level consent model as the REST API, with one agent-friendly difference: if a free-tier MCP caller sends improvement_consent: false, mumo still creates the session as included and returns this disclosure:
Free-tier sessions are included in platform improvement per accepted terms. Exclusion request was not applied.
Paid users can use improvement_consent: false to exclude a new session. Consent is resolved at session creation and returned on get_session as:
"improvement_consent": {
"enabled": true,
"reason": "free_tier_terms",
"requested": null,
"disclosure": null
}
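The resolution rules can be modeled as below to predict what a given request will produce. This is a client-side approximation of the behavior described on this page, not the server's code: the function and parameter names are hypothetical, and the reason field is omitted because only the free_tier_terms value is documented here.

```python
from typing import Optional

FREE_TIER_DISCLOSURE = (
    "Free-tier sessions are included in platform improvement per accepted terms. "
    "Exclusion request was not applied."
)


def resolve_consent(is_free_tier: bool, requested: Optional[bool], account_default: bool) -> dict:
    """Approximate how improvement_consent resolves at session creation."""
    if is_free_tier:
        # Free tier is always included; an explicit False triggers the disclosure.
        disclosure = FREE_TIER_DISCLOSURE if requested is False else None
        return {"enabled": True, "requested": requested, "disclosure": disclosure}
    # Paid tier: an explicit request wins, otherwise the account default applies.
    enabled = account_default if requested is None else requested
    return {"enabled": enabled, "requested": requested, "disclosure": None}
```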
Errors#
The MCP server returns errors as text content with a structured prefix. Common cases:
- credit_exhausted: wallet balance is below the request's per-model minimum. Body carries effective_balance_usd, free_usd, subscription_usd, refill_usd, per_model_minimum_usd, next_reset_at. Not retryable; top up (when paid tier ships) or wait for the 1st-of-month free-tier reset.
- unknown_models: one or more requested model IDs aren't in the registry. Body carries unknown_models: string[] and models_requested: string[]. Call list_models to enumerate valid IDs. Preflight check; no credit debited.
- ineligible_models: one or more requested model IDs were disabled by the account in /settings/models. Body carries ineligible_models: string[]. Re-enable them in the dashboard or omit them from your request.
- insufficient_active_models: caller omitted models and the curated default panel couldn't produce ≥2 picks against the account's enabled set. Body carries collapsed_buckets: number[]. Enable more models at /settings/models.
- session_busy: a round is in flight. Wait, then retry.
- daily_limit_reached: round budget exhausted. Returns resets_at.
- not_found: session ID doesn't exist or isn't yours.
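An agent can turn those cases into a small dispatch table. The error codes below come from the list above; the handling labels on the right are hypothetical names for your own agent's behavior, and how you extract the code from the error text depends on your client.

```python
# Error code -> hypothetical handling label for the calling agent.
ERROR_HANDLING = {
    "credit_exhausted": "stop_until_reset_or_top_up",
    "unknown_models": "fix_model_ids",
    "ineligible_models": "fix_model_ids",
    "insufficient_active_models": "enable_more_models",
    "session_busy": "wait_and_retry",
    "daily_limit_reached": "wait_until_resets_at",
    "not_found": "stop",
}


def handling_for(error_code: str) -> str:
    """Return the handling label, or "unknown" for codes not listed here."""
    return ERROR_HANDLING.get(error_code, "unknown")
```

Of the documented cases, only session_busy is described as a plain wait-then-retry; the others need a change to the request, the account, or the wallet first.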
The full REST error reference is in the API docs.
Naming philosophy#
MCP uses canonical API field names where the same concept crosses the boundary (session_id, round_id, claim_map, snippet type, etc.), but it is not a field-for-field mirror of REST. REST-only controls like autonomous moderation are intentionally omitted from MCP so agents follow one clear loop: create, wait, read, append or stop.
One convention to know: snippet types (type field on append_round snippets and claim_map.claims[].positions[]) are always UPPERCASE — KEEP, EXPLORE, CHALLENGE, CORE, SHIFT.
Full internal mapping is in docs/CONVENTIONS.md — useful if you're contributing or auditing.
See also#
- Install mumo MCP — generic setup flow for any MCP client
- REST API reference — lower-level HTTP surface for backend integrations
- Get a key — create a platform API key
- mumo web app — browser-based deliberation