mumo REST API
Run structured multi-model deliberations on demand. Send a prompt; get back a session containing every model's response, the cross-model claim map, and the round-level distill.
This is the consumer reference. For agent-runtime use (Claude Code, Cursor, etc.), see the MCP server docs.
Async by default (Trello #93)
As of 2026-04, POST /api/deliberation and POST /api/sessions/:id/rounds return a 202 Accepted ack in <500ms after the round row commits and budget is debited — they do not wait for model execution. Model work runs in the background; callers poll progress_url for terminal state.
What the ack looks like:
{
"session_id": "abc-123",
"round_index": 0,
"status": "processing",
"idempotency_key": "auto-gen-uuid-if-omitted",
"client_request_id": null,
"poll_after_ms": 5000,
"progress_url": "/api/sessions/abc-123/progress",
"progress_version": 0
}
Why: a client timeout no longer produces server/client state divergence. If the ack reached you, the round committed; if it didn't, your idempotency key replay resolves the uncertainty.
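The ack-then-poll pattern above can be sketched as a small loop. This is a minimal illustration, not an official client: `fetch_json` is a hypothetical helper standing in for whatever HTTP client you use, and the terminal-state check assumes the status values documented in "Status flow" below.

```python
import time

def poll_until_terminal(progress_url, fetch_json, sleep=time.sleep, max_attempts=120):
    """Poll the ack's progress_url until the round reaches a terminal state.

    `fetch_json` is any callable that GETs a URL and returns the decoded
    JSON body (hypothetical helper -- substitute your HTTP client).
    """
    wait_ms = 5000  # matches poll_after_ms in the ack example above
    for _ in range(max_attempts):
        body = fetch_json(progress_url)
        if body["status"] not in ("processing", "streaming"):
            return body  # terminal: e.g. "ready" or "failed"
        # honor the server's pacing hint when present
        wait_ms = body.get("poll_after_ms", wait_ms)
        sleep(wait_ms / 1000)
    raise TimeoutError(f"round did not reach a terminal state: {progress_url}")
```

Respecting `poll_after_ms` rather than hardcoding an interval lets the server slow clients down under load.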
?wait=true compatibility shim. REST-only. Appending ?wait=true to the create/append URL makes the endpoint block until terminal state or 290s, whichever comes first, preserving the pre-2026-04 response shape. This is a migration aid and is deprecated (see "Deprecation timeline" below). MCP ignores ?wait=true; autonomous mode (with moderator_model) rejects it with 400 wait_unsupported because orchestration loops span multiple rounds.
Idempotency
- Scope: (account, endpoint, `Idempotency-Key`). The same key can be used on both `create_deliberation` and `append_round` without colliding.
- Request fingerprint: the server hashes a canonical subset of your body (prompt + snippets[] order-sensitive + model set sorted + session_mode + moderator_name + reference + rounds + moderator_model), NFC-normalized. Same key + same semantic body = replay; same key + different semantic body = `409 idempotency_conflict` with `original_request_fingerprint` in the response for caller-side diff.
- Auto-generation: MCP always sends one (its adapter derives a stable key from tool+args). REST: if you omit the `Idempotency-Key` header, the server generates a UUID and echoes it in the ack's `idempotency_key` field. Persist the echoed key if you want retry safety across transport failures.
- Retry during refund: if a round's `refund_status` is `'pending'` at replay time, the response includes `status: "processing_refund"` with `retry_after_ms` instead of the original terminal state. Once the refund credits, replays reflect the terminal failure.
- TTL: 24h rolling window.
- Novel error patterns: unclassified failures (classifier fell through to `internal_error`) do NOT auto-credit refunds. They write the failure fact and sit at `refund_status='pending'` for admin review. Only canonical codes (see "Failure codes" below) auto-credit.
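To make the fingerprint rule concrete, here is an illustrative client-side approximation. The exact canonicalization is server-defined and authoritative; this sketch only shows the shape the docs describe (semantic subset, sorted model set, NFC normalization, hash) so you can reason about which body changes will trigger `409 idempotency_conflict`.

```python
import hashlib
import json
import unicodedata

def request_fingerprint(body: dict) -> str:
    """Illustrative approximation of the server's request fingerprint.
    Not guaranteed to match the server's exact hash -- use it to predict
    whether two bodies are semantically identical, not to compare against
    original_request_fingerprint."""
    subset = {
        "prompt": body.get("prompt"),
        "snippets": body.get("snippets"),            # order-sensitive: kept as-is
        "models": sorted(body.get("models") or []),  # model set is order-insensitive
        "session_mode": body.get("session_mode"),
        "moderator_name": body.get("moderator_name"),
        "reference": body.get("reference"),
        "rounds": body.get("rounds"),
        "moderator_model": body.get("moderator_model"),
    }
    canonical = unicodedata.normalize(
        "NFC", json.dumps(subset, sort_keys=True, ensure_ascii=False)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Note that reordering `models` leaves the fingerprint unchanged, while reordering `snippets` does not.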
Refund lifecycle
When a round fails catastrophically (all providers error), the ledger emits a failure fact + credits back your budget atomically. The rounds[].refund_status lifecycle you'll see on the progress endpoint:
- `none` — happy path, no failure.
- `pending` — worker emitted the failure fact (`failure_event_at` set); ledger is about to credit.
- `credited` — refund is in; `refund_credited_at` stamped. Budget restored.
- `not_applicable` — partial-success round (≥1 model returned output); round counted as "delivered."
SLO: p99 latency from failure_event_at to refund_credited_at is < 60s.
Failure codes
Canonical failure_code values surfaced on the progress endpoint and in refund-conflict responses:
| Code | Meaning |
|---|---|
| `model_provider_rate_limit` | Provider returned 429 / rate-limit |
| `model_provider_outage` | Provider returned 5xx or was unreachable |
| `model_provider_oom` | Provider returned out-of-memory |
| `model_output_malformed` | Response couldn't be parsed |
| `model_timeout` | Provider call exceeded the deadline |
| `all_providers_failed` | Every participant failed (composite) — fires refund |
| `dependency_timeout` | Upstream (e.g. Brave Search) timed out |
| `dependency_outage` | Upstream dependency 5xx |
| `dependency_malformed_response` | Upstream returned unparseable data |
| `stuck_reconciled` | Reconciliation cron detected a stuck round and emitted a synthetic failure |
| `internal_error` | Unclassified — held for admin review, does NOT auto-credit |
| `test_forced_failure` | Admin-only test hook |
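One reasonable way a client can consume these codes is a retry policy keyed on the table above. The split between transient and non-transient codes below is an assumption on our part, not part of the API contract; the refund-status gate follows the refund lifecycle section (retry only once budget is restored).

```python
# Assumed-transient codes: provider/dependency blips that a later retry
# may not hit again. internal_error is deliberately excluded -- it holds
# refund_status='pending' for admin review and does not auto-credit.
TRANSIENT = {
    "model_provider_rate_limit",
    "model_provider_outage",
    "model_timeout",
    "dependency_timeout",
    "dependency_outage",
    "all_providers_failed",
}

def should_retry_round(failure_code: str, refund_status: str) -> bool:
    """Retry only transient failures, and only after the refund has
    credited, so the retry isn't debited against an un-refunded balance."""
    return failure_code in TRANSIENT and refund_status == "credited"
```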
Deprecation timeline for ?wait=true
- Day 0–90 from GA: full support; `Deprecation: true` + `Sunset` + `Link` response headers on every `?wait=true` response.
- Day 90–120: advisory window; headers remain.
- Day 120+: returns `410 Gone` with a pointer to the async-polling pattern. Enforced via the `WAIT_ENFORCE_410` env flag.
Quickstart
- Get a key at mumo.chat/settings/api-keys. Keys begin with `mmo_live_`.
- Send a prompt:
curl https://mumo.chat/api/deliberation \
-H "Authorization: Bearer mmo_live_…" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Should we use Postgres or MongoDB for an event store?",
"rounds": 1
}'
You'll get back a session object. With rounds: 1 (single-shot) the status is ready immediately and the round's full artifact stack is in the response. Here's a redacted example:
{
"id": "f5731fe5-27ce-4f82-b8ce-868e72ff8bb9",
"status": "ready",
"mode": "single_shot",
"active_models": ["claude-opus-4-6", "gpt-5.4", "grok-4-20-reasoning"],
"rounds": [
{
"index": 0,
"completion_state": "complete",
"responses": [
{ "model": "claude-opus-4-6", "text": "...", "snippets": [...] },
{ "model": "gpt-5.4", "text": "...", "snippets": [...] },
{ "model": "grok-4-20-reasoning", "text": "...", "snippets": [...] }
],
"claim_map": {
"claims": [
{
"quote": "Postgres' append-only WAL plus JSONB columns gives you...",
"originator": "claude-opus-4-6",
"reaction_count": 2,
"positions": [
{ "model": "gpt-5.4", "type": "KEEP", "comment": "Agreed — JSONB lets you..." },
{ "model": "grok-4-20-reasoning", "type": "CHALLENGE", "comment": "But what about field-level migrations 18 months in?" }
]
}
]
},
"distill": {
"key_finding": "All three converged on Postgres + JSONB for the event store, but split on schema rigidity.",
"agreements": ["Append-only access pattern fits Postgres' WAL.", "JSONB lets you start schemaless and tighten later."],
"disagreements": ["GPT prefers schema-on-write at the application layer; Grok argues schemaless is fine because the schema is the event type."],
"impactful_quote": {
"text": "Postgres' append-only WAL plus JSONB columns gives you...",
"model": "claude-opus-4-6",
"why": "Reframed the choice from 'doc store vs relational' to 'use the right Postgres feature for the access pattern.'"
},
"open_questions": ["Field-level migrations 18 months in if schema lives in JSONB."],
"narrative": "All three models converged on Postgres for the event store..."
}
}
],
"summary": null,
"confidence_disclaimer": "Confidence scores (0–1) on claims and snippet comments are self-reported..."
}
Read the artifact stack section below for what each piece is for.
Authentication
Authorization: Bearer mmo_live_…
Keys are minted at /settings/api-keys. Each key is hashed at rest; copy the secret on creation — it's never shown again.
Calls without a valid key return 401 Unauthorized. Calls to a model your tier can't access return 403 with the available alternatives in the body.
Two modes
The same POST /api/deliberation endpoint serves both modes. Whether you supply a moderator_model decides which one runs.
Autonomous — fire and poll
"I want to fire-and-forget a complex deliberation and come back when it's done."
Provide a moderator_model and a rounds cap. mumo runs the full multi-round arc unattended. The endpoint returns immediately with status: "streaming". Poll GET /api/sessions/:id until status: "ready".
{
"prompt": "...",
"rounds": 4,
"moderator_model": "claude-opus-4-6"
}
Remote — you drive the rounds
"I want to drive each round myself, steering with snippets between them."
Omit moderator_model. The endpoint runs round 1 synchronously and returns with status: "ready" and the round's full artifact stack. Read the claim_map, then call POST /api/sessions/:id/rounds with steering snippets to add the next round. Repeat as long as you want — there's no preset cap on remote sessions.
{ "prompt": "..." }
Single-call use case: if you only want one round of three opinions and no follow-up, just don't call append_round. The first response already contains the full artifact stack — there's no separate "single-shot" code path. The session's mode field will be "single_shot" (an intent label the server sets when the request has rounds: 1), but the engine and the response shape are identical to remote.
The rounds field is only meaningful in autonomous mode, where it caps how many rounds the AI moderator runs (max 14). In remote mode, the field is recorded for diagnostic purposes but doesn't bound the session — append_round works as long as the session exists.
Distill artifacts (opt-in)
Every round always produces the raw per-model responses and a cross-model claim map. The finalizer can optionally also produce two distill artifacts:
- `brief` — structured JSON: `narrative`, `agreements`, `disagreements`, and `continuation.recommendation` (stop / continue / explore). Written to `rounds[].distill` on the session response.
- `summary` — a streaming narrative prose recap for the round. Rendered into the web-UI timeline between rounds; available on `rounds[].summary`.
Both default OFF on API and MCP. Programmatic consumers get the same value from responses + claim map without the extra LLM-call tax. Opt in via the distill param on POST /api/deliberation:
{ "prompt": "...", "distill": "both" }
Accepted values:
"off"— both disabled (default)"brief"— only the structured JSON brief"summary"— only the streaming narrative"both"— both enabled{ "brief": boolean, "summary": boolean }— fine-grained control
If your agent is surfacing a narrative back to a human, pass "summary" or "both" on create. The defaults are tuned for programmatic consumption where responses + claim map are the highest-signal artifacts; the human-readable fields (summary prose, brief narrative) are generated lazily.
The continuation.recommendation field lives inside the brief. Request "brief" or "both" if you specifically want that stop/continue/explore signal.
The setting is session-scoped — pinned on session create and inherited by every append_round call. Admins can tune channel defaults via system_config (distill_default_api_brief, distill_default_api_summary).
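Since `distill` accepts both string shorthands and the fine-grained object, a client may want to normalize them to one shape before persisting or comparing settings. A minimal sketch (the helper name and the normalized form are ours; the accepted input values are from the list above):

```python
def normalize_distill(value) -> dict:
    """Expand the distill shorthand into {"brief": bool, "summary": bool}.
    Raises KeyError on an unrecognized string value."""
    if isinstance(value, dict):
        # fine-grained form: coerce missing keys to False
        return {
            "brief": bool(value.get("brief")),
            "summary": bool(value.get("summary")),
        }
    return {
        "off":     {"brief": False, "summary": False},
        "brief":   {"brief": True,  "summary": False},
        "summary": {"brief": False, "summary": True},
        "both":    {"brief": True,  "summary": True},
    }[value]
```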
The session response
Every GET /api/sessions/:id response (and the response from POST /api/deliberation for single-shot mode) carries these top-level fields:
| Field | Notes |
|---|---|
id | Session UUID. |
| status | One of `streaming`, `processing`, `ready`, `failed`. See "Status flow" below. |
| mode | One of `autonomous`, `remote`, `single_shot`. The first two reflect the engine; `single_shot` is a label the server applies when `rounds: 1` to signal "no follow-up intended" — it shares the remote engine. |
active_models | Model IDs participating in this session. |
moderator_model | Set for autonomous sessions; null otherwise. |
moderator_name, application | Optional identity metadata. |
model_metadata | { [model_id]: { display_name, provider } }. |
created_at, estimated_ready_at | Timestamps. |
total_usage | Aggregated tokens_in / tokens_out across the session. |
rounds | Array of round objects (see below). |
summary | Session-level editorial. Null until generated for multi-round sessions. |
confidence_disclaimer | Verbatim advisory string. Surface alongside any displayed confidence scores. |
Each round in rounds[] carries: index, prompt, completion_state (complete | partial_failure), responses, failed_models, claim_map, distill.
The four per-round artifacts:
- `responses[].text` — raw prose from each model.
- `responses[].snippets[]` — model-emitted reactions (typed `KEEP`/`CHALLENGE`/etc., with verbatim quotes from peers and optional commentary).
- `claim_map.claims[]` — verbatim claims that ≥2 models reacted to, with each reactor's `position` (type + commentary). The highest-signal artifact for understanding agreement and disagreement.
- `distill` — round-level structured synthesis:
  - `key_finding` — one sentence on what shifted in this round. `string | null`.
  - `agreements[]` — short statements of where the panel converged. `string[] | null`.
  - `disagreements[]` — each entry names the tension and the sides. `string[] | null`.
  - `impactful_quote` — `{ text, model, why } | null`. The single quote that mattered most.
  - `open_questions[]` — forward-looking threads the panel left unresolved. `string[] | null`.
  - `narrative` — magazine-style prose (2–3 paragraphs). Always present when `distill` is non-null.
  - `continuation` — distill's judgment of whether another round is worth running. `{ convergence, recommendation, reasoning } | null`. Details below.
Nullability rule: the structured fields are `null` when distill ran on the legacy/unstructured path or when an upstream parse failed. Absence is never encoded as an empty string or empty array — `null` alone means "absent." When you see `agreements: []`, the model produced no agreements; when you see `agreements: null`, structured distill wasn't computed for this round.
Continuation: deciding whether to run another round
distill.continuation is the single most useful field for agents driving remote-mode deliberations. It collapses "should I call append_round or stop here?" into one signal you can read straight off the response.
"continuation": {
"convergence": 0.78, // 0.0–1.0; how consolidated the panel is on a stable answer
"recommendation": "stop", // "stop" | "continue" | "explore"
"reasoning": "Three-of-three agreement on the structural diagnosis; remaining disagreement is on implementation detail unlikely to shift with further rounds."
}
- `convergence` is trajectory-aware — it factors in the rounds before this one. A round that opens new productive disagreement may legitimately drop convergence vs the prior round. That is not a regression — it's a signal that the deliberation has uncovered a new dimension worth examining.
- `recommendation` values:
  - `"stop"` — panel has converged enough that another round would be churn.
  - `"continue"` — unresolved tensions worth resolving in another round.
  - `"explore"` — productive new territory has emerged that another round would profitably deepen, even if convergence is lower. Use this for opportunity, not uncertainty.
- `reasoning` cites this round's specific evidence (which agreements held, which positions shifted) so the recommendation is auditable.
How to use it: switch on recommendation for the simple case, threshold on convergence for custom logic, surface reasoning to humans deciding whether to trust the signal. The recommendation is the distill model's opinion, not a guarantee — same epistemic status as key_finding.
Most agents only need claim_map + distill to decide whether to continue or stop. The claim_map.claims[].positions[] array is what tells you where the panel agrees, where they're stuck, and which model said what.
The session-level summary field carries the final editorial across the whole session (surface, agreed, split, open blocks plus anchor_quote and og_quote). It's only populated for completed multi-round sessions.
Endpoints
POST /api/deliberation
Create a session. Body:
| Field | Type | Notes |
|---|---|---|
prompt | string | The question or topic. Required. |
reference | string | Optional spec, doc, or design injected as shared context. |
models | string[] | 2–3 model IDs. Defaults to platform selection. Call GET /api/models to enumerate. |
rounds | int | 1–14. Default 3. Only meaningful for autonomous mode (caps the AI moderator's arc). In remote mode, set it to whatever you like — append_round is unbounded by it. rounds: 1 causes the server to label the session single_shot (a hint that you don't intend to append). |
moderator_model | string | Model ID to moderate autonomously. Omit for remote mode. |
moderator_name | string | Display name for the steering identity (≤100 chars). Surfaces in the published transcript. |
application | string | Display name of your client (≤100 chars). Surfaces in the session info panel. |
Returns a session object (see Quickstart for an example shape). For autonomous sessions, status starts as streaming and transitions through processing to ready.
Idempotency: pass Idempotency-Key: <stable-string> to make retries safe. Same key + same body returns the cached response; same key + different body returns 409 idempotency_conflict.
POST /api/sessions/:id/rounds
Append a round to a remote-mode session. Body:
{
"prompt": "Focus on the pricing mechanism, not positioning.",
"snippets": [
{
"type": "CHALLENGE",
"quote": "Per-seat pricing assumes teams of >10.",
"quoted_model": "gpt-5.4",
"comment": "Most enterprise pilots start at 3–5."
},
{
"type": "KEEP",
"quote": "Usage-based pricing aligns incentives.",
"quoted_model": "claude-opus-4-6"
}
]
}
snippets is optional but high-signal — it's how you steer attention round-to-round.
Snippet types:
- KEEP — this point is strong; preserve it
- EXPLORE — dig deeper here
- CHALLENGE — push back on this claim
- CORE — load-bearing; build on it
- SHIFT — this reframes the question
Quotes must be verbatim from a prior round's response. quoted_model is the model ID that originated the quote.
Idempotency-Key is strongly recommended on this endpoint. Round-append duplication corrupts deliberation history.
Errors:
- `409 session_busy` — a round is currently streaming or processing. Retry after a short delay with the same `Idempotency-Key`.
- `403 credit_exhausted` — your wallet balance can't cover the next round's per-model minimum. Body includes `effective_balance_usd`, `free_usd`, `subscription_usd`, `refill_usd`, `per_model_minimum_usd`, `next_reset_at`. Free-tier balance resets on the 1st of each month (UTC).
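Putting the idempotency and `session_busy` guidance together, a retry-safe append might look like the sketch below. `post_json` is a hypothetical transport callable returning `(status_code, body)`; the key point is that one `Idempotency-Key` is generated up front and reused on every attempt, so a retry after a transport failure can never double-append a round.

```python
import time
import uuid

def append_round_with_retry(post_json, session_id, body,
                            sleep=time.sleep, max_attempts=5):
    """Append a round, retrying session_busy with exponential backoff.
    The same Idempotency-Key is reused across attempts."""
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        status, resp = post_json(
            f"/api/sessions/{session_id}/rounds", body, headers
        )
        if status == 409 and resp.get("error") == "session_busy":
            sleep(2 ** attempt)  # back off while the in-flight round finishes
            continue
        return status, resp
    raise RuntimeError("session stayed busy; poll session state before retrying")
```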
GET /api/sessions/:id
Fetch the full state of a session — all rounds, responses, snippets, claim maps, distills, and the editorial summary if present.
For autonomous sessions, poll this until status === "ready". For single-shot or post-append flows, the response is fresh on every call.
GET /api/sessions
List your sessions.
| Query | Values |
|---|---|
| mode | `autonomous`, `remote`, or `single_shot` (label-only variant of remote) |
| status | `ready` or `streaming` |
limit | 1–200 (default 7) |
offset | pagination |
Returns a lightweight list (no response bodies). Use GET /api/sessions/:id for full content.
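Walking the full list with `limit`/`offset` can be sketched as below. Two assumptions to flag: `fetch_json` is a hypothetical helper returning the decoded body, and we assume the list endpoint returns a bare JSON array (if it's wrapped in an envelope, adjust the extraction accordingly).

```python
def list_all_sessions(fetch_json, limit=50):
    """Page through GET /api/sessions using limit/offset until a short
    page signals the end of the collection."""
    sessions, offset = [], 0
    while True:
        page = fetch_json(f"/api/sessions?limit={limit}&offset={offset}")
        sessions.extend(page)
        if len(page) < limit:  # short page => no more results
            return sessions
        offset += limit
```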
GET /api/models
List the models available to your account tier. Returns id, provider, display name, context window, max output tokens, and pricing (per-million tokens, separated into input / cached input / output / cache write where applicable).
GET /api/defaults
Discover platform defaults and your current budget.
{
"default_models": ["…", "…", "…"],
"default_rounds": 3,
"default_moderator_model": "…",
"remaining_rounds_today": 5,
"round_limit": 10,
"resets_at": "2026-04-15T04:00:00Z"
}
POST /api/sessions/:id/summary
Generate (or return cached) editorial summary for a session. Idempotent — returns the cached summary if one already exists.
GET /api/health
Returns { "status": "ok", "version": "v1" }. No auth required.
Status flow
streaming → processing → ready
↘
failed
- streaming — at least one model in the most recent round is actively responding.
- processing — all model responses landed; post-processing (snippet extraction, claim map, distill) is in flight.
- ready — fully complete; safe to call `append_round` or read final artifacts.
- failed — terminal error.
For autonomous mode: poll GET /api/sessions/:id until status === "ready".
Errors
All non-2xx responses return JSON:
{
"error": "session_busy",
"message": "Session has a round in progress — poll and retry with same Idempotency-Key",
"retryable": true
}
| Code | HTTP | Retryable | What to do |
|---|---|---|---|
| `unauthorized` | 401 | no | Bad or missing bearer token. |
| `credit_exhausted` | 403 | no | Wallet balance below the request's per-model minimum. Body includes bucket breakdown + `next_reset_at`. Top up (paid) or wait for the 1st-of-month free-tier reset. |
| `forbidden` | 403 | no | Feature not available on your tier (e.g., `moderator_model` / autonomous mode requires paid). |
| `not_found` | 404 | no | Session ID doesn't exist or isn't yours. |
| `idempotency_conflict` | 409 | no | Same key reused with a different body. Use a new key. |
| `session_busy` | 409 | yes | Another round is in flight. Retry with the same `Idempotency-Key`. |
| `internal_error` | 500 | yes | Transient. Retry with the same `Idempotency-Key`. |
Confidence scores
When models emit self-reported confidence (via {{C=0.8}}…{{/C}} tags in their prose, or on snippet commentary), those scores surface on responses:
- `responses[].claim_confidence: [{ claim_text, confidence_score }]` — per-claim scores extracted from prose. Tags are stripped from `text` before return.
- `responses[].snippets[].comment_confidence: number | null`
- `confidence_disclaimer: string` — verbatim advisory at the top of every session response.
These are self-reported and only meaningful relative to the same model's other claims. They are not calibrated across models. If you display them, surface the disclaimer too.
Naming philosophy
Field names you see in API requests and responses are the canonical contract — they're the names mumo guarantees to consumers. Internal type names and DB columns may differ; the serializer maps between them. We follow a contract-first principle and one-way mapping (internal → API), with the full mapping documented in docs/CONVENTIONS.md.
One convention worth knowing up front: snippet types (type field) are always UPPERCASE at the API boundary — KEEP, EXPLORE, CHALLENGE, CORE, SHIFT. They're lowercase only in internal storage.
We don't rename API fields casually. Any boundary rename comes with a deprecation period that accepts the old name as alias.