
mumo REST API

Run structured multi-model deliberations on demand. Send a prompt; get back a session containing every model's response, the cross-model claim map, and the round-level distill.

This is the consumer reference. For agent-runtime use (Claude Code, Cursor, etc.), see the MCP server docs.


Async by default (Trello #93)

As of 2026-04, POST /api/deliberation and POST /api/sessions/:id/rounds return a 202 Accepted ack in <500ms after the round row commits and budget is debited — they do not wait for model execution. Model work runs in the background; callers poll progress_url for terminal state.

What the ack looks like:

{
  "session_id": "abc-123",
  "round_index": 0,
  "status": "processing",
  "idempotency_key": "auto-gen-uuid-if-omitted",
  "client_request_id": null,
  "poll_after_ms": 5000,
  "progress_url": "/api/sessions/abc-123/progress",
  "progress_version": 0
}

Why: a client timeout no longer produces server/client state divergence. If the ack reached you, the round committed; if it didn't, your idempotency key replay resolves the uncertainty.

?wait=true compatibility shim (REST only). Appending ?wait=true to the create/append URL makes the endpoint block until terminal state or 290s, whichever comes first, preserving the pre-2026-04 response shape. It is a migration aid and deprecated (see "Deprecation timeline" below). MCP ignores ?wait=true; autonomous mode (with moderator_model) rejects it with 400 wait_unsupported because orchestration loops span multiple rounds.
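The poll loop itself is small. A minimal Python sketch, with the transport injected as a callable so it works with any HTTP client (the helper name, the fetch_progress callable, and the timeout default are ours, not part of the API):

```python
import time

TERMINAL_STATUSES = {"ready", "failed"}

def poll_until_terminal(fetch_progress, max_wait_s=300.0):
    """Poll the ack's progress_url until the round reaches a terminal state.

    fetch_progress: injected callable returning the decoded progress JSON
    (in practice, an authenticated GET against progress_url).
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        body = fetch_progress()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("round did not reach a terminal state in time")
        # Honor the server's suggested pacing (poll_after_ms in the ack);
        # fall back to the 5s shown in the example ack above.
        time.sleep(body.get("poll_after_ms", 5000) / 1000.0)
```

Note the loop trusts poll_after_ms for pacing rather than hammering the progress endpoint.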

Idempotency

  • Scope: (account, endpoint, Idempotency-Key). Same key can be used on both create_deliberation and append_round without colliding.
  • Request fingerprint: the server hashes a canonical subset of your body (prompt + snippets[] order-sensitive + model set sorted + session_mode + moderator_name + reference + rounds + moderator_model), NFC-normalized. Same key + same semantic body = replay; same key + different semantic body = 409 idempotency_conflict with original_request_fingerprint in the response for caller-side diff.
  • Auto-generation: MCP always sends one (its adapter derives a stable key from tool+args). REST: if you omit the Idempotency-Key header, the server generates a UUID and echoes it in the ack's idempotency_key field. Persist the echoed key if you want retry safety across transport failures.
  • Retry during refund: if a round's refund_status='pending' at replay time, the response includes status: "processing_refund" with retry_after_ms instead of the original terminal response. Once the refund credits, replays reflect the terminal failure.
  • TTL: 24h rolling window.
  • Novel error patterns: unclassified failures (classifier fell through to internal_error) do NOT auto-credit refunds. They write the failure fact and sit refund_status='pending' for admin review. Only canonical codes (see "Failure codes" below) auto-credit.
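For intuition, here's a client-side sketch of the semantic fingerprint described above. The exact server canonicalization is internal; this only mirrors the documented ingredients, and the function name and JSON-based canonicalization are our assumptions:

```python
import hashlib
import json
import unicodedata

def request_fingerprint(body):
    """Hash the documented semantic subset of a create/append body.

    Order-sensitive fields (snippets) are kept as-is; the model set is
    sorted; everything is NFC-normalized before hashing. Illustrative
    only: the server's real canonicalization may differ in detail.
    """
    subset = {
        "prompt": body.get("prompt"),
        "snippets": body.get("snippets", []),      # order-sensitive
        "models": sorted(body.get("models", [])),  # set semantics
        "session_mode": body.get("session_mode"),
        "moderator_name": body.get("moderator_name"),
        "reference": body.get("reference"),
        "rounds": body.get("rounds"),
        "moderator_model": body.get("moderator_model"),
    }
    canonical = json.dumps(subset, sort_keys=True, ensure_ascii=False)
    canonical = unicodedata.normalize("NFC", canonical)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The practical takeaway: reordering models doesn't change the fingerprint, but reordering snippets or editing the prompt does, and doing the latter with a reused key yields 409 idempotency_conflict.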

Refund lifecycle

When a round fails catastrophically (all providers error), the ledger emits a failure fact + credits back your budget atomically. The rounds[].refund_status lifecycle you'll see on the progress endpoint:

  • none — happy path, no failure.
  • pending — worker emitted the failure fact (failure_event_at set); ledger is about to credit.
  • credited — refund is in; refund_credited_at stamped. Budget restored.
  • not_applicable — partial-success round (≥1 model returned output); round counted as "delivered."

SLO: p99 latency from failure_event_at to refund_credited_at is < 60s.

Failure codes

Canonical failure_code values surfaced on the progress endpoint and in refund-conflict responses:

  • model_provider_rate_limit — Provider returned 429 / rate-limit
  • model_provider_outage — Provider returned 5xx or was unreachable
  • model_provider_oom — Provider returned out-of-memory
  • model_output_malformed — Response couldn't be parsed
  • model_timeout — Provider call exceeded the deadline
  • all_providers_failed — Every participant failed (composite); fires refund
  • dependency_timeout — Upstream (e.g. Brave Search) timed out
  • dependency_outage — Upstream dependency 5xx
  • dependency_malformed_response — Upstream returned unparseable data
  • stuck_reconciled — Reconciliation cron detected a stuck round and emitted a synthetic failure
  • internal_error — Unclassified; held for admin review, does NOT auto-credit
  • test_forced_failure — Admin-only test hook

Deprecation timeline for ?wait=true

  • Day 0–90 from GA: full support, Deprecation: true + Sunset + Link response headers on every ?wait=true response.
  • Day 90–120: advisory window; headers remain.
  • Day 120+: returns 410 Gone with a pointer to the async-polling pattern. Enforced via WAIT_ENFORCE_410 env flag.

Quickstart

  1. Get a key at mumo.chat/settings/api-keys. Keys begin with mmo_live_.
  2. Send a prompt:
curl https://mumo.chat/api/deliberation \
  -H "Authorization: Bearer mmo_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Should we use Postgres or MongoDB for an event store?",
    "rounds": 1
  }'

You'll get back a session object. With rounds: 1 (single-shot) the status is ready immediately and the round's full artifact stack is in the response. Here's a redacted example:

{
  "id": "f5731fe5-27ce-4f82-b8ce-868e72ff8bb9",
  "status": "ready",
  "mode": "single_shot",
  "active_models": ["claude-opus-4-6", "gpt-5.4", "grok-4-20-reasoning"],
  "rounds": [
    {
      "index": 0,
      "completion_state": "complete",
      "responses": [
        { "model": "claude-opus-4-6", "text": "...", "snippets": [...] },
        { "model": "gpt-5.4",        "text": "...", "snippets": [...] },
        { "model": "grok-4-20-reasoning", "text": "...", "snippets": [...] }
      ],
      "claim_map": {
        "claims": [
          {
            "quote": "Postgres' append-only WAL plus JSONB columns gives you...",
            "originator": "claude-opus-4-6",
            "reaction_count": 2,
            "positions": [
              { "model": "gpt-5.4",        "type": "KEEP",  "comment": "Agreed — JSONB lets you..." },
              { "model": "grok-4-20-reasoning", "type": "CHALLENGE", "comment": "But what about field-level migrations 18 months in?" }
            ]
          }
        ]
      },
      "distill": {
        "key_finding": "All three converged on Postgres + JSONB for the event store, but split on schema rigidity.",
        "agreements": ["Append-only access pattern fits Postgres' WAL.", "JSONB lets you start schemaless and tighten later."],
        "disagreements": ["GPT prefers schema-on-write at the application layer; Grok argues schemaless is fine because the schema is the event type."],
        "impactful_quote": {
          "text": "Postgres' append-only WAL plus JSONB columns gives you...",
          "model": "claude-opus-4-6",
          "why": "Reframed the choice from 'doc store vs relational' to 'use the right Postgres feature for the access pattern.'"
        },
        "open_questions": ["Field-level migrations 18 months in if schema lives in JSONB."],
        "narrative": "All three models converged on Postgres for the event store..."
      }
    }
  ],
  "summary": null,
  "confidence_disclaimer": "Confidence scores (0–1) on claims and snippet comments are self-reported..."
}

Read the artifact stack section below for what each piece is for.


Authentication

Authorization: Bearer mmo_live_…

Keys are minted at /settings/api-keys. Each key is hashed at rest; copy the secret on creation — it's never shown again.

Calls without a valid key return 401 Unauthorized. Calls to a model your tier can't access return 403 with the available alternatives in the body.


Two modes

The same POST /api/deliberation endpoint serves both modes. Whether you supply a moderator_model decides which one runs.

Autonomous — fire and poll

"I want to fire-and-forget a complex deliberation and come back when it's done."

Provide a moderator_model and a rounds cap. mumo runs the full multi-round arc unattended. The endpoint returns immediately with status: "streaming". Poll GET /api/sessions/:id until status: "ready".

{
  "prompt": "...",
  "rounds": 4,
  "moderator_model": "claude-opus-4-6"
}

Remote — you drive the rounds

"I want to drive each round myself, steering with snippets between them."

Omit moderator_model. The endpoint runs round 1 synchronously and returns with status: "ready" and the round's full artifact stack. Read the claim_map, then call POST /api/sessions/:id/rounds with steering snippets to add the next round. Repeat as long as you want — there's no preset cap on remote sessions.

{ "prompt": "..." }

Single-call use case: if you only want one round of three opinions and no follow-up, just don't call append_round. The first response already contains the full artifact stack — there's no separate "single-shot" code path. The session's mode field will be "single_shot" (a label the server applies when the request has rounds: 1, signalling intent only), but the engine and the response shape are identical to remote.

The rounds field is only meaningful in autonomous mode, where it caps how many rounds the AI moderator runs (max 14). In remote mode, the field is recorded for diagnostic purposes but doesn't bound the session — append_round works as long as the session exists.
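A small builder makes the mode split concrete: supplying moderator_model is the only switch. This helper is illustrative, not an SDK function:

```python
def deliberation_body(prompt, *, moderator_model=None, rounds=3):
    """Build a POST /api/deliberation body for either mode.

    Passing moderator_model selects autonomous mode, where rounds caps
    the moderator's arc (1-14). Omitting it selects remote mode, where
    rounds is recorded but does not bound append_round.
    """
    body = {"prompt": prompt, "rounds": rounds}
    if moderator_model is not None:
        if not 1 <= rounds <= 14:
            raise ValueError("autonomous mode caps rounds at 1-14")
        body["moderator_model"] = moderator_model
    return body
```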


Distill artifacts (opt-in)

Every round always produces the raw per-model responses and a cross-model claim map. The finalizer can optionally also produce two distill artifacts:

  • brief — structured JSON: narrative, agreements, disagreements, and continuation.recommendation (stop / continue / explore). Written to rounds[].distill on the session response.
  • summary — a streaming narrative prose recap for the round. Rendered into the web-UI timeline between rounds; available on rounds[].summary.

Both default OFF on API and MCP. Programmatic consumers get the same value from responses + claim map without the extra LLM-call tax. Opt in via the distill param on POST /api/deliberation:

{ "prompt": "...", "distill": "both" }

Accepted values:

  • "off" — both disabled (default)
  • "brief" — only the structured JSON brief
  • "summary" — only the streaming narrative
  • "both" — both enabled
  • { "brief": boolean, "summary": boolean } — fine-grained control

If your agent is surfacing a narrative back to a human, pass "summary" or "both" on create. The defaults are tuned for programmatic consumption where responses + claim map are the highest-signal artifacts; the human-readable fields (summary prose, brief narrative) are generated lazily.

The continuation.recommendation field lives inside the brief. Request "brief" or "both" if you specifically want that stop/continue/explore signal.

The setting is session-scoped — pinned on session create and inherited by every append_round call. Admins can tune channel defaults via system_config (distill_default_api_brief, distill_default_api_summary).
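If you want to normalize the five accepted shapes into the fine-grained object form before sending (or when reading back stored config), the expansion is mechanical. A sketch; the helper name is ours:

```python
def normalize_distill(value):
    """Expand any accepted `distill` value to {"brief": bool, "summary": bool}."""
    presets = {
        "off":     {"brief": False, "summary": False},
        "brief":   {"brief": True,  "summary": False},
        "summary": {"brief": False, "summary": True},
        "both":    {"brief": True,  "summary": True},
    }
    if isinstance(value, str):
        return presets[value]  # raises KeyError on an unknown preset
    # Fine-grained object form; missing keys default to off.
    return {"brief": bool(value.get("brief", False)),
            "summary": bool(value.get("summary", False))}
```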


The session response

Every GET /api/sessions/:id response (and the response from POST /api/deliberation for single-shot mode) carries these top-level fields:

  • id — Session UUID.
  • status — streaming | processing | ready | failed. See "Status flow" below.
  • mode — autonomous | remote | single_shot. The first two reflect the engine; single_shot is a label the server applies when rounds: 1 to signal "no follow-up intended" — it shares the remote engine.
  • active_models — Model IDs participating in this session.
  • moderator_model — Set for autonomous sessions; null otherwise.
  • moderator_name, application — Optional identity metadata.
  • model_metadata — { [model_id]: { display_name, provider } }.
  • created_at, estimated_ready_at — Timestamps.
  • total_usage — Aggregated tokens_in / tokens_out across the session.
  • rounds — Array of round objects (see below).
  • summary — Session-level editorial. Null until generated for multi-round sessions.
  • confidence_disclaimer — Verbatim advisory string. Surface alongside any displayed confidence scores.

Each round in rounds[] carries: index, prompt, completion_state (complete | partial_failure), responses, failed_models, claim_map, distill.

The four per-round artifacts:

  • responses[].text — raw prose from each model.

  • responses[].snippets[] — model-emitted reactions (typed KEEP/CHALLENGE/etc, with verbatim quotes from peers and optional commentary).

  • claim_map.claims[] — verbatim claims that ≥2 models reacted to, with each reactor's position (type + commentary). The highest-signal artifact for understanding agreement and disagreement.

  • distill — round-level structured synthesis:

    • key_finding — one sentence on what shifted in this round. string | null.
    • agreements[] — short statements of where the panel converged. string[] | null.
    • disagreements[] — each entry names the tension and the sides. string[] | null.
    • impactful_quote{ text, model, why } | null. The single quote that mattered most.
    • open_questions[] — forward-looking threads the panel left unresolved. string[] | null.
    • narrative — magazine-style prose (2–3 paragraphs). Always present when distill is non-null.
    • continuation — distill's judgment of whether another round is worth running. { convergence, recommendation, reasoning } | null. Details below.

    Nullability rule: the structured fields are null when distill ran on the legacy/unstructured path or when an upstream parse failed. They are never empty strings or empty arrays — those values are reserved for "absent." When you see agreements: [], the model produced no agreements; when you see agreements: null, structured distill wasn't computed for this round.
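In code, the three-way distinction reads like this (the helper name is ours; the same pattern applies to disagreements and open_questions):

```python
def agreements_state(distill):
    """Interpret the null-vs-empty contract on distill.agreements."""
    agreements = distill.get("agreements")
    if agreements is None:
        return "not_computed"  # legacy/unstructured path or upstream parse failure
    if not agreements:
        return "none_found"    # structured distill ran; model emitted zero agreements
    return "present"
```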

Continuation: deciding whether to run another round

distill.continuation is the single most useful field for agents driving remote-mode deliberations. It collapses "should I call append_round or stop here?" into one signal you can read straight off the response.

"continuation": {
  "convergence": 0.78,                  // 0.0–1.0; how consolidated the panel is on a stable answer
  "recommendation": "stop",             // "stop" | "continue" | "explore"
  "reasoning": "Three-of-three agreement on the structural diagnosis; remaining disagreement is on implementation detail unlikely to shift with further rounds."
}
  • convergence is trajectory-aware — it factors in the rounds before this one. A round that opens new productive disagreement may legitimately drop convergence vs the prior round. That is not a regression — it's a signal that the deliberation has uncovered a new dimension worth examining.
  • recommendation values:
    • "stop" — panel has converged enough that another round would be churn.
    • "continue" — unresolved tensions worth resolving in another round.
    • "explore" — productive new territory has emerged that another round would profitably deepen, even if convergence is lower. Use this for opportunity, not uncertainty.
  • reasoning cites this round's specific evidence (which agreements held, which positions shifted) so the recommendation is auditable.

How to use it: switch on recommendation for the simple case, threshold on convergence for custom logic, surface reasoning to humans deciding whether to trust the signal. The recommendation is the distill model's opinion, not a guarantee — same epistemic status as key_finding.
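A sketch of the simple case, switching on recommendation with an explore budget so "explore" rounds are an explicit spend decision. The policy and names are ours, not mumo's:

```python
def should_append_round(continuation, *, explore_budget=0):
    """Decide whether to call append_round based on distill.continuation.

    continuation may be None when structured distill wasn't computed;
    this policy defaults to stopping in that case.
    """
    if continuation is None:
        return False
    rec = continuation["recommendation"]
    if rec == "stop":
        return False
    if rec == "explore":
        # "explore" signals opportunity, not uncertainty: only spend a
        # round on it if the caller budgeted for exploration.
        return explore_budget > 0
    return rec == "continue"
```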

Most agents only need claim_map + distill to decide whether to continue or stop. The claim_map.claims[].positions[] array is what tells you where the panel agrees, where they're stuck, and which model said what.

The session-level summary field carries the final editorial across the whole session (surface, agreed, split, open blocks plus anchor_quote and og_quote). It's only populated for completed multi-round sessions.


Endpoints

POST /api/deliberation

Create a session. Body:

  • prompt (string) — The question or topic. Required.
  • reference (string) — Optional spec, doc, or design injected as shared context.
  • models (string[]) — 2–3 model IDs. Defaults to platform selection. Call GET /api/models to enumerate.
  • rounds (int) — 1–14. Default 3. Only meaningful for autonomous mode (caps the AI moderator's arc). In remote mode, set it to whatever you like — append_round is unbounded by it. rounds: 1 causes the server to label the session single_shot (a hint that you don't intend to append).
  • moderator_model (string) — Model ID to moderate autonomously. Omit for remote mode.
  • moderator_name (string) — Display name for the steering identity (≤100 chars). Surfaces in the published transcript.
  • application (string) — Display name of your client (≤100 chars). Surfaces in the session info panel.

Returns a session object (see Quickstart for an example shape). For autonomous sessions, status starts as streaming and transitions through processing to ready.

Idempotency: pass Idempotency-Key: <stable-string> to make retries safe. Same key + same body returns the cached response; same key + different body returns 409 idempotency_conflict.


POST /api/sessions/:id/rounds

Append a round to a remote-mode session. Body:

{
  "prompt": "Focus on the pricing mechanism, not positioning.",
  "snippets": [
    {
      "type": "CHALLENGE",
      "quote": "Per-seat pricing assumes teams of >10.",
      "quoted_model": "gpt-5.4",
      "comment": "Most enterprise pilots start at 3–5."
    },
    {
      "type": "KEEP",
      "quote": "Usage-based pricing aligns incentives.",
      "quoted_model": "claude-opus-4-6"
    }
  ]
}

snippets is optional but high-signal — it's how you steer attention round-to-round.

Snippet types:

  • KEEP — this point is strong; preserve it
  • EXPLORE — dig deeper here
  • CHALLENGE — push back on this claim
  • CORE — load-bearing; build on it
  • SHIFT — this reframes the question

Quotes must be verbatim from a prior round's response. quoted_model is the model ID that originated the quote.
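A cheap client-side guard catches both failure modes (a non-verbatim quote, or wrong case on type) before the API does. The validator below is ours, not part of an SDK:

```python
SNIPPET_TYPES = {"KEEP", "EXPLORE", "CHALLENGE", "CORE", "SHIFT"}

def validate_snippet(snippet, prior_rounds):
    """Check a steering snippet against prior rounds before appending.

    The quote must appear verbatim in a response from quoted_model in
    some earlier round, and type must be one of the five UPPERCASE values.
    """
    if snippet["type"] not in SNIPPET_TYPES:
        raise ValueError("unknown snippet type: %r" % snippet["type"])
    for rnd in prior_rounds:
        for resp in rnd.get("responses", []):
            if (resp["model"] == snippet["quoted_model"]
                    and snippet["quote"] in resp["text"]):
                return
    raise ValueError("quote is not verbatim from quoted_model in any prior round")
```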

Idempotency-Key is strongly recommended on this endpoint. Round-append duplication corrupts deliberation history.

Errors:

  • 409 session_busy — a round is currently streaming or processing. Retry after a short delay with the same Idempotency-Key.
  • 403 credit_exhausted — your wallet balance can't cover the next round's per-model minimum. Body includes effective_balance_usd, free_usd, subscription_usd, refill_usd, per_model_minimum_usd, next_reset_at. Free-tier balance resets on the 1st of each month (UTC).
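Putting the two together: a session_busy retry loop that replays with the same key. The transport is injected as a callable so the sketch stays client-library-agnostic; the function and parameter names are ours:

```python
import time

def append_round_with_retry(post, body, idempotency_key,
                            max_attempts=5, delay_s=2.0):
    """Append a round, replaying on 409 session_busy with the SAME key.

    post: injected callable (headers, body) -> (status_code, response_json),
    e.g. a thin wrapper over an HTTP POST to /api/sessions/:id/rounds.
    """
    headers = {"Idempotency-Key": idempotency_key}
    for _ in range(max_attempts):
        status, resp = post(headers, body)
        if status == 409 and resp.get("error") == "session_busy":
            time.sleep(delay_s)  # a round is in flight; wait and replay
            continue
        return status, resp
    raise RuntimeError("session still busy after %d attempts" % max_attempts)
```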

GET /api/sessions/:id

Fetch the full state of a session — all rounds, responses, snippets, claim maps, distills, and the editorial summary if present.

For autonomous sessions, poll this until status === "ready". For single-shot or post-append flows, the response is fresh on every call.


GET /api/sessions

List your sessions.

  • mode — autonomous | remote | single_shot (label-only variant of remote)
  • status — ready | streaming
  • limit — 1–200 (default 7)
  • offset — pagination offset

Returns a lightweight list (no response bodies). Use GET /api/sessions/:id for full content.
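Paging through the list is the usual limit/offset walk. A generator sketch, with the transport injected (names are ours):

```python
def iter_sessions(fetch_page, limit=200):
    """Yield every session from GET /api/sessions, page by page.

    fetch_page: injected callable (limit, offset) -> list of lightweight
    session dicts. Stops on the first short page.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```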


GET /api/models

List the models available to your account tier. Returns id, provider, display name, context window, max output tokens, and pricing (per-million tokens, separated into input / cached input / output / cache write where applicable).


GET /api/defaults

Discover platform defaults and your current budget.

{
  "default_models": ["…", "…", "…"],
  "default_rounds": 3,
  "default_moderator_model": "…",
  "remaining_rounds_today": 5,
  "round_limit": 10,
  "resets_at": "2026-04-15T04:00:00Z"
}

POST /api/sessions/:id/summary

Generate (or return cached) editorial summary for a session. Idempotent — returns the cached summary if one already exists.

GET /api/health

Returns { "status": "ok", "version": "v1" }. No auth required.


Status flow

streaming  →  processing  →  ready
                            ↘
                             failed
  • streaming — at least one model in the most recent round is actively responding.
  • processing — all model responses landed; post-processing (snippet extraction, claim map, distill) is in flight.
  • ready — fully complete; safe to call append_round or read final artifacts.
  • failed — terminal error.

For autonomous mode: poll GET /api/sessions/:id until status === "ready".


Errors

All non-2xx responses return JSON:

{
  "error": "session_busy",
  "message": "Session has a round in progress — poll and retry with same Idempotency-Key",
  "retryable": true
}

  • unauthorized — 401, not retryable. Bad or missing bearer token.
  • credit_exhausted — 403, not retryable. Wallet balance below the request's per-model minimum. Body includes bucket breakdown + next_reset_at. Top up (paid) or wait for the 1st-of-month free-tier reset.
  • forbidden — 403, not retryable. Feature not available on your tier (e.g., moderator_model / autonomous mode requires paid).
  • not_found — 404, not retryable. Session ID doesn't exist or isn't yours.
  • idempotency_conflict — 409, not retryable. Same key reused with a different body. Use a new key.
  • session_busy — 409, retryable. Another round is in flight. Retry with the same Idempotency-Key.
  • internal_error — 500, retryable. Transient. Retry with the same Idempotency-Key.

Confidence scores

When models emit self-reported confidence (via {{C=0.8}}…{{/C}} tags in their prose, or on snippet commentary), those scores surface on responses:

  • responses[].claim_confidence: [{ claim_text, confidence_score }] — per-claim scores extracted from prose. Tags are stripped from text before return.
  • responses[].snippets[].comment_confidence: number | null
  • confidence_disclaimer: string — verbatim advisory at the top of every session response.

These are self-reported and only meaningful relative to the same model's other claims. They are not calibrated across models. If you display them, surface the disclaimer too.
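For example, ranking one model's claims by its own scores is fine; merging rankings across models is not. A sketch (the helper name is ours):

```python
def rank_claims(response):
    """Order a single model's claims by self-reported confidence.

    Only rank within one responses[] entry: scores are not calibrated
    across models, so cross-model comparison is meaningless.
    """
    return sorted(response.get("claim_confidence", []),
                  key=lambda c: c["confidence_score"],
                  reverse=True)
```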


Naming philosophy

Field names you see in API requests and responses are the canonical contract — they're the names mumo guarantees to consumers. Internal type names and DB columns may differ; the serializer maps between them. We follow a contract-first principle and one-way mapping (internal → API), with the full mapping documented in docs/CONVENTIONS.md.

One convention worth knowing up front: snippet types (type field) are always UPPERCASE at the API boundary — KEEP, EXPLORE, CHALLENGE, CORE, SHIFT. They're lowercase only in internal storage.

We don't rename API fields casually. Any boundary rename comes with a deprecation period during which the old name is accepted as an alias.