mumo REST API

Run structured multi-model deliberations on demand. Send a prompt; get back a session containing every model's response and the cross-model claim map. Opt in to per-round recap and session-level synthesis via the recap_round + recap_session parameters.

This is the consumer reference. For agent-runtime use (Claude Code, Cursor, etc.), see the MCP server docs.


Quickstart#

  1. Get a key at mumo.chat/settings/api-keys. Keys begin with mmo_live_.
  2. Send a prompt:
curl https://mumo.chat/api/deliberation \
  -H "Authorization: Bearer mmo_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Should we use Postgres or MongoDB for an event store?",
    "rounds": 1
  }'

You'll get back a session object. With rounds: 1 the status is ready immediately and the round's full artifact stack is in the response. Here's a redacted example:

{
  "id": "f5731fe5-27ce-4f82-b8ce-868e72ff8bb9",
  "status": "ready",
  "mode": "remote",
  "active_models": ["claude-opus-4-6", "gpt-5.4", "grok-4-20-reasoning"],
  "rounds": [
    {
      "index": 0,
      "completion_state": "complete",
      "responses": [
        { "model": "claude-opus-4-6", "text": "...", "snippets": [...] },
        { "model": "gpt-5.4",        "text": "...", "snippets": [...] },
        { "model": "grok-4-20-reasoning", "text": "...", "snippets": [...] }
      ],
      "claim_map": {
        "claims": [
          {
            "quote": "Postgres' append-only WAL plus JSONB columns gives you...",
            "originator": "claude-opus-4-6",
            "reaction_count": 2,
            "positions": [
              { "model": "gpt-5.4",        "type": "KEEP",  "comment": "Agreed — JSONB lets you..." },
              { "model": "grok-4-20-reasoning", "type": "CHALLENGE", "comment": "But what about field-level migrations 18 months in?" }
            ]
          }
        ]
      }
    }
  ],
  "summary": null,
  "confidence_disclaimer": "Confidence scores (0–1) on claims and snippet comments are self-reported..."
}

Read the artifact stack section below for what each piece is for.
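
As a rough illustration of reading the claim map, the sketch below collects every claim that drew at least one CHALLENGE position. It assumes the session response has already been parsed into a Python dict shaped like the example above; the helper name challenged_claims is hypothetical and not part of the API.

```python
def challenged_claims(session: dict) -> list[dict]:
    """Return claims that drew at least one CHALLENGE position."""
    found = []
    for rnd in session.get("rounds", []):
        # claim_map may be missing or null while the artifact is still generating.
        claim_map = rnd.get("claim_map") or {}
        for claim in claim_map.get("claims", []):
            challengers = [
                pos["model"]
                for pos in claim.get("positions", [])
                if pos.get("type") == "CHALLENGE"
            ]
            if challengers:
                found.append({
                    "round": rnd.get("index"),
                    "quote": claim["quote"],
                    "originator": claim["originator"],
                    "challengers": challengers,
                })
    return found
```

An agent deciding whether to continue could treat an empty result as a hint that the round produced no open disagreement, though that stopping rule is a caller-side choice rather than anything the API prescribes.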


Authentication#

Authorization: Bearer mmo_live_…

Keys are minted at /settings/api-keys. Each key is hashed at rest; copy the secret on creation — it's never shown again.

Calls without a valid key return 401 Unauthorized. Registered callers can select any model in the registry — runtime success is governed by credit balance, not tier. Anonymous callers see only tier-0 models in GET /api/models.
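
A minimal sketch of attaching the bearer key from Python, using only the standard library. The host is taken from the Quickstart curl example; the function name build_deliberation_request is hypothetical. The function builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request

API_BASE = "https://mumo.chat"  # host from the Quickstart curl example


def build_deliberation_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /api/deliberation request."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/deliberation",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen sends it; with that library a 401 surfaces as urllib.error.HTTPError, so callers would catch that exception to detect a missing or invalid key.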


Credit wallet#

Every billable LLM call debits a dollar-denominated wallet. A call that would put the caller below the per-model minimum is rejected pre-flight with 403 credit_exhausted — see Errors.

Three buckets, debited FIFO in this order: free, subscription, refill. The free bucket resets monthly on the 1st (UTC) to a platform-configured amount. Subscription and refill buckets are populated through paid flows (Stripe stubs present; paid tier not yet live).

All user-visible USD amounts across the API are markup-included. Wallet balance, per-round debits, session spend totals — every dollar figure returned to consumers reflects what the user actually paid. Raw provider cost is a platform-internal accounting dimension and is not part of the consumer contract.

Where wallet state surfaces on responses:

  • Write-op errors (credit_exhausted) include the balance/minimum details needed to understand why the request was rejected.
  • GET /api/credit is the canonical wallet resource — bucket breakdown, reset timing, rollover cap, subscription status, auto-refill state, FIFO debit order.
  • GET /api/sessions/:id round objects carry a debits[] array — per-model transaction IDs + billed amounts — so balance_before − Σ(new rounds' debits) = balance_after reconciles exactly.
  • GET /api/models reports per-model available, unavailable_reason, and pricing.minimum_usd; use these to preflight which models the caller's current balance can afford.

Balance is not embedded on GET /api/sessions/:id top-level — that endpoint is high-volume during session polling, and wallet state on a read-heavy path creates cache-coherency and semantic-coupling problems. Use GET /api/credit when you need a fresh balance outside the write-op flow.
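
The reconciliation identity above (balance_before − Σ debits = balance_after) can be checked client-side along these lines. This is a sketch under assumptions: the two balances are effective_balance_usd values read from GET /api/credit before and after the rounds in question, and the helper name reconciles is hypothetical. Decimal is used so that float rounding does not produce false mismatches.

```python
from decimal import Decimal


def reconciles(balance_before, balance_after, new_rounds: list[dict]) -> bool:
    """Check balance_before - sum(debits[].amount_usd) == balance_after exactly."""
    spent = sum(
        (
            Decimal(str(debit["amount_usd"]))
            for rnd in new_rounds
            for debit in rnd.get("debits", [])
        ),
        Decimal("0"),
    )
    return Decimal(str(balance_before)) - spent == Decimal(str(balance_after))
```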


Session mode: remote#

POST /api/deliberation creates a remote-mode session — you drive each round, steering between them with typed snippets.

MCP note. MCP is agent-moderated only: create_deliberation starts one appendable round, then the agent calls wait_for_round, reads the responses and claim map, and decides whether to call append_round.

Historical: an autonomous (AI-moderated) mode was supported via a moderator_model parameter until 2026-05. It was retired. Requests that send moderator_model will be rejected by the request schema.

You drive the rounds#

The endpoint commits round 1 and returns a 202 ack immediately; model execution runs in the background. Poll the returned progress_url until the round is terminal, read the claim_map, then call POST /api/sessions/:id/rounds with steering snippets to add the next round. Repeat as long as you want — there's no preset cap on remote sessions.

{ "prompt": "..." }

Single-call use case: if you only want one round of three opinions and no follow-up, just don't append another round. The session ends after round 1.


POST /api/deliberation accepts optional improvement_consent for the new session. This controls whether session data may be used for platform improvement, not billing, credit consumption, model routing, or response visibility.

  • Omit it to use the account's effective default.
  • Paid users may send true or false.
  • Free-tier users are consent-inclusive under accepted terms. Sending false returns 403 consent_exclusion_unavailable.
  • improvement_consent is session-level. POST /api/sessions/:id/rounds rejects it rather than changing consent mid-session.

Session responses expose the resolved decision:

"improvement_consent": {
  "enabled": true,
  "reason": "free_tier_terms",
  "requested": null,
  "disclosure": null
}

Existing sessions created before this field shipped may report reason: "default_include".


Async by default#

As of 2026-04, POST /api/deliberation and POST /api/sessions/:id/rounds return a 202 Accepted ack in <500ms after the round row commits and budget is debited — they do not wait for model execution. Model work runs in the background; callers poll progress_url for terminal state.

What the ack looks like:

{
  "session_id": "abc-123",
  "round_id": "round-456",
  "round_index": 0,
  "status": "processing",
  "idempotency_key": "auto-gen-uuid-if-omitted",
  "client_request_id": null,
  "poll_after_ms": 5000,
  "progress_url": "/api/sessions/abc-123/progress",
  "progress_version": 0
}

The ack confirms that the round committed and tells the caller where to poll progress. The canonical wallet resource with bucket breakdown and reset timing lives at GET /api/credit. Write operations still perform affordability preflight and return credit_exhausted before committing if the wallet cannot cover the requested models.

Why: a client timeout no longer produces server/client state divergence. If the ack reached you, the round committed; if it didn't, your idempotency key replay resolves the uncertainty.
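
One way to structure the ack-then-poll flow is sketched below. It assumes the terminal check documented for the progress endpoint (moderation_status of "complete" or "failed") and reuses the ack's poll_after_ms as a fixed interval. fetch_progress is a hypothetical caller-supplied function that performs the GET on progress_url and returns the parsed body; max_polls is likewise an illustrative safeguard, not an API limit.

```python
import time
from typing import Callable

TERMINAL = {"complete", "failed"}


def wait_for_terminal(
    fetch_progress: Callable[[], dict],
    round_id: str,
    poll_after_ms: int = 5000,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the given round reports a terminal moderation_status."""
    for _ in range(max_polls):
        progress = fetch_progress()
        for rnd in progress.get("rounds", []):
            if rnd.get("id") == round_id and rnd.get("moderation_status") in TERMINAL:
                return rnd
        sleep(poll_after_ms / 1000)
    raise TimeoutError(f"round {round_id} not terminal after {max_polls} polls")
```

Injecting sleep and fetch_progress keeps the loop testable without network access; a production caller might also add backoff or jitter.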

?wait=true compatibility shim. REST-only. Appending ?wait=true to the create/append URL makes the endpoint block until terminal state or 290s (whichever first), preserving the pre-2026-04 response shape. This is a migration aid, deprecated (see "Deprecation timeline" below). MCP does not use ?wait=true; use the wait_for_round tool after the write-op ack.

Idempotency#

  • Scope: (account, endpoint, Idempotency-Key). The same key can be used on both create_deliberation and append_round without colliding.
  • Request fingerprint: the server hashes a canonical subset of your body (prompt + snippets[] order-sensitive + model set sorted + moderator_name + reference + improvement_consent), NFC-normalized. Same key + same semantic body = replay; same key + different semantic body = 409 idempotency_conflict with original_request_fingerprint in the response for caller-side diff.
  • Auto-generation: MCP always sends one (its adapter derives a stable key from tool+args). REST: if you omit the Idempotency-Key header, the server generates a UUID and echoes it in the ack's idempotency_key field. Persist the echoed key if you want retry safety across transport failures.
  • Retry during refund: if a round's refund_status='pending' at replay time, the response includes status: "processing_refund" with retry_after_ms instead of the original terminal. Once the refund credits, replays reflect the terminal failure.
  • TTL: 24h rolling window.
  • Novel error patterns: unclassified failures (classifier fell through to internal_error) do NOT auto-credit refunds. They write the failure fact and sit at refund_status='pending' for admin review. Only canonical codes (see "Failure codes" below) auto-credit.
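
To make the fingerprint rule concrete, here is a client-side approximation of it. The field subset, the order-sensitivity of snippets[], the sorted model set, and NFC normalization come from the description above; the choice of SHA-256 and of sorted-key JSON as the canonical encoding are assumptions, so the digest will not match the server's original_request_fingerprint. It is only useful for predicting whether two bodies are "the same" semantically before reusing a key.

```python
import hashlib
import json
import unicodedata


def _nfc(value):
    """Recursively NFC-normalize every string in a JSON-like value."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [_nfc(item) for item in value]
    if isinstance(value, dict):
        return {key: _nfc(item) for key, item in value.items()}
    return value


def request_fingerprint(body: dict) -> str:
    """Hash the semantic subset of a request body (illustrative, not the server's hash)."""
    canonical = {
        "prompt": body.get("prompt"),
        "snippets": body.get("snippets", []),       # order-sensitive
        "models": sorted(body.get("models", [])),   # order-insensitive
        "moderator_name": body.get("moderator_name"),
        "reference": body.get("reference"),
        "improvement_consent": body.get("improvement_consent"),
    }
    encoded = json.dumps(_nfc(canonical), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```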

Refund lifecycle#

When a round fails catastrophically (all providers error), the ledger emits a failure fact + credits back your budget atomically. The rounds[].refund_status lifecycle you'll see on the progress endpoint:

  • none — happy path, no failure.
  • pending — worker emitted the failure fact (failure_event_at set); ledger is about to credit.
  • credited — refund is in; refund_credited_at stamped. Budget restored.
  • not_applicable — partial-success round (≥1 model returned output); round counted as "delivered."

SLO: p99 latency from failure_event_at to refund_credited_at is < 60s.

Failure codes#

Canonical failure_code values surfaced on the progress endpoint and in refund-conflict responses:

Code                           Meaning
model_provider_rate_limit      Provider returned 429 / rate-limit
model_provider_outage          Provider returned 5xx or was unreachable
model_provider_oom             Provider returned out-of-memory
model_output_malformed         Response couldn't be parsed
model_timeout                  Provider call exceeded the deadline
all_providers_failed           Every participant failed (composite); fires refund
dependency_timeout             Upstream (e.g. Brave Search) timed out
dependency_outage              Upstream dependency 5xx
dependency_malformed_response  Upstream returned unparseable data
stuck_reconciled               Reconciliation cron detected a stuck round and emitted synthetic failure
internal_error                 Unclassified; held for admin review, does NOT auto-credit
test_forced_failure            Admin-only test hook

Per-model error codes (failed_models[].error_code and /progress models[].error_code)#

Distinct from the round-level failure_code above. Round-level codes describe why the round as a whole failed (typically all-providers composite). Per-model error_code describes why an individual model's call terminated. A round can carry partial_failure completion state with some failed_models[] entries that each have their own error_code, while the round-level failure_code stays null (no refund fires on partial success).

Code                               Carries partial_text?  Meaning
provider_error                     Yes                    Provider returned an error mid- or post-stream (after first byte). Partial text preserved.
pre_stream_provider_error          No                     Provider returned a 4xx/5xx HTTP response before the stream opened. Covers auth/malformed/pre-stream rate-limit; treat as non-transient for retry decisions.
stream_ended_without_final_marker  Yes                    Stream yielded ≥1 delta but never emitted a done event before EOF. Partial text preserved.
internal_deadline_reached          Maybe                  The 150s in-band deadline fired while the model was still producing output. Partial text preserved when bytes were already flushed.
deadline_expired                   No                     The 1-min sweep cron found a row past its deadline_at without terminal stream_status and wrote this terminal. Out-of-band counterpart to internal_deadline_reached.
stream_interrupted                 Maybe                  Worker restarted while the row was mid-stream.
max_retries_exceeded               No                     Pre-first-byte retry cap hit. Provider was unreachable long enough that no bytes ever rendered.
provider_auth_failure              No                     Provider rejected the request for auth reasons (typically configuration error). Non-transient.
pre_stream_failure                 No                     Error thrown before provider.stream() was even called (prompt build, factory, etc.). Non-transient.
rate_limit                         No                     Provider rate-limited (explicit code path; distinct from a generic 429 routed via pre_stream_provider_error).
canceled                           No                     User-initiated abort.

Retry/abandon classifier (used by MCP wait_for_round's recommended_client_action and a useful default for REST callers too):

  • Transient — retry-eligible: rate_limit, provider_error, internal_deadline_reached, deadline_expired, stream_interrupted, stream_ended_without_final_marker.
  • Abandon (or escalate): pre_stream_provider_error, provider_auth_failure, pre_stream_failure, max_retries_exceeded, canceled. These don't usually clear on retry under the same conditions.
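
The two classifier lists above translate directly into a lookup. The sketch below is one possible REST-side default; the function name recommended_action and the "unknown" fallback for null or unrecognized codes are assumptions, not the MCP tool's actual return values.

```python
TRANSIENT = {
    "rate_limit",
    "provider_error",
    "internal_deadline_reached",
    "deadline_expired",
    "stream_interrupted",
    "stream_ended_without_final_marker",
}
ABANDON = {
    "pre_stream_provider_error",
    "provider_auth_failure",
    "pre_stream_failure",
    "max_retries_exceeded",
    "canceled",
}


def recommended_action(error_code) -> str:
    """Map a per-model error_code to 'retry', 'abandon', or 'unknown'."""
    if error_code in TRANSIENT:
        return "retry"
    if error_code in ABANDON:
        return "abandon"
    # Legacy rows carry error_code = null; new codes may also appear over time.
    return "unknown"
```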

Deprecation timeline for ?wait=true#

  • Day 0–90 from GA: full support, Deprecation: true + Sunset + Link response headers on every ?wait=true response.
  • Day 90–120: advisory window; headers remain.
  • Day 120+: returns 410 Gone with a pointer to the async-polling pattern. Enforced via WAIT_ENFORCE_410 env flag.

Recap artifacts (opt-in, per-round)#

Two optional, independent booleans opt rounds in to recap generation. POST /api/deliberation (the create path) accepts only recap_round; POST /api/sessions/:id/rounds (append) accepts both. The asymmetry is deliberate: a session synthesis only carries information beyond a round recap when there are ≥ 2 rounds to synthesize over, so on round 0 the two artifacts would be the same thing in different framing. Sending recap_session on the create path is rejected with a 400 to surface that intent mismatch; set recap_session=true on a later append call instead.

Field          Accepted on      Default  Notes
recap_round    create + append  false    Generate a round_recap artifact when this round completes: a structured per-round summary with title, tldr, agenda, and sections. Surfaces on GET /api/sessions/:id once written.
recap_session  append only      false    Generate the session-level synthesis (title, tldr, origin, arcs) over the in-flight round-recap set when this round completes. Cascade behavior: triggers round_recap generation for any prior rounds that don't already have one, because round recaps are a precursor dependency for session synthesis. The cascade runs at the caller's expense (see pricing below). Setting recap_session implicitly covers recap_round for that round; you don't also need to set recap_round=true.

Pricing. Recap and synthesis bill via the standard credit wallet but with 0 bps markup (at-cost passthrough). A typical 3-round cascade lands around $0.04 in Kimi inference cost; the per-session breakdown at /settings/sessions surfaces a dedicated "Recap" line item with the bucket scope and at-cost marker so you can reconcile what was charged.

Artifacts on the session response. When recap or synthesis artifacts exist, they surface on the session response:

  • rounds[].round_recap — populated for any round whose recap_round_requested (or recap_session_requested, via cascade) was true and whose recap generation has completed.
  • session_synthesis — populated when the cascade has produced a session-level synthesis. Until synthesis lands, this field is absent.

Legacy distill. The distill parameter is accepted by the schema for back-compat but no longer triggers any artifact generation — legacy distill is disabled. New sessions should use recap_round / recap_session.


The session response#

Every GET /api/sessions/:id response carries these top-level fields:

Field                           Notes
id                              Session UUID.
status                          streaming | processing | ready | failed. See "Status flow" below.
mode                            remote. REST metadata describing how the session was created. (Historical sessions may report autonomous; that mode was retired in 2026-05.)
active_models                   Model IDs participating in this session.
moderator_model                 Null for all new sessions. Historical sessions may carry a value.
moderator_name, application     Optional identity metadata.
model_metadata                  { [model_id]: { display_name, provider } }.
created_at, estimated_ready_at  Timestamps.
total_usage                     Aggregated tokens_in / tokens_out across the session.
total_cost_usd                  Ground-truth ledger cost (USD) for the entire session. Sums every billable bucket: deliberation + moderator + recap (round_recap + session_synthesis) + snippet extraction + editorial + search. Markup-exclusive, unlike wallet debits, which are markup-included. 0 for sessions with no ledger rows yet.
rounds                          Array of round objects (see below).
summary                         Session-level editorial. Null until generated for multi-round sessions.
confidence_disclaimer           Verbatim advisory string. Surface alongside any displayed confidence scores.

Each round in rounds[] carries: id, index, prompt, completion_state, responses, failed_models, in_progress_models, claim_map, claim_map_url, round_recap, cost_usd, debits. Pre-cutover sessions also carry the legacy distill field; new sessions do not (legacy distill is disabled — see Recap artifacts above). round_recap is null unless the round opted in via recap_round=true (or was backfilled via the recap_session=true cascade).

completion_state (per-round; distinct from session-level status) takes one of four values:

Value            Meaning
complete         Every target model produced a final response.
partial_failure  All target models reached terminal state; at least one final AND at least one errored. Round is usable but degraded.
failed           All target models reached terminal state; every one errored, zero finals. Round produced no usable output.
in_progress      At least one target model is still queued, streaming, or expected-but-absent. Round not yet settled; keep polling /progress.
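
The four values can be read as a function of the per-model states. The sketch below mirrors the table client-side, taking the state strings that the progress endpoint reports per model (final, error, streaming, queued, absent). It is an illustration of the rule as documented, not the server's implementation, and the function name derive_completion_state is hypothetical.

```python
def derive_completion_state(model_states: list[str]) -> str:
    """Derive a round's completion_state from its per-model states."""
    if not model_states:
        raise ValueError("a round needs at least one target model")
    finals = model_states.count("final")
    errors = model_states.count("error")
    # Anything that is neither final nor error (queued, streaming, absent)
    # means the round has not settled yet.
    if finals + errors < len(model_states):
        return "in_progress"
    if errors == 0:
        return "complete"
    if finals == 0:
        return "failed"
    return "partial_failure"
```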

responses[] is the success-only collection: each entry has the canonical content plus two fields for downstream branching:

  • is_partial (boolean) — true when the response is a successful-but-truncated stream (the model produced output and the call reached done, but the provider signaled truncation via finish_reason). Treat the text as a partial answer; consider asking the user whether to extend.
  • finish_reason (string | null) — provider-native stop reason, surfaced as-is rather than normalized (Anthropic: end_turn / max_tokens / stop_sequence; OpenAI: stop / length / content_filter; Gemini: STOP / MAX_TOKENS / SAFETY). null when the stream did not complete naturally (error, abort, deadline).

failed_models[] is the error-attribution collection. Each entry:

Field                Notes
model                Model ID that failed.
error                Free-text error description from the row's error column. Stable but not safe to pattern-match; switch on error_code for branching.
message              Human-readable error message.
error_code           Canonical STREAM_ERROR_CODES value (provider_error, stream_ended_without_final_marker, internal_deadline_reached, …). null on legacy rows. See "Per-model error codes" above.
partial_text         Optional. Present when the failed stream emitted bytes before terminating (post-first-byte provider_error, stream_ended_without_final_marker, internal_deadline_reached). Diagnostic value; sometimes usable as a partial answer.
partial_text_length  Optional. Character count of partial_text when present.

in_progress_models[] is the "still working" collection — present when completion_state === "in_progress". Each entry:

Field        Notes
model        Target model ID.
state        queued (row pre-inserted, provider call not yet started), streaming (≥1 delta observed, no terminal yet), or absent (rare; backstop window or race against pre-insert).
deadline_at  ISO 8601 timestamp at which the sweep cron will write a terminal error if the row hasn't transitioned. null on legacy rows.

cost_usd is the per-round counterpart of session-level total_cost_usd — same ledger source, same markup-exclusive semantics. It is useful after a round completes; during an in-flight round it may be 0 or incomplete because ledger rows settle as model/finalizer calls finish. The relationship: sum(rounds[].cost_usd) ≤ total_cost_usd — session-scoped buckets (session title generation, editorial summary) appear in total_cost_usd only.

The debits[] array is one entry per model call, shape:

{
  "transaction_id": "txn_01h9x2p7k...",
  "model": "claude-opus-4-6",
  "amount_usd": 0.11,
  "settled_at": "2026-04-23T21:15:32Z"
}

amount_usd is markup-included (what the user paid). transaction_id is stable per-debit and safe to reference for reconciliation. See Credit wallet for the contract rule that all user-visible USD amounts are markup-included.

The per-round artifacts:

  • responses[].text — raw prose from each model.
  • responses[].snippets[] — model-emitted reactions (typed KEEP/CHALLENGE/etc, with verbatim quotes from peers and optional commentary).
  • claim_map.claims[] — verbatim claims that ≥2 models reacted to, with each reactor's position (type + commentary). The highest-signal artifact for understanding agreement and disagreement.
  • claim_map_url — browser URL for this round's claim map (https://mumo.chat/cm/{round_id}). Auth-gated and owner-only: it requires signing in with the mumo account that owns the API key. Agents should surface it to the human at the end of their summary so the deliberation can be reviewed directly. Always present — if the claim-map artifact is still generating, the page shows a self-updating pending state.

Legacy distill field: pre-cutover sessions may carry a distill object with the structured fields key_finding, agreements, disagreements, impactful_quote, open_questions, narrative, and continuation. New sessions do not — legacy distill is disabled (see Recap artifacts section). Agents driving remote-mode deliberations should use claim_map to decide whether to continue or stop, and opt in to recap_round / recap_session when they want structured per-round summaries or a session-level synthesis.

The session-level summary field carries the final editorial across the whole session (surface, agreed, split, open blocks plus anchor_quote and og_quote). It's only populated for completed multi-round legacy sessions. Distill v2 sessions surface the session-level synthesis under the separate session_synthesis field — see Recap artifacts.


Endpoints#

POST /api/deliberation#

Create a session. Body:

Field           Type      Notes
prompt          string    The question or topic. Required.
reference       string    Optional spec, doc, or design injected as shared context.
models          string[]  2–3 model IDs. Defaults to platform selection. Call GET /api/models to enumerate.
moderator_name  string    Display name for the steering identity (≤100 chars). Surfaces in the published transcript.
application     string    Display name of your client (≤100 chars). Surfaces in the session info panel.
recap_round     boolean   Opt round 0 in to per-round recap generation. Default false. recap_session is NOT accepted on this endpoint: synthesis requires ≥ 2 rounds, so it would degenerate to a round-recap-only behavior here. Set recap_session=true on a later POST /api/sessions/:id/rounds call when you want session-level synthesis (the cascade backfills earlier rounds' recaps). See Recap artifacts.

Returns a session object (see Quickstart for an example shape).

Idempotency: pass Idempotency-Key: <stable-string> to make retries safe. Same key + same body returns the cached response; same key + different body returns 409 idempotency_conflict.


POST /api/sessions/:id/rounds#

Append a round to a remote-mode session. Body:

{
  "prompt": "Focus on the pricing mechanism, not positioning.",
  "snippets": [
    {
      "type": "CHALLENGE",
      "quote": "Per-seat pricing assumes teams of >10.",
      "quoted_model": "gpt-5.4",
      "comment": "Most enterprise pilots start at 3–5."
    },
    {
      "type": "KEEP",
      "quote": "Usage-based pricing aligns incentives.",
      "quoted_model": "claude-opus-4-6"
    }
  ],
  "recap_round": false,
  "recap_session": false
}

snippets is optional but high-signal — it's how you steer attention round-to-round.

recap_round and recap_session are optional booleans (both default false). See Recap artifacts for the cascade semantics and pricing.

Snippet types:

  • KEEP — this point is strong; preserve it
  • EXPLORE — dig deeper here
  • CHALLENGE — push back on this claim
  • CORE — load-bearing; build on it
  • SHIFT — this reframes the question

Quotes must be verbatim from a prior round's response. quoted_model is the model ID that originated the quote.
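
Because quotes must be verbatim, it can help to validate snippets locally before appending a round. The sketch below checks the snippet type against the five listed above and confirms the quote appears in one of the quoted model's earlier responses. It assumes the session was fetched via GET /api/sessions/:id and parsed into a dict; make_snippet is a hypothetical helper, and the server's own matching rules may be stricter or looser than a plain substring test.

```python
SNIPPET_TYPES = {"KEEP", "EXPLORE", "CHALLENGE", "CORE", "SHIFT"}


def make_snippet(session: dict, snippet_type: str, quote: str,
                 quoted_model: str, comment=None) -> dict:
    """Build a steering snippet, rejecting unknown types and non-verbatim quotes."""
    if snippet_type not in SNIPPET_TYPES:
        raise ValueError(f"unknown snippet type: {snippet_type}")
    texts = [
        resp.get("text", "")
        for rnd in session.get("rounds", [])
        for resp in rnd.get("responses", [])
        if resp.get("model") == quoted_model
    ]
    if not any(quote in text for text in texts):
        raise ValueError("quote is not verbatim from a prior response by that model")
    snippet = {"type": snippet_type, "quote": quote, "quoted_model": quoted_model}
    if comment is not None:
        snippet["comment"] = comment
    return snippet
```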

Idempotency-Key is strongly recommended on this endpoint. Round-append duplication corrupts deliberation history.

Errors:

  • 409 session_busy — a round is currently streaming or processing. Retry after a short delay with the same Idempotency-Key.
  • 403 credit_exhausted — your wallet balance can't cover the next round's per-model minimum. Body includes effective_balance_usd, free_usd, subscription_usd, refill_usd, per_model_minimum_usd, next_reset_at. Free-tier balance resets on the 1st of each month (UTC).

GET /api/sessions/:id#

Fetch the full state of a session — all rounds, responses, snippets, claim maps, and the editorial summary if present. Pre-cutover sessions also carry legacy distill objects on rounds (new sessions do not — see Recap artifacts).

The response is fresh on every call.


GET /api/sessions/:id/progress#

Lightweight poll endpoint for round state without fetching the full session body. Two consumers:

  1. Async REST/MCP callers — after a create_deliberation / append_round ack, poll this to learn whether the round reached terminal state (and whether the refund lifecycle moved). The terminal check is moderation_status === "complete" || moderation_status === "failed". Both are end-states; stop polling either way. "failed" can mean an all-models error OR a catastrophic round-level failure (e.g., pre-insert / internal pipeline error) — to distinguish, read failure_code and the per-model state / error_code entries in models[]. Once terminal, fetch the full content via GET /api/sessions/:id, whose round objects carry failed_models[] for end-of-poll attribution.
  2. Real-time UIs — surface per-model state (queued / streaming / final / error / absent) and a heartbeat without re-rendering the whole transcript.

The response is intentionally compact (no joins on responses text, snippets, or claim map):

{
  "session_id": "abc-123",
  "is_ai_moderated": false,
  "auto_moderation_completed_at": null,
  "rounds": [
    {
      "id": "round-456",
      "index": 0,
      "moderation_status": "in_progress",
      "refund_status": "none",
      "failure_code": null,
      "failure_event_at": null,
      "refund_credited_at": null,
      "refund_deadline_at": null,
      "progress_version": 3,
      "models": [
        {
          "model": "claude-opus-4-7",
          "state": "streaming",
          "deadline_at": "2026-05-13T03:02:30Z",
          "error_code": null,
          "expired_at_read": false,
          "provider": "anthropic",
          "inference_provider": "anthropic",
          "partial_text_length": 1840,
          "last_chunk_at": "2026-05-13T03:00:07Z",
          "since_last_chunk_ms": 3000
        }
      ]
    }
  ]
}

Per-model fields:

Field                Notes
model                Model ID.
state                final | error | streaming | queued | absent. Same state machine as in_progress_models[].state plus terminal values.
deadline_at          ISO 8601 timestamp at which the row will be swept to a terminal error if it hasn't transitioned. null on legacy rows.
error_code           STREAM_ERROR_CODES value when state === "error". null otherwise.
expired_at_read      true when state is queued or streaming AND deadline_at is already past at read time. The sweep cron's ~60s cadence means a row can be expired up to that long before its terminal write lands; this field exposes the derived state immediately so callers can render "this model is past its deadline; result expected within a minute." Always false on terminal states.
provider             Model family (anthropic | openai | google | xai | moonshot | zai | alibaba). null when the registry lookup fails.
inference_provider   Inference endpoint family, distinct from provider when the model routes through a third-party (Kimi via Fireworks, Qwen via Together AI, etc.). Matches provider when no cross-provider route is active.
partial_text_length  Character count of the response's accumulated text. null for queued / absent rows; 0 for streaming rows that haven't flushed yet. Use null-vs-0 to distinguish "not producing yet" from "observed zero-length partial." For terminal final / error rows, this is the final character count.
last_chunk_at        ISO 8601 timestamp of the most recent SDK delta. Updated on the streaming producer's ~2s text flush. null for queued, absent, or non-streaming code paths.
since_last_chunk_ms  Read-time computed: now - last_chunk_at. Only present when state === "streaming"; null otherwise. Combined with partial_text_length this renders as "claude: streaming, 1840 chars, last chunk 3s ago."
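
A small sketch of turning one models[] entry into a status line like the one quoted in the table. The exact wording, the use of the full model ID as the label, and the function name describe_model are illustrative choices, not something the API defines.

```python
def describe_model(entry: dict) -> str:
    """Render one /progress models[] entry as a human-readable status line."""
    model, state = entry["model"], entry["state"]
    if state == "streaming":
        # partial_text_length is 0 until the first flush; treat null the same way.
        chars = entry.get("partial_text_length") or 0
        line = f"{model}: streaming, {chars} chars"
        since_ms = entry.get("since_last_chunk_ms")
        if since_ms is not None:
            line += f", last chunk {since_ms // 1000}s ago"
    elif state == "error":
        line = f"{model}: error ({entry.get('error_code')})"
    else:
        line = f"{model}: {state}"
    if entry.get("expired_at_read"):
        line += " (past deadline; result expected within a minute)"
    return line
```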

ETag / If-None-Match support. Every /progress response carries a weak ETag. Send it back on the next poll via If-None-Match to short-circuit no-change polls with a 304 Not Modified. The validator covers every body-derived signal — round-level state, per-model state, registry attribution, and (when any model is non-terminal) a 5-second wall-clock bucket so since_last_chunk_ms and expired_at_read can't go stale past one bucket boundary. Terminal-only payloads keep their state-based ETag stable across reads, so cache hits on completed rounds are long-lived.

cache-control: no-store on every response — clients shouldn't share-cache, but their own If-None-Match re-poll still works.
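
The conditional-poll pattern can be kept transport-agnostic, as in the sketch below. send is a hypothetical caller-supplied function that performs the GET with the given extra headers and returns (status_code, response_headers, parsed_body_or_None); the class name ProgressPoller and the exact header-key casing are assumptions that depend on the HTTP library in use.

```python
from typing import Callable, Optional


class ProgressPoller:
    """Remember the last ETag and body so a 304 reuses the cached body."""

    def __init__(self, send: Callable[[dict], tuple]):
        self._send = send
        self._etag: Optional[str] = None
        self._body: Optional[dict] = None

    def poll(self) -> dict:
        # Only send If-None-Match once a validator has been seen.
        headers = {"If-None-Match": self._etag} if self._etag else {}
        status, resp_headers, body = self._send(headers)
        if status == 304 and self._body is not None:
            return self._body  # nothing changed since the last poll
        if status != 200:
            raise RuntimeError(f"unexpected status {status}")
        self._etag = resp_headers.get("ETag")
        self._body = body
        return body
```

The cache lives in the poller object itself, which fits the no-store directive: nothing is shared between clients, but the client's own revalidation still saves a body transfer.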


GET /api/sessions#

List your sessions.

Query   Values
mode    remote (historical sessions may also report autonomous)
status  ready | streaming
limit   1–200 (default 7)
offset  pagination

Returns a lightweight list (no response bodies). Use GET /api/sessions/:id for full content.


GET /api/models#

List the models available to your account tier. Each entry:

{
  "id": "claude-opus-4-6",
  "provider": "anthropic",
  "display_name": "Claude Opus 4.6",
  "available": true,
  "unavailable_reason": null,
  "min_user_tier": 1,
  "context_window": 200000,
  "max_output_tokens": 16384,
  "pricing": {
    "input_per_million": 15,
    "output_per_million": 75,
    "cached_input_per_million": 1.5,
    "minimum_usd": 0.05,
    "cache_write_per_million": 30
  },
  "sort_order": 10
}
  • available is true if the caller can actually use this model right now. For registered callers this reduces to a credit-balance check (effective_balance_usd > pricing.minimum_usd); for anonymous callers, tier-0 models filter into the response and tier-1+ models are omitted. false means a call that uses this model will fail.
  • unavailable_reason is "credit_exhausted" when wallet balance is below the model's minimum, and null otherwise. Models outside the caller's tier are filtered from the response entirely, not returned with an unavailable_reason.
  • min_user_tier — visibility tier: 0 = anonymous-visible, 1/2 = registered-visible. Registered users see every tier; anonymous users see only min_user_tier: 0. Informational only — not a runtime gate. Credit balance is the sole runtime constraint for registered users.
  • pricing.minimum_usd — the credit-gate threshold for this model. Your effective balance must be strictly greater than this value (after FIFO debit of any prior in-flight round) for the request to preflight.
  • Per-million pricing fields reflect raw provider cost (platform COGS), not markup-inclusive user-paid amounts. They're informational — for "will this work?" preflight, use available + pricing.minimum_usd. For "what will it cost me?" read the actual debit on the round's debits[].amount_usd after completion.
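
A hedged sketch of the "will this work?" preflight described above: given the parsed GET /api/models list and a balance read from GET /api/credit, keep the models whose minimum the balance strictly exceeds. The function name affordable_models is hypothetical, and the check ignores any in-flight rounds that may debit the wallet before the request lands.

```python
def affordable_models(models: list[dict], effective_balance_usd: float) -> list[str]:
    """Return IDs of models that are available and strictly below the balance."""
    return [
        model["id"]
        for model in models
        if model.get("available")
        and effective_balance_usd > model["pricing"]["minimum_usd"]
    ]
```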

GET /api/credit#

Canonical wallet resource. Returns the caller's full credit state.

{
  "effective_balance_usd": 1.42,
  "buckets": {
    "free": {
      "balance_usd": 1.42,
      "monthly_grant_usd": 1.50,
      "resets_at": "2026-05-01T00:00:00Z"
    },
    "subscription": {
      "balance_usd": 0,
      "rollover_cap_usd": 30.00,
      "subscription_status": null
    },
    "refill": {
      "balance_usd": 0,
      "auto_refill_enabled": false
    }
  },
  "per_model_minimum_usd_default": 0.05,
  "debit_order": ["free", "subscription", "refill"]
}
  • effective_balance_usd — sum across all three buckets. The number to compare against per-model minimums for affordability preflight. Markup-included — reflects what you have left to spend, not raw LLM-cost headroom.
  • buckets.free — monthly free-tier credit. monthly_grant_usd is the platform grant each cycle; resets_at is the next UTC 1st-of-month boundary when the bucket refills.
  • buckets.subscription — Stripe-granted credit (paid tier). rollover_cap_usd is the max unused balance that carries forward into a new cycle. subscription_status is one of "active" | "past_due" | "cancelled" | "expired" | null (null = no subscription).
  • buckets.refill — auto-refill top-ups. auto_refill_enabled is the user's current setting. When enabled, two additional fields appear: auto_refill_threshold_usd (trigger level) and auto_refill_amount_usd (top-up size). Both fields are absent when auto-refill is off.
  • per_model_minimum_usd_default — platform fallback minimum used when a model's registry row has no explicit pricing_minimum_usd. For per-model values, read pricing.minimum_usd from GET /api/models.
  • debit_order — FIFO debit sequence. Settlement drains each bucket in this order; surfaced so dashboards and reconciliation tooling don't need to read server code.

Anonymous callers receive the same shape with all balances at 0, subscription_status: null, and auto_refill_enabled: false. Anonymous usage is gated by a separate guest-round budget, not by the credit wallet.
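For dashboards or reconciliation tooling, the debit_order semantics can be modelled locally. The following is a minimal sketch, assuming a debit simply drains each bucket in the listed order until the amount is covered. The function simulate_debit is hypothetical, and real settlement happens server-side.

```python
from decimal import Decimal


def simulate_debit(
    buckets: dict[str, Decimal],
    debit_order: list[str],
    amount: Decimal,
) -> dict[str, Decimal]:
    """Drain buckets in debit_order and return the resulting balances."""
    remaining = amount
    result = dict(buckets)
    for name in debit_order:
        take = min(result[name], remaining)
        result[name] -= take
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("amount exceeds effective balance")
    return result


balances = {
    "free": Decimal("1.42"),
    "subscription": Decimal("5.00"),
    "refill": Decimal("0"),
}
after = simulate_debit(balances, ["free", "subscription", "refill"], Decimal("2.00"))
print(after["free"], after["subscription"], after["refill"])  # 0.00 4.42 0
```

Decimal is used here to avoid floating-point drift when summing small USD amounts. The API itself returns plain JSON numbers.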


GET /api/defaults#

Discover platform defaults. Wallet state is not here — use GET /api/credit.

{
  "models": ["…", "…", "…"],
  "daily_budget": { "limit": 200, "used": 4, "resets_at": "…" }
}

GET /api/health#

Returns { "status": "ok", "version": "v1" }. No auth required.


Status flow#

Two distinct state machines surface in the response.

Session-level status — high-level rollup; what to switch on in REST polling loops:

streaming  →  processing  →  ready
                            ↘
                             failed
  • streaming — at least one model in the most recent round is actively responding.
  • processing — all model responses landed; post-processing (snippet extraction, claim map, optionally recap) is in flight.
  • ready — fully complete; safe to call append_round or read final artifacts.
  • failed — terminal error.

Per-round completion_state — finer-grained signal that lives on each rounds[] entry; the canonical "is this round usable yet?" check:

in_progress  →  complete | partial_failure | failed
  • in_progress — ≥1 target model still queued / streaming / absent.
  • complete — every target model produced a final response.
  • partial_failure — every target model is terminal; ≥1 succeeded, ≥1 errored. Round is usable.
  • failed — every target model is terminal; zero succeeded. No usable output.

Note: a session can be status: "processing" (post-execution finalizer running) while its latest round is already completion_state: "complete". The session moves to ready once the post-processing pipeline finishes. For "is the round content available?" — check completion_state. For "is the session settled (including editorial/summary)?" — check status.
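The two checks described above can be sketched as separate predicates plus a polling loop. This is a minimal sketch: fetch_session is a hypothetical callable that returns the parsed session JSON (for example, by re-requesting the session resource), and the interval and poll limit are arbitrary illustrative values.

```python
import time
from typing import Any, Callable

USABLE_ROUND_STATES = {"complete", "partial_failure"}
TERMINAL_SESSION_STATES = {"ready", "failed"}


def round_content_available(session: dict[str, Any]) -> bool:
    """True when the latest round has usable output (completion_state check)."""
    rounds = session.get("rounds") or []
    # Assumption: rounds[] is ordered by index, so the last entry is the latest.
    return bool(rounds) and rounds[-1]["completion_state"] in USABLE_ROUND_STATES


def session_settled(session: dict[str, Any]) -> bool:
    """True when the session-level status is terminal (ready or failed)."""
    return session["status"] in TERMINAL_SESSION_STATES


def poll_until_settled(
    fetch_session: Callable[[], dict[str, Any]],
    interval_s: float = 2.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_session until the session status is ready or failed."""
    for _ in range(max_polls):
        session = fetch_session()
        if session_settled(session):
            return session
        sleep(interval_s)
    raise TimeoutError("session did not settle within max_polls")
```

A caller that only needs the model responses can stop as soon as round_content_available returns true. Waiting for session_settled matters only when the recap or summary artifacts are required.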


Errors#

All non-2xx responses return JSON:

{
  "error": "session_busy",
  "message": "Session has a round in progress — poll and retry with same Idempotency-Key",
  "retryable": true
}
| Code | HTTP | Retryable | What to do |
| --- | --- | --- | --- |
| unauthorized | 401 | no | Bad or missing bearer token. |
| unknown_models | 400 | no | One or more model IDs in the request don't exist in the registry. Body includes unknown_models: string[] and models_requested: string[]. Call GET /api/models to enumerate valid IDs. |
| ineligible_models | 400 | no | One or more model IDs were disabled by the calling account in /settings/models. Body includes ineligible_models: string[]. Either re-enable them in the dashboard or omit them from your request. |
| insufficient_active_models | 400 | no | The caller omitted models and the curated default panel couldn't produce ≥2 picks against the account's enabled set. Body includes collapsed_buckets: number[] (zero-indexed). Enable more models at /settings/models. |
| credit_exhausted | 403 | no | Wallet balance below the request's per-model minimum. Body includes bucket breakdown + reset timing (see below). Top up (paid) or wait for the 1st-of-month free-tier reset. |
| forbidden | 403 | no | Feature not available on your account. |
| not_found | 404 | no | Session ID doesn't exist or isn't yours. |
| idempotency_conflict | 409 | no | Same key reused with a different body. Use a new key. |
| session_busy | 409 | yes | Another round is in flight. Retry with the same Idempotency-Key. |
| internal_error | 500 | yes | Transient. Retry with the same Idempotency-Key. |
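The table reduces to a small decision rule on the client. The sketch below is one possible way to encode it, keyed on the retryable flag and the error code in the response body. The function next_action and its return labels are illustrative names, not part of the API.

```python
from typing import Any


def next_action(status_code: int, body: dict[str, Any]) -> str:
    """Map an API response to a coarse client action."""
    if 200 <= status_code < 300:
        return "ok"
    if body.get("retryable"):
        # session_busy / internal_error: retry, reusing the Idempotency-Key.
        return "retry_same_key"
    if body.get("error") == "idempotency_conflict":
        # Same key was reused with a different body: a new key is required.
        return "retry_new_key"
    # Everything else needs a changed request, credentials, or balance.
    return "do_not_retry"


busy = {"error": "session_busy", "message": "...", "retryable": True}
print(next_action(409, busy))  # retry_same_key
```

Branching on the retryable flag rather than the HTTP status keeps the two 409 cases (session_busy and idempotency_conflict) apart.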

credit_exhausted body#

{
  "error": "credit_exhausted",
  "message": "Wallet balance below per-model minimum.",
  "retryable": false,
  "effective_balance_usd": 0.03,
  "free_usd": 0.03,
  "subscription_usd": 0,
  "refill_usd": 0,
  "per_model_minimum_usd": 0.05,
  "next_reset_at": "2026-05-01T00:00:00Z"
}

Unlike session_busy / internal_error, this is not transient — retrying the same request won't resolve it. Either top up (when paid tier ships) or wait for next_reset_at. The bucket breakdown is included here because an exhausted caller needs to know which bucket is empty and when funds return.
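A client can turn this body into a user-facing hint about which buckets are empty and how long until the reset. This is a minimal sketch using only the fields shown above. The helper name summarize_credit_exhausted and the shape of its return value are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def summarize_credit_exhausted(
    body: dict[str, Any], now: Optional[datetime] = None
) -> dict[str, Any]:
    """Extract the empty buckets and the wait until next_reset_at."""
    now = now or datetime.now(timezone.utc)
    # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
    reset_at = datetime.fromisoformat(body["next_reset_at"].replace("Z", "+00:00"))
    empty = [
        name
        for name in ("free", "subscription", "refill")
        if body[f"{name}_usd"] <= 0
    ]
    return {
        "empty_buckets": empty,
        "seconds_until_reset": max(0.0, (reset_at - now).total_seconds()),
    }


example_body = {
    "error": "credit_exhausted",
    "effective_balance_usd": 0.03,
    "free_usd": 0.03,
    "subscription_usd": 0,
    "refill_usd": 0,
    "per_model_minimum_usd": 0.05,
    "next_reset_at": "2026-05-01T00:00:00Z",
}
fixed_now = datetime(2026, 4, 30, tzinfo=timezone.utc)
print(summarize_credit_exhausted(example_body, now=fixed_now))
# {'empty_buckets': ['subscription', 'refill'], 'seconds_until_reset': 86400.0}
```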

unknown_models body#

{
  "error": "unknown_models",
  "message": "One or more model IDs are not in the registry.",
  "retryable": false,
  "unknown_models": ["gpt-typo"],
  "models_requested": ["claude-opus-4-6", "gpt-typo"]
}

Preflight check. Fails before any provider call is made, so no credit is debited.
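Because the body echoes both lists, a client can derive the valid subset of its request without another round trip. The helper below is a hypothetical sketch. Whether the remaining models are enough for a deliberation is a separate question it does not answer.

```python
from typing import Any


def drop_unknown_models(body: dict[str, Any]) -> list[str]:
    """Return models_requested minus unknown_models, preserving order."""
    unknown = set(body["unknown_models"])
    return [m for m in body["models_requested"] if m not in unknown]


error_body = {
    "error": "unknown_models",
    "unknown_models": ["gpt-typo"],
    "models_requested": ["claude-opus-4-6", "gpt-typo"],
}
print(drop_unknown_models(error_body))  # ['claude-opus-4-6']
```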


Confidence scores#

When models emit self-reported confidence (via {{C=0.8}}…{{/C}} tags in their prose, or on snippet commentary), those scores surface on responses:

  • responses[].claim_confidence: [{ claim_text, confidence_score }] — per-claim scores extracted from prose. Tags are stripped from text before return.
  • responses[].snippets[].comment_confidence: number | null
  • confidence_disclaimer: string — verbatim advisory included at the top level of every session response.

These are self-reported and only meaningful relative to the same model's other claims. They are not calibrated across models. If you display them, surface the disclaimer too.
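One way to respect these constraints when displaying scores is to rank claims only within each model's own response and always append the disclaimer. The sketch below assumes the session shape shown in the Quickstart and treats the position in rounds[] as the round index. The function confidence_report is an illustrative name.

```python
from typing import Any


def confidence_report(session: dict[str, Any], round_index: int = 0) -> list[str]:
    """Build display lines: per-model claim rankings, then the disclaimer."""
    lines: list[str] = []
    # Assumption: rounds[round_index] is the round whose index equals round_index.
    for response in session["rounds"][round_index]["responses"]:
        claims = sorted(
            response.get("claim_confidence") or [],
            key=lambda c: c["confidence_score"],
            reverse=True,
        )
        lines.append(f"{response['model']}:")
        for claim in claims:
            # Scores are compared only against the same model's other claims.
            lines.append(f"  {claim['confidence_score']:.2f}  {claim['claim_text']}")
    lines.append(session["confidence_disclaimer"])
    return lines
```

Scores from different models are never sorted into one list, since they are not calibrated against each other.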


Naming philosophy#

Field names you see in API requests and responses are the canonical contract — they're the names mumo guarantees to consumers. Internal type names and DB columns may differ; the serializer maps between them. We follow a contract-first principle and one-way mapping (internal → API), with the full mapping documented in docs/CONVENTIONS.md.

One convention worth knowing up front: snippet types (type field) are always UPPERCASE at the API boundary — KEEP, EXPLORE, CHALLENGE, CORE, SHIFT. They're lowercase only in internal storage.

We don't rename API fields casually. Any boundary rename comes with a deprecation period during which the old name is still accepted as an alias.