
REST API — mumo

mumo REST API

Run structured multi-model deliberations on demand. Send a prompt; get back a session containing every model's response and the cross-model claim map. Opt in to per-round recap and session-level synthesis via the recap_round + recap_session parameters.

This is the consumer reference. For agent-runtime use (Claude Code, Cursor, etc.), see the MCP server docs.


Quickstart#

  1. Get a key at mumo.chat/settings/api-keys. Keys begin with mmo_live_.
  2. Send a prompt:
curl https://mumo.chat/api/deliberation \
  -H "Authorization: Bearer mmo_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Should we use Postgres or MongoDB for an event store?",
    "rounds": 1
  }'

You'll get back a session object. With rounds: 1 the status is ready immediately and the round's full artifact stack is in the response. Here's a redacted example:

{
  "id": "f5731fe5-27ce-4f82-b8ce-868e72ff8bb9",
  "status": "ready",
  "mode": "remote",
  "active_models": ["claude-opus-4-6", "gpt-5.4", "grok-4-20-reasoning"],
  "rounds": [
    {
      "index": 0,
      "completion_state": "complete",
      "responses": [
        { "model": "claude-opus-4-6", "text": "...", "snippets": [...] },
        { "model": "gpt-5.4",        "text": "...", "snippets": [...] },
        { "model": "grok-4-20-reasoning", "text": "...", "snippets": [...] }
      ],
      "claim_map": {
        "claims": [
          {
            "quote": "Postgres' append-only WAL plus JSONB columns gives you...",
            "originator": "claude-opus-4-6",
            "reaction_count": 2,
            "positions": [
              { "model": "gpt-5.4",        "type": "KEEP",  "comment": "Agreed — JSONB lets you..." },
              { "model": "grok-4-20-reasoning", "type": "CHALLENGE", "comment": "But what about field-level migrations 18 months in?" }
            ]
          }
        ]
      }
    }
  ],
  "summary": null,
  "confidence_disclaimer": "Confidence scores (0–1) on claims and snippet comments are self-reported..."
}

Read the artifact stack section below for what each piece is for.


Authentication#

Authorization: Bearer mmo_live_…

Keys are minted at /settings/api-keys. Each key is hashed at rest; copy the secret on creation — it's never shown again.

Calls without a valid key return 401 Unauthorized. Registered callers can select any model in the registry — runtime success is governed by credit balance, not tier. Anonymous callers see only tier-0 models in GET /api/models.


Credit wallet#

Every billable LLM call debits a dollar-denominated wallet. A call that would put the caller below the per-model minimum is rejected pre-flight with 403 credit_exhausted — see Errors.

Three buckets, FIFO debit: free → subscription → refill. The free bucket resets monthly on the 1st (UTC) to a platform-configured amount. Subscription and refill buckets are populated through paid flows (Stripe stubs present; paid tier not yet live).

All user-visible USD amounts across the API are markup-included. Wallet balance, per-round debits, session spend totals — every dollar figure returned to consumers reflects what the user actually paid. Raw provider cost is a platform-internal accounting dimension and is not part of the consumer contract.

Where wallet state surfaces on responses:

  • Write-op errors (credit_exhausted) include the balance/minimum details needed to understand why the request was rejected.
  • GET /api/credit is the canonical wallet resource — bucket breakdown, reset timing, rollover cap, subscription status, auto-refill state, FIFO debit order.
  • GET /api/sessions/:id round objects carry a debits[] array — per-model transaction IDs + billed amounts — so balance_before − Σ(new rounds' debits) = balance_after reconciles exactly.
  • GET /api/models returns per-model available, unavailable_reason, and pricing.minimum_usd; use these to preflight which models the caller's current balance can afford.

Balance is not embedded on GET /api/sessions/:id top-level — that endpoint is high-volume during session polling, and wallet state on a read-heavy path creates cache-coherency and semantic-coupling problems. Use GET /api/credit when you need a fresh balance outside the write-op flow.
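
The reconciliation identity above (balance_before − Σ debits = balance_after) can be checked client-side. The following is a minimal sketch, assuming you snapshot effective_balance_usd from GET /api/credit before and after the write, and pass in the new round objects from GET /api/sessions/:id. The function name reconcile is hypothetical, and Decimal is used only to avoid floating-point drift when summing small dollar amounts.

```python
from decimal import Decimal


def reconcile(balance_before, balance_after, new_rounds):
    """Return balance_before - sum(debits) - balance_after.

    `new_rounds` is a list of round objects shaped like the rounds[]
    entries of GET /api/sessions/:id, each carrying a debits[] array.
    A result equal to Decimal("0") means the wallet reconciles.
    """
    total = sum(
        (
            Decimal(str(debit["amount_usd"]))
            for rnd in new_rounds
            for debit in rnd.get("debits", [])
        ),
        Decimal("0"),
    )
    return Decimal(str(balance_before)) - total - Decimal(str(balance_after))
```

A non-zero result does not necessarily indicate an error: debits from other sessions on the same account between the two snapshots would also show up as a difference.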


Session mode: remote#

POST /api/deliberation creates a remote-mode session — you drive each round, steering between them with typed snippets.

MCP note. MCP is agent-moderated only: create_deliberation starts one appendable round, then the agent calls wait_for_round, reads the responses and claim map, and decides whether to call append_round.

Historical: an autonomous (AI-moderated) mode was supported via a moderator_model parameter until 2026-05. It was retired. Requests that send moderator_model will be rejected by the request schema.

You drive the rounds#

The endpoint commits round 1 and returns a 202 ack immediately; model execution runs in the background. Poll the returned progress_url until the round is terminal, read the claim_map, then call POST /api/sessions/:id/rounds with steering snippets to add the next round. Repeat as long as you want — there's no preset cap on remote sessions.

{ "prompt": "..." }

Single-call use case: if you only want one round of three opinions and no follow-up, just don't call append_round. The session ends after round 1.


POST /api/deliberation accepts optional improvement_consent for the new session. This controls whether session data may be used for platform improvement, not billing, credit consumption, model routing, or response visibility.

  • Omit it to use the account's effective default.
  • Paid users may send true or false.
  • Free-tier users are consent-inclusive under accepted terms. Sending false returns 403 consent_exclusion_unavailable.
  • improvement_consent is session-level. POST /api/sessions/:id/rounds rejects it rather than changing consent mid-session.

Session responses expose the resolved decision:

"improvement_consent": {
  "enabled": true,
  "reason": "free_tier_terms",
  "requested": null,
  "disclosure": null
}

Existing sessions created before this field shipped may report reason: "default_include".


Async by default#

As of 2026-04, POST /api/deliberation and POST /api/sessions/:id/rounds return a 202 Accepted ack in <500ms after the round row commits and budget is debited — they do not wait for model execution. Model work runs in the background; callers poll progress_url for terminal state.

What the ack looks like:

{
  "session_id": "abc-123",
  "round_id": "round-456",
  "round_index": 0,
  "status": "processing",
  "idempotency_key": "auto-gen-uuid-if-omitted",
  "client_request_id": null,
  "poll_after_ms": 5000,
  "progress_url": "/api/sessions/abc-123/progress",
  "progress_version": 0
}

The ack confirms that the round committed and tells the caller where to poll progress. The canonical wallet resource with bucket breakdown and reset timing lives at GET /api/credit. Write operations still perform affordability preflight and return credit_exhausted before committing if the wallet cannot cover the requested models.

Why: a client timeout no longer produces server/client state divergence. If the ack reached you, the round committed; if it didn't, your idempotency key replay resolves the uncertainty.

?wait=true compatibility shim. REST-only. Appending ?wait=true to the create/append URL makes the endpoint block until terminal state or 290s (whichever first), preserving the pre-2026-04 response shape. This is a migration aid, deprecated (see "Deprecation timeline" below). MCP does not use ?wait=true; use the wait_for_round tool after the write-op ack.

Idempotency#

  • Scope: (account, endpoint, Idempotency-Key). Same key can be used on both create_deliberation and append_round without colliding.
  • Request fingerprint: the server hashes a canonical subset of your body (prompt + snippets[] order-sensitive + model set sorted + moderator_name + reference + improvement_consent), NFC-normalized. Same key + same semantic body = replay; same key + different semantic body = 409 idempotency_conflict with original_request_fingerprint in the response for caller-side diff.
  • Auto-generation: MCP always sends one (its adapter derives a stable key from tool+args). REST: if you omit the Idempotency-Key header, the server generates a UUID and echoes it in the ack's idempotency_key field. Persist the echoed key if you want retry safety across transport failures.
  • Retry during refund: if a round's refund_status='pending' at replay time, the response includes status: "processing_refund" with retry_after_ms instead of the original terminal. Once the refund credits, replays reflect the terminal failure.
  • TTL: 24h rolling window.
  • Novel error patterns: unclassified failures (classifier fell through to internal_error) do NOT auto-credit refunds. They write the failure fact and sit refund_status='pending' for admin review. Only canonical codes (see "Failure codes" below) auto-credit.
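
For retry safety across transport failures, one option is to generate the key client-side so it is known even if the ack never arrives. The sketch below only builds the request parts for POST /api/deliberation; sending them is left to whichever HTTP client you use. The function name build_create_request is hypothetical, and the body is limited to the prompt field for brevity.

```python
import json
import uuid


def build_create_request(api_key, prompt, idempotency_key=None):
    """Build (url, headers, body) for POST /api/deliberation.

    Pass the same idempotency_key on a retry so the server can treat
    the second attempt as a replay instead of a new round.
    """
    key = idempotency_key or str(uuid.uuid4())
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": key,
    }
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return "https://mumo.chat/api/deliberation", headers, body
```

Persist the generated key alongside the pending request; rebuilding the retry with the same key and the same body is what keeps it a replay rather than a 409 idempotency_conflict.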

Refund lifecycle#

When a round fails catastrophically (all providers error), the ledger emits a failure fact + credits back your budget atomically. The rounds[].refund_status lifecycle you'll see on the progress endpoint:

  • none — happy path, no failure.
  • pending — worker emitted the failure fact (failure_event_at set); ledger is about to credit.
  • credited — refund is in; refund_credited_at stamped. Budget restored.
  • not_applicable — partial-success round (≥1 model returned output); round counted as "delivered."

SLO: p99 latency from failure_event_at to refund_credited_at is < 60s.

Failure codes#

Canonical failure_code values surfaced on the progress endpoint and in refund-conflict responses:

| Code | Meaning |
| --- | --- |
| model_provider_rate_limit | Provider returned 429 / rate-limit |
| model_provider_outage | Provider returned 5xx or was unreachable |
| model_provider_oom | Provider returned out-of-memory |
| model_output_malformed | Response couldn't be parsed |
| model_timeout | Provider call exceeded the deadline |
| all_providers_failed | Every participant failed (composite); fires refund |
| dependency_timeout | Upstream (e.g. Brave Search) timed out |
| dependency_outage | Upstream dependency 5xx |
| dependency_malformed_response | Upstream returned unparseable data |
| stuck_reconciled | Reconciliation cron detected a stuck round and emitted synthetic failure |
| internal_error | Unclassified; held for admin review, does NOT auto-credit |
| test_forced_failure | Admin-only test hook |

Per-model error codes (failed_models[].error_code and /progress models[].error_code)#

Distinct from the round-level failure_code above. Round-level codes describe why the round as a whole failed (typically all-providers composite). Per-model error_code describes why an individual model's call terminated. A round can carry partial_failure completion state with some failed_models[] entries that each have their own error_code, while the round-level failure_code stays null (no refund fires on partial success).

| Code | Meaning | Carries partial_text? |
| --- | --- | --- |
| provider_error | Provider returned an error mid- or post-stream (after first byte). Partial text preserved. | Yes |
| pre_stream_provider_error | Provider returned a 4xx/5xx HTTP response before the stream opened. Covers auth/malformed/pre-stream rate-limit; treat as non-transient for retry decisions. | No |
| stream_ended_without_final_marker | Stream yielded ≥1 delta but never emitted a done event before EOF. Partial text preserved. | Yes |
| internal_deadline_reached | The 150s in-band deadline fired while the model was still producing output. Partial text preserved when bytes were already flushed. | Maybe |
| deadline_expired | The 1-min sweep cron found a row past its deadline_at without terminal stream_status and wrote this terminal. Out-of-band counterpart to internal_deadline_reached. | No |
| stream_interrupted | Worker restarted while the row was mid-stream. | Maybe |
| max_retries_exceeded | Pre-first-byte retry cap hit. Provider was unreachable long enough that no bytes ever rendered. | No |
| provider_auth_failure | Provider rejected the request for auth reasons (typically configuration error). Non-transient. | No |
| pre_stream_failure | Error thrown before provider.stream() was even called (prompt build, factory, etc.). Non-transient. | No |
| rate_limit | Provider rate-limited (explicit code path; distinct from a generic 429 routed via pre_stream_provider_error). | No |
| canceled | User-initiated abort. | No |

Retry/abandon classifier (used by MCP wait_for_round's recommended_client_action and a useful default for REST callers too):

  • Transient — retry-eligible: rate_limit, provider_error, internal_deadline_reached, deadline_expired, stream_interrupted, stream_ended_without_final_marker.
  • Abandon (or escalate): pre_stream_provider_error, provider_auth_failure, pre_stream_failure, max_retries_exceeded, canceled. These don't usually clear on retry under the same conditions.
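
The two groups above translate directly into a lookup. This is a minimal sketch for REST callers; the function name recommended_action and the returned labels ("retry", "abandon", "unknown") are hypothetical and are not the values MCP's recommended_client_action uses.

```python
# Per-model error codes grouped as in the classifier above.
TRANSIENT = frozenset({
    "rate_limit",
    "provider_error",
    "internal_deadline_reached",
    "deadline_expired",
    "stream_interrupted",
    "stream_ended_without_final_marker",
})
ABANDON = frozenset({
    "pre_stream_provider_error",
    "provider_auth_failure",
    "pre_stream_failure",
    "max_retries_exceeded",
    "canceled",
})


def recommended_action(error_code):
    """Map a per-model error_code to a client-side action label."""
    if error_code in TRANSIENT:
        return "retry"
    if error_code in ABANDON:
        return "abandon"
    # None (legacy rows) or a code this client does not know about.
    return "unknown"
```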

Deprecation timeline for ?wait=true#

  • Day 0–90 from GA: full support, Deprecation: true + Sunset + Link response headers on every ?wait=true response.
  • Day 90–120: advisory window; headers remain.
  • Day 120+: returns 410 Gone with a pointer to the async-polling pattern. Enforced via WAIT_ENFORCE_410 env flag.

Recap artifacts (opt-in, per-round)#

Two optional, independent booleans opt rounds in to recap generation. POST /api/deliberation (the create path) accepts only recap_round; POST /api/sessions/:id/rounds (append) accepts both. The asymmetry is deliberate: a session synthesis only carries information beyond a round recap when there are ≥ 2 rounds to synthesize over, so on round 0 the two artifacts would be the same thing in different framing. Sending recap_session on the create path is rejected with a 400 to surface that intent mismatch; set recap_session=true on a later append call instead.

| Field | Accepted on | Default | Notes |
| --- | --- | --- | --- |
| recap_round | create + append | false | Generate a round_recap artifact when this round completes: a structured per-round summary with title, tldr, agenda, and sections. Surfaces on GET /api/sessions/:id once written. |
| recap_session | append only | false | Generate the session-level synthesis (title, tldr, origin, arcs) over the in-flight round-recap set when this round completes. Cascade behavior: triggers round_recap generation for any prior rounds that don't already have one, because round recaps are a precursor dependency for session synthesis. The cascade runs at the caller's expense (see pricing below). Setting recap_session implicitly covers recap_round for that round; you don't also need to set recap_round=true. |

Pricing. Recap and synthesis bill via the standard credit wallet but with 0 bps markup (at-cost passthrough). A typical 3-round cascade lands around $0.04 in Kimi inference cost. The per-session breakdown at /settings/sessions surfaces a dedicated "Recap" line item with the bucket scope and at-cost marker so you can reconcile what was charged.

Artifacts on the session response. When recap or synthesis artifacts exist, they surface on the session response:

  • rounds[].round_recap — populated for any round whose recap_round_requested (or recap_session_requested, via cascade) was true and whose recap generation has completed.
  • session_synthesis — populated when the cascade has produced a session-level synthesis. Until synthesis lands, this field is absent.

Legacy distill. The distill parameter is accepted by the schema for back-compat but no longer triggers any artifact generation — legacy distill is disabled. New sessions should use recap_round / recap_session.


The session response#

Every GET /api/sessions/:id response carries these top-level fields:

| Field | Notes |
| --- | --- |
| id | Session UUID. |
| status | One of: streaming, processing, ready, failed. See "Status flow" below. |
| mode | remote. REST metadata describing how the session was created. (Historical sessions may report autonomous; that mode was retired in 2026-05.) |
| active_models | Model IDs participating in this session. |
| moderator_model | Null for all new sessions. Historical sessions may carry a value. |
| moderator_name, application | Optional identity metadata. |
| model_metadata | { [model_id]: { display_name, provider } }. |
| created_at, estimated_ready_at | Timestamps. |
| total_usage | Aggregated tokens_in / tokens_out across the session. |
| total_cost_usd | Ground-truth ledger cost (USD) for the entire session. Sums every billable bucket: deliberation + moderator + recap (round_recap + session_synthesis) + snippet extraction + editorial + search. Markup-exclusive, distinct from wallet debits, which are markup-included. 0 for sessions with no ledger rows yet. |
| rounds | Array of round objects (see below). |
| summary | Session-level editorial. Null until generated for multi-round sessions. |
| confidence_disclaimer | Verbatim advisory string. Surface alongside any displayed confidence scores. |

Each round in rounds[] carries: index, prompt, completion_state, responses, failed_models, in_progress_models, claim_map, round_recap, cost_usd, debits. Pre-cutover sessions also carry the legacy distill field; new sessions do not (legacy distill is disabled — see Recap artifacts above). round_recap is null unless the round opted in via recap_round=true (or was backfilled via the recap_session=true cascade).

completion_state (per-round; distinct from session-level status) is 4-way:

| Value | Meaning |
| --- | --- |
| complete | Every target model produced a final response. |
| partial_failure | All target models reached terminal state; at least one final AND at least one errored. Round is usable but degraded. |
| failed | All target models reached terminal state; every one errored, zero finals. Round produced no usable output. |
| in_progress | At least one target model is still queued, streaming, or expected-but-absent. Round not yet settled; keep polling /progress. |
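
The four-way rule above can be written as a small function over per-model states. This is an illustration of the rule only: the server computes completion_state itself, and the sketch assumes the per-model state vocabulary that /progress reports (final, error, queued, streaming, absent) and a non-empty list of target models. The function name is hypothetical.

```python
TERMINAL_STATES = ("final", "error")


def derive_completion_state(model_states):
    """Derive a round's completion_state from its per-model states."""
    # Any queued, streaming, or absent model means the round is unsettled.
    if any(state not in TERMINAL_STATES for state in model_states):
        return "in_progress"
    finals = sum(1 for state in model_states if state == "final")
    errors = sum(1 for state in model_states if state == "error")
    if errors == 0:
        return "complete"
    if finals == 0:
        return "failed"
    return "partial_failure"
```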

responses[] is the success-only collection: each entry has the canonical content plus two fields for downstream branching:

  • is_partial (boolean) — true when the response is a successful-but-truncated stream (the model produced output and the call reached done, but the provider signaled truncation via finish_reason). Treat the text as a partial answer; consider asking the user whether to extend.
  • finish_reason (string | null) — provider-native stop reason, surfaced as-is rather than normalized (Anthropic: end_turn / max_tokens / stop_sequence; OpenAI: stop / length / content_filter; Gemini: STOP / MAX_TOKENS / SAFETY). null when the stream did not complete naturally (error, abort, deadline).

failed_models[] is the error-attribution collection. Each entry:

| Field | Notes |
| --- | --- |
| model | Model ID that failed. |
| error | Free-text error description from the row's error column. Stable but not safe to pattern-match; switch on error_code for branching. |
| message | Human-readable error message. |
| error_code | Canonical STREAM_ERROR_CODES value (provider_error, stream_ended_without_final_marker, internal_deadline_reached, …). null on legacy rows. See "Per-model error codes" above. |
| partial_text | Optional. Present when the failed stream emitted bytes before terminating (post-first-byte provider_error, stream_ended_without_final_marker, internal_deadline_reached). Diagnostic value; sometimes usable as a partial answer. |
| partial_text_length | Optional. Character count of partial_text when present. |

in_progress_models[] is the "still working" collection — present when completion_state === "in_progress". Each entry:

| Field | Notes |
| --- | --- |
| model | Target model ID. |
| state | queued (row pre-inserted, provider call not yet started), streaming (≥1 delta observed, no terminal yet), or absent (rare; backstop window or race against pre-insert). |
| deadline_at | ISO 8601 timestamp at which the sweep cron will write a terminal error if the row hasn't transitioned. null on legacy rows. |

cost_usd is the per-round counterpart of session-level total_cost_usd — same ledger source, same markup-exclusive semantics. It is useful after a round completes; during an in-flight round it may be 0 or incomplete because ledger rows settle as model/finalizer calls finish. The relationship: sum(rounds[].cost_usd) ≤ total_cost_usd — session-scoped buckets (session title generation, editorial summary) appear in total_cost_usd only.

The debits[] array is one entry per model call, shape:

{
  "transaction_id": "txn_01h9x2p7k...",
  "model": "claude-opus-4-6",
  "amount_usd": 0.11,
  "settled_at": "2026-04-23T21:15:32Z"
}

amount_usd is markup-included (what the user paid). transaction_id is stable per-debit and safe to reference for reconciliation. See Credit wallet for the contract rule that all user-visible USD amounts are markup-included.

The per-round artifacts:

  • responses[].text — raw prose from each model.
  • responses[].snippets[] — model-emitted reactions (typed KEEP/CHALLENGE/etc, with verbatim quotes from peers and optional commentary).
  • claim_map.claims[] — verbatim claims that ≥2 models reacted to, with each reactor's position (type + commentary). The highest-signal artifact for understanding agreement and disagreement.

Legacy distill field: pre-cutover sessions may carry a distill object with the structured fields key_finding, agreements, disagreements, impactful_quote, open_questions, narrative, and continuation. New sessions do not — legacy distill is disabled (see Recap artifacts section). Agents driving remote-mode deliberations should use claim_map to decide whether to continue or stop, and opt in to recap_round / recap_session when they want structured per-round summaries or a session-level synthesis.

The session-level summary field carries the final editorial across the whole session (surface, agreed, split, open blocks plus anchor_quote and og_quote). It's only populated for completed multi-round legacy sessions. Distill v2 sessions surface the session-level synthesis under the separate session_synthesis field — see Recap artifacts.


Endpoints#

POST /api/deliberation#

Create a session. Body:

| Field | Type | Notes |
| --- | --- | --- |
| prompt | string | The question or topic. Required. |
| reference | string | Optional spec, doc, or design injected as shared context. |
| models | string[] | 2–3 model IDs. Defaults to platform selection. Call GET /api/models to enumerate. |
| moderator_name | string | Display name for the steering identity (≤100 chars). Surfaces in the published transcript. |
| application | string | Display name of your client (≤100 chars). Surfaces in the session info panel. |
| recap_round | boolean | Opt round 0 in to per-round recap generation. Default false. recap_session is NOT accepted on this endpoint: synthesis requires ≥ 2 rounds, so it would degenerate to a round-recap-only behavior here. Set recap_session=true on a later POST /api/sessions/:id/rounds call when you want session-level synthesis (the cascade backfills earlier rounds' recaps). See Recap artifacts. |

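Returns a 202 Accepted ack (see Async by default). Once the round is terminal, fetch the session object via GET /api/sessions/:id (see Quickstart for an example shape).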

Idempotency: pass Idempotency-Key: <stable-string> to make retries safe. Same key + same body returns the cached response; same key + different body returns 409 idempotency_conflict.


POST /api/sessions/:id/rounds#

Append a round to a remote-mode session. Body:

{
  "prompt": "Focus on the pricing mechanism, not positioning.",
  "snippets": [
    {
      "type": "CHALLENGE",
      "quote": "Per-seat pricing assumes teams of >10.",
      "quoted_model": "gpt-5.4",
      "comment": "Most enterprise pilots start at 3–5."
    },
    {
      "type": "KEEP",
      "quote": "Usage-based pricing aligns incentives.",
      "quoted_model": "claude-opus-4-6"
    }
  ],
  "recap_round": false,
  "recap_session": false
}

snippets is optional but high-signal — it's how you steer attention round-to-round.

recap_round and recap_session are optional booleans (both default false). See Recap artifacts for the cascade semantics and pricing.

Snippet types:

  • KEEP — this point is strong; preserve it
  • EXPLORE — dig deeper here
  • CHALLENGE — push back on this claim
  • CORE — load-bearing; build on it
  • SHIFT — this reframes the question

Quotes must be verbatim from a prior round's response. quoted_model is the model ID that originated the quote.
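
Because quotes must be verbatim, it can help to check snippets locally before appending a round. The sketch below is a plain substring check against the prior rounds' responses[].text for the quoted model; the server's exact matching rules (for example, any whitespace normalization) are not documented here, so treat it as a client-side sanity check only. The function name validate_snippet is hypothetical.

```python
SNIPPET_TYPES = frozenset({"KEEP", "EXPLORE", "CHALLENGE", "CORE", "SHIFT"})


def validate_snippet(snippet, prior_rounds):
    """Return a list of problems with a steering snippet (empty if none).

    `prior_rounds` is the rounds[] array from GET /api/sessions/:id.
    """
    problems = []
    if snippet.get("type") not in SNIPPET_TYPES:
        problems.append("unknown snippet type")
    # Collect every prior response text written by the quoted model.
    texts = [
        response.get("text", "")
        for rnd in prior_rounds
        for response in rnd.get("responses", [])
        if response.get("model") == snippet.get("quoted_model")
    ]
    quote = snippet.get("quote")
    if not quote or not any(quote in text for text in texts):
        problems.append("quote not found verbatim in quoted_model's responses")
    return problems
```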

Idempotency-Key is strongly recommended on this endpoint. Round-append duplication corrupts deliberation history.

Errors:

  • 409 session_busy — a round is currently streaming or processing. Retry after a short delay with the same Idempotency-Key.
  • 403 credit_exhausted — your wallet balance can't cover the next round's per-model minimum. Body includes effective_balance_usd, free_usd, subscription_usd, refill_usd, per_model_minimum_usd, next_reset_at. Free-tier balance resets on the 1st of each month (UTC).

GET /api/sessions/:id#

Fetch the full state of a session — all rounds, responses, snippets, claim maps, and the editorial summary if present. Pre-cutover sessions also carry legacy distill objects on rounds (new sessions do not — see Recap artifacts).

The response is fresh on every call.


GET /api/sessions/:id/progress#

Lightweight poll endpoint for round state without fetching the full session body. Two consumers:

  1. Async REST/MCP callers — after a create_deliberation / append_round ack, poll this to learn whether the round reached terminal state (and whether the refund lifecycle moved). The terminal check is moderation_status === "complete" || moderation_status === "failed". Both are end-states; stop polling either way. "failed" can mean an all-models error OR a catastrophic round-level failure (e.g., pre-insert / internal pipeline error) — to distinguish, read failure_code and the per-model state / error_code entries in models[]. Once terminal, fetch the full content via GET /api/sessions/:id, whose round objects carry failed_models[] for end-of-poll attribution.
  2. Real-time UIs — surface per-model state (queued / streaming / final / error / absent) and a heartbeat without re-rendering the whole transcript.
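
For the async-caller case, the terminal check can be wrapped in a small polling loop. This is a sketch, assuming fetch_progress is a callable you supply that performs GET /api/sessions/:id/progress and returns the parsed JSON body. The function name, the timeout default, and the injectable sleep/clock parameters are hypothetical conveniences, not part of the API.

```python
import time

TERMINAL_STATUSES = ("complete", "failed")


def poll_until_terminal(fetch_progress, round_index, poll_after_ms=5000,
                        timeout_s=300, sleep=time.sleep, clock=time.monotonic):
    """Poll /progress until the given round is terminal; return that round."""
    deadline = clock() + timeout_s
    while True:
        progress = fetch_progress()
        rnd = next(
            (r for r in progress["rounds"] if r["index"] == round_index),
            None,
        )
        # Both "complete" and "failed" are end-states: stop either way.
        if rnd is not None and rnd["moderation_status"] in TERMINAL_STATUSES:
            return rnd
        if clock() >= deadline:
            raise TimeoutError(f"round {round_index} not terminal after {timeout_s}s")
        sleep(poll_after_ms / 1000)
```

On a "failed" result, read failure_code and the per-model error_code entries on the returned round before deciding whether to retry.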

The response is intentionally compact (no joins on responses text, snippets, or claim map):

{
  "session_id": "abc-123",
  "is_ai_moderated": false,
  "auto_moderation_completed_at": null,
  "rounds": [
    {
      "id": "round-456",
      "index": 0,
      "moderation_status": "in_progress",
      "refund_status": "none",
      "failure_code": null,
      "failure_event_at": null,
      "refund_credited_at": null,
      "refund_deadline_at": null,
      "progress_version": 3,
      "models": [
        {
          "model": "claude-opus-4-7",
          "state": "streaming",
          "deadline_at": "2026-05-13T03:02:30Z",
          "error_code": null,
          "expired_at_read": false,
          "provider": "anthropic",
          "inference_provider": "anthropic",
          "partial_text_length": 1840,
          "last_chunk_at": "2026-05-13T03:00:07Z",
          "since_last_chunk_ms": 3000
        }
      ]
    }
  ]
}

Per-model fields:

| Field | Notes |
| --- | --- |
| model | Model ID. |
| state | One of: final, error, streaming, queued, absent. Same state machine as in_progress_models[].state plus terminal values. |
| deadline_at | ISO 8601 timestamp at which the row will be swept to a terminal error if it hasn't transitioned. null on legacy rows. |
| error_code | STREAM_ERROR_CODES value when state === "error". null otherwise. |
| expired_at_read | true when state is queued or streaming AND deadline_at is already past at read time. The sweep cron's ~60s cadence means a row can be expired up to that long before its terminal write lands; this field exposes the derived state immediately so callers can render "this model is past its deadline; result expected within a minute." Always false on terminal states. |
| provider | Model family (anthropic, openai, google, xai, moonshot, zai, alibaba). null when the registry lookup fails. |
| inference_provider | Inference endpoint family, distinct from provider when the model routes through a third party (Kimi via Fireworks, Qwen via Together AI, etc.). Matches provider when no cross-provider route is active. |
| partial_text_length | Character count of the response's accumulated text. null for queued / absent rows; 0 for streaming rows that haven't flushed yet. Use null-vs-0 to distinguish "not producing yet" from "observed zero-length partial." For terminal final / error rows, this is the final character count. |
| last_chunk_at | ISO 8601 timestamp of the most recent SDK delta. Updated on the streaming producer's ~2s text flush. null for queued, absent, or non-streaming code paths. |
| since_last_chunk_ms | Read-time computed: now - last_chunk_at. Only present when state === "streaming"; null otherwise. Combined with partial_text_length this renders as "claude: streaming, 1840 chars, last chunk 3s ago." |

ETag / If-None-Match support. Every /progress response carries a weak ETag. Send it back on the next poll via If-None-Match to short-circuit no-change polls with a 304 Not Modified. The validator covers every body-derived signal — round-level state, per-model state, registry attribution, and (when any model is non-terminal) a 5-second wall-clock bucket so since_last_chunk_ms and expired_at_read can't go stale past one bucket boundary. Terminal-only payloads keep their state-based ETag stable across reads, so cache hits on completed rounds are long-lived.

cache-control: no-store on every response — clients shouldn't share-cache, but their own If-None-Match re-poll still works.
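
The conditional-poll pattern described above can be sketched as a small client-side cache. It assumes your HTTP client hands back a status code, a headers mapping, and a parsed body for each /progress poll; the class name ProgressCache and its methods are hypothetical.

```python
class ProgressCache:
    """Remember the last /progress body and its ETag between polls."""

    def __init__(self):
        self.etag = None
        self.body = None

    def request_headers(self):
        """Headers to send on the next poll."""
        return {"If-None-Match": self.etag} if self.etag else {}

    def update(self, status, headers, body):
        """Record a poll result and return the current progress body."""
        if status == 304:
            # Nothing changed since the cached ETag: reuse the stored body.
            return self.body
        # Header names are case-insensitive, so normalize before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        self.etag = lowered.get("etag")
        self.body = body
        return body
```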


GET /api/sessions#

List your sessions.

| Query | Values |
| --- | --- |
| mode | remote (historical sessions may also report autonomous) |
| status | ready, streaming |
| limit | 1–200 (default 7) |
| offset | pagination |

Returns a lightweight list (no response bodies). Use GET /api/sessions/:id for full content.


GET /api/models#

List the models available to your account tier. Each entry:

{
  "id": "claude-opus-4-6",
  "provider": "anthropic",
  "display_name": "Claude Opus 4.6",
  "available": true,
  "unavailable_reason": null,
  "min_user_tier": 1,
  "context_window": 200000,
  "max_output_tokens": 16384,
  "pricing": {
    "input_per_million": 15,
    "output_per_million": 75,
    "cached_input_per_million": 1.5,
    "minimum_usd": 0.05,
    "cache_write_per_million": 30
  },
  "sort_order": 10
}
  • available is true if the caller can actually use this model right now. For registered callers this reduces to a credit-balance check (effective_balance_usd > pricing.minimum_usd); for anonymous callers, tier-0 models filter into the response and tier-1+ models are omitted. false means a call that uses this model will fail.
  • unavailable_reason is "credit_exhausted" when wallet balance is below the model's minimum, and null otherwise. Models outside the caller's tier are filtered from the response entirely, not returned with an unavailable_reason.
  • min_user_tier — visibility tier: 0 = anonymous-visible, 1/2 = registered-visible. Registered users see every tier; anonymous users see only min_user_tier: 0. Informational only — not a runtime gate. Credit balance is the sole runtime constraint for registered users.
  • pricing.minimum_usd — the credit-gate threshold for this model. Your effective balance must be strictly greater than this value (after FIFO debit of any prior in-flight round) for the request to preflight.
  • Per-million pricing fields reflect raw provider cost (platform COGS), not markup-inclusive user-paid amounts. They're informational — for "will this work?" preflight, use available + pricing.minimum_usd. For "what will it cost me?" read the actual debit on the round's debits[].amount_usd after completion.
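
The strict greater-than rule for pricing.minimum_usd can be applied locally to a GET /api/models listing. This is a sketch with a hypothetical function name; the server-computed available flag remains authoritative, and the sketch checks each model on its own without accounting for in-flight rounds or for several models sharing one round.

```python
def affordable_models(models, effective_balance_usd):
    """Return IDs of models whose minimum the balance strictly exceeds.

    `models` is a list shaped like the GET /api/models entries;
    `effective_balance_usd` comes from GET /api/credit.
    """
    return [
        model["id"]
        for model in models
        if effective_balance_usd > model["pricing"]["minimum_usd"]
    ]
```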

GET /api/credit#

Canonical wallet resource. Returns the caller's full credit state.

{
  "effective_balance_usd": 1.42,
  "buckets": {
    "free": {
      "balance_usd": 1.42,
      "monthly_grant_usd": 1.50,
      "resets_at": "2026-05-01T00:00:00Z"
    },
    "subscription": {
      "balance_usd": 0,
      "rollover_cap_usd": 30.00,
      "subscription_status": null
    },
    "refill": {
      "balance_usd": 0,
      "auto_refill_enabled": false
    }
  },
  "per_model_minimum_usd_default": 0.05,
  "debit_order": ["free", "subscription", "refill"]
}
  • effective_balance_usd — sum across all three buckets. The number to compare against per-model minimums for affordability preflight. Markup-included — reflects what you have left to spend, not raw LLM-cost headroom.
  • buckets.free — monthly free-tier credit. monthly_grant_usd is the platform grant each cycle; resets_at is the next UTC 1st-of-month boundary when the bucket refills.
  • buckets.subscription — Stripe-granted credit (paid tier). rollover_cap_usd is the max unused balance that carries forward into a new cycle. subscription_status is one of "active" | "past_due" | "cancelled" | "expired" | null (null = no subscription).
  • buckets.refill — auto-refill top-ups. auto_refill_enabled is the user's current setting. When enabled, two additional fields appear: auto_refill_threshold_usd (trigger level) and auto_refill_amount_usd (top-up size). Both fields are absent when auto-refill is off.
  • per_model_minimum_usd_default — platform fallback minimum used when a model's registry row has no explicit pricing_minimum_usd. For per-model values, read pricing.minimum_usd from GET /api/models.
  • debit_order — FIFO debit sequence. Settlement drains each bucket in this order; surfaced so dashboards and reconciliation tooling don't need to read server code.

Anonymous callers receive the same shape with all balances at 0, subscription_status: null, and auto_refill_enabled: false. Anonymous usage is gated by a separate guest-round budget, not the credit wallet.
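To make the effective_balance_usd and debit_order semantics concrete, here is a small local simulation. It is a sketch under assumptions: the helper names are hypothetical, and it assumes a single debit may span several buckets once an earlier bucket is empty. The server's settlement remains authoritative.

```python
from decimal import Decimal


def effective_balance(credit):
    """Sum balance_usd across all buckets in the wallet."""
    return sum(
        (Decimal(str(b["balance_usd"])) for b in credit["buckets"].values()),
        Decimal("0"),
    )


def simulate_debit(credit, amount_usd):
    """Drain buckets in debit_order and return the resulting balances.

    Local illustration only; assumes one debit can span several buckets.
    """
    remaining = Decimal(str(amount_usd))
    balances = {}
    for name in credit["debit_order"]:
        balance = Decimal(str(credit["buckets"][name]["balance_usd"]))
        take = min(balance, remaining)
        balances[name] = balance - take
        remaining -= take
    if remaining > 0:
        raise ValueError("debit exceeds effective balance")
    return balances


wallet = {
    "buckets": {
        "free": {"balance_usd": 1.42},
        "subscription": {"balance_usd": 0},
        "refill": {"balance_usd": 0.50},
    },
    "debit_order": ["free", "subscription", "refill"],
}
print(effective_balance(wallet))     # 1.92
print(simulate_debit(wallet, 1.50))  # free is drained first, the rest comes from refill
```

Decimal arithmetic is used so that small dollar amounts do not pick up binary floating-point noise when summed.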


GET /api/defaults#

Discover platform defaults. Wallet state is not here — use GET /api/credit.

{
  "models": ["…", "…", "…"],
  "daily_budget": { "limit": 200, "used": 4, "resets_at": "…" }
}

GET /api/health#

Returns { "status": "ok", "version": "v1" }. No auth required.


Status flow#

Two distinct state machines surface in the response.

Session-level status — high-level rollup; what to switch on in REST polling loops:

streaming  →  processing  →  ready
                            ↘
                             failed
  • streaming — at least one model in the most recent round is actively responding.
  • processing — all model responses landed; post-processing (snippet extraction, claim map, optionally recap) is in flight.
  • ready — fully complete; safe to call append_round or read final artifacts.
  • failed — terminal error.

Per-round completion_state — finer-grained signal that lives on each rounds[] entry; the canonical "is this round usable yet?" check:

in_progress  →  complete | partial_failure | failed
  • in_progress — ≥1 target model still queued / streaming / absent.
  • complete — every target model produced a final response.
  • partial_failure — every target model is terminal; ≥1 succeeded, ≥1 errored. Round is usable.
  • failed — every target model is terminal; zero succeeded. No usable output.

Note: a session can be status: "processing" (post-execution finalizer running) while its latest round is already completion_state: "complete". The session moves to ready once the post-processing pipeline finishes. For "is the round content available?" — check completion_state. For "is the session settled (including editorial/summary)?" — check status.
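One way to apply the two checks in a polling loop is sketched below. The `fetch_session` callable is a hypothetical stand-in for whatever call retrieves the session object, and the interval and poll budget are arbitrary. The loop relies only on the status and rounds[].completion_state fields described above.

```python
import time

USABLE_ROUND_STATES = {"complete", "partial_failure"}
TERMINAL_SESSION_STATES = {"ready", "failed"}


def latest_round_usable(session):
    """True once the most recent round has content worth reading."""
    rounds = session.get("rounds") or []
    return bool(rounds) and rounds[-1]["completion_state"] in USABLE_ROUND_STATES


def wait_until_settled(fetch_session, interval_s=2.0, max_polls=150):
    """Poll until the session status is terminal ('ready' or 'failed').

    `fetch_session` is a caller-supplied function returning the session dict.
    """
    for _ in range(max_polls):
        session = fetch_session()
        if session["status"] in TERMINAL_SESSION_STATES:
            return session
        time.sleep(interval_s)
    raise TimeoutError("session did not settle within the polling budget")


# Simulated snapshots standing in for successive poll results.
snapshots = iter([
    {"status": "streaming", "rounds": [{"completion_state": "in_progress"}]},
    {"status": "processing", "rounds": [{"completion_state": "complete"}]},
    {"status": "ready", "rounds": [{"completion_state": "complete"}]},
])
final = wait_until_settled(lambda: next(snapshots), interval_s=0)
print(final["status"], latest_round_usable(final))  # ready True
```

A caller that only needs the round content could stop as soon as `latest_round_usable` returns True, without waiting for the session to reach ready.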


Errors#

All non-2xx responses return JSON:

{
  "error": "session_busy",
  "message": "Session has a round in progress — poll and retry with same Idempotency-Key",
  "retryable": true
}
| Code | HTTP | Retryable | What to do |
|---|---|---|---|
| unauthorized | 401 | no | Bad or missing bearer token. |
| unknown_models | 400 | no | One or more model IDs in the request don't exist in the registry. Body includes unknown_models: string[] and models_requested: string[]. Call GET /api/models to enumerate valid IDs. |
| ineligible_models | 400 | no | One or more model IDs were disabled by the calling account in /settings/models. Body includes ineligible_models: string[]. Either re-enable them in the dashboard or omit them from your request. |
| insufficient_active_models | 400 | no | The caller omitted models and the curated default panel couldn't produce ≥2 picks against the account's enabled set. Body includes collapsed_buckets: number[] (zero-indexed). Enable more models at /settings/models. |
| credit_exhausted | 403 | no | Wallet balance below the request's per-model minimum. Body includes bucket breakdown + reset timing (see below). Top up (paid) or wait for the 1st-of-month free-tier reset. |
| forbidden | 403 | no | Feature not available on your account. |
| not_found | 404 | no | Session ID doesn't exist or isn't yours. |
| idempotency_conflict | 409 | no | Same key reused with a different body. Use a new key. |
| session_busy | 409 | yes | Another round is in flight. Retry with the same Idempotency-Key. |
| internal_error | 500 | yes | Transient. Retry with the same Idempotency-Key. |
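The retry guidance for the error codes above can be sketched as a small wrapper. Everything here is illustrative: `send` is a hypothetical callable that performs the request with a fixed Idempotency-Key and returns `(http_status, body)`, and the backoff schedule is an arbitrary choice, not a documented limit.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay_s=1.0):
    """Retry only when the error body says retryable: true.

    `send` must reuse the same Idempotency-Key on every attempt.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:
            return body
        last_attempt = attempt == max_attempts - 1
        if not body.get("retryable", False) or last_attempt:
            raise RuntimeError(f"{body.get('error')}: {body.get('message')}")
        # Exponential backoff between attempts (arbitrary schedule).
        time.sleep(base_delay_s * (2 ** attempt))


# Simulated responses: one session_busy, then success.
responses = iter([
    (409, {"error": "session_busy", "message": "busy", "retryable": True}),
    (200, {"id": "example", "status": "ready"}),
])
result = call_with_retry(lambda: next(responses), base_delay_s=0)
print(result["status"])  # ready
```

Non-retryable codes such as idempotency_conflict or credit_exhausted raise immediately, since repeating the same request would not change the outcome.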

credit_exhausted body#

{
  "error": "credit_exhausted",
  "message": "Wallet balance below per-model minimum.",
  "retryable": false,
  "effective_balance_usd": 0.03,
  "free_usd": 0.03,
  "subscription_usd": 0,
  "refill_usd": 0,
  "per_model_minimum_usd": 0.05,
  "next_reset_at": "2026-05-01T00:00:00Z"
}

Unlike session_busy / internal_error, this is not transient — retrying the same request won't resolve it. Either top up (when the paid tier ships) or wait for next_reset_at. The bucket breakdown is included here because an exhausted caller needs to know which bucket is empty and when funds return.
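Since credit_exhausted only resolves with a top-up or the next reset, a client can compute how long to wait from next_reset_at. A minimal sketch with a hypothetical helper name, assuming the timestamp is always ISO 8601 in UTC with a trailing "Z" as in the example body:

```python
from datetime import datetime, timezone


def seconds_until_reset(error_body, now=None):
    """Seconds from `now` until next_reset_at (never negative)."""
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also works on Python versions before 3.11.
    raw = error_body["next_reset_at"].replace("Z", "+00:00")
    reset_at = datetime.fromisoformat(raw)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())


body = {"error": "credit_exhausted", "next_reset_at": "2026-05-01T00:00:00Z"}
now = datetime(2026, 4, 30, 23, 0, 0, tzinfo=timezone.utc)
print(seconds_until_reset(body, now))  # 3600.0
```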

unknown_models body#

{
  "error": "unknown_models",
  "message": "One or more model IDs are not in the registry.",
  "retryable": false,
  "unknown_models": ["gpt-typo"],
  "models_requested": ["claude-opus-4-6", "gpt-typo"]
}

Preflight check. Fails before any provider call is made, so no credit is debited.


Confidence scores#

When models emit self-reported confidence (via {{C=0.8}}…{{/C}} tags in their prose, or on snippet commentary), those scores surface on responses:

  • responses[].claim_confidence: [{ claim_text, confidence_score }] — per-claim scores extracted from prose. Tags are stripped from text before return.
  • responses[].snippets[].comment_confidence: number | null
  • confidence_disclaimer: string — verbatim advisory at the top level of every session response.

These are self-reported and only meaningful relative to the same model's other claims. They are not calibrated across models. If you display them, surface the disclaimer too.
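Because the scores are only comparable within one model, a display layer might rank each model's claims separately instead of merging them into one list. A sketch, assuming the responses[] shape listed above; the function name is hypothetical.

```python
def rank_claims_per_model(responses):
    """Map each model to its own claims, highest self-reported confidence first.

    Scores are never compared across models.
    """
    ranked = {}
    for response in responses:
        claims = response.get("claim_confidence") or []
        ranked[response["model"]] = sorted(
            claims, key=lambda c: c["confidence_score"], reverse=True
        )
    return ranked


responses = [
    {"model": "model-a", "claim_confidence": [
        {"claim_text": "A1", "confidence_score": 0.6},
        {"claim_text": "A2", "confidence_score": 0.9},
    ]},
    {"model": "model-b", "claim_confidence": []},
]
ranked = rank_claims_per_model(responses)
print([c["claim_text"] for c in ranked["model-a"]])  # ['A2', 'A1']
```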


Naming philosophy#

Field names you see in API requests and responses are the canonical contract — they're the names mumo guarantees to consumers. Internal type names and DB columns may differ; the serializer maps between them. We follow a contract-first principle and one-way mapping (internal → API), with the full mapping documented in docs/CONVENTIONS.md.

One convention worth knowing up front: snippet types (type field) are always UPPERCASE at the API boundary — KEEP, EXPLORE, CHALLENGE, CORE, SHIFT. They're lowercase only in internal storage.

We don't rename API fields casually. Any boundary rename comes with a deprecation period that accepts the old name as alias.