mumo REST API
Run structured multi-model deliberations on demand. Send a prompt; get back a session containing every model's response and the cross-model claim map. Opt in to per-round recap and session-level synthesis via the recap_round + recap_session parameters.
This is the consumer reference. For agent-runtime use (Claude Code, Cursor, etc.), see the MCP server docs.
Quickstart#
- Get a key at mumo.chat/settings/api-keys. Keys begin with mmo_live_.
- Send a prompt:
curl https://mumo.chat/api/deliberation \
-H "Authorization: Bearer mmo_live_…" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Should we use Postgres or MongoDB for an event store?",
"rounds": 1
}'
You'll get back a session object. With rounds: 1 the status is ready immediately and the round's full artifact stack is in the response. Here's a redacted example:
{
"id": "f5731fe5-27ce-4f82-b8ce-868e72ff8bb9",
"status": "ready",
"mode": "remote",
"active_models": ["claude-opus-4-6", "gpt-5.4", "grok-4-20-reasoning"],
"rounds": [
{
"index": 0,
"completion_state": "complete",
"responses": [
{ "model": "claude-opus-4-6", "text": "...", "snippets": [...] },
{ "model": "gpt-5.4", "text": "...", "snippets": [...] },
{ "model": "grok-4-20-reasoning", "text": "...", "snippets": [...] }
],
"claim_map": {
"claims": [
{
"quote": "Postgres' append-only WAL plus JSONB columns gives you...",
"originator": "claude-opus-4-6",
"reaction_count": 2,
"positions": [
{ "model": "gpt-5.4", "type": "KEEP", "comment": "Agreed — JSONB lets you..." },
{ "model": "grok-4-20-reasoning", "type": "CHALLENGE", "comment": "But what about field-level migrations 18 months in?" }
]
}
]
},
}
],
"summary": null,
"confidence_disclaimer": "Confidence scores (0–1) on claims and snippet comments are self-reported..."
}
Read the artifact stack section below for what each piece is for.
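As an illustration of reading the claim map, here is a minimal Python sketch that lists the claims in one round that drew at least one CHALLENGE position. It assumes only the response shape shown in the example above; the helper name contested_claims and the trimmed example_session dict are hypothetical, not part of the API.

```python
from typing import Any


def contested_claims(session: dict[str, Any], round_index: int = -1) -> list[dict[str, Any]]:
    """List claims in one round that drew at least one CHALLENGE position."""
    claim_map = session["rounds"][round_index].get("claim_map") or {}
    contested = []
    for claim in claim_map.get("claims", []):
        challengers = [
            p["model"] for p in claim.get("positions", []) if p.get("type") == "CHALLENGE"
        ]
        if challengers:
            contested.append(
                {
                    "quote": claim["quote"],
                    "originator": claim["originator"],
                    "challenged_by": challengers,
                }
            )
    return contested


# Trimmed stand-in for the example response; only the fields the helper reads.
example_session = {
    "rounds": [
        {
            "index": 0,
            "claim_map": {
                "claims": [
                    {
                        "quote": "Postgres' append-only WAL plus JSONB columns gives you...",
                        "originator": "claude-opus-4-6",
                        "positions": [
                            {"model": "gpt-5.4", "type": "KEEP"},
                            {"model": "grok-4-20-reasoning", "type": "CHALLENGE"},
                        ],
                    }
                ]
            },
        }
    ]
}

for claim in contested_claims(example_session):
    print(claim["originator"], "challenged by", ", ".join(claim["challenged_by"]))
```

A caller steering a follow-up round might start from these contested claims, since they mark where the models disagree.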
Authentication#
Authorization: Bearer mmo_live_…
Keys are minted at /settings/api-keys. Each key is hashed at rest; copy the secret on creation — it's never shown again.
Calls without a valid key return 401 Unauthorized. Registered callers can select any model in the registry — runtime success is governed by credit balance, not tier. Anonymous callers see only tier-0 models in GET /api/models.
Credit wallet#
Every billable LLM call debits a dollar-denominated wallet. A call that would put the caller below the per-model minimum is rejected pre-flight with 403 credit_exhausted — see Errors.
Three buckets, FIFO debit: free → subscription → refill. The free bucket resets monthly on the 1st (UTC) to a platform-configured amount. Subscription and refill buckets are populated through paid flows (Stripe stubs present; paid tier not yet live).
All user-visible USD amounts across the API are markup-included. Wallet balance, per-round debits, session spend totals — every dollar figure returned to consumers reflects what the user actually paid. Raw provider cost is a platform-internal accounting dimension and is not part of the consumer contract.
Where wallet state surfaces on responses:
- Write-op errors (credit_exhausted) include the balance/minimum details needed to understand why the request was rejected.
- GET /api/credit is the canonical wallet resource: bucket breakdown, reset timing, rollover cap, subscription status, auto-refill state, FIFO debit order.
- GET /api/sessions/:id round objects carry a debits[] array (per-model transaction IDs + billed amounts), so balance_before − Σ(new rounds' debits) = balance_after reconciles exactly.
- GET /api/models exposes per-model available + unavailable_reason + pricing.minimum_usd. Use this to preflight which models the caller's current balance can afford.
Balance is not embedded on GET /api/sessions/:id top-level — that endpoint is high-volume during session polling, and wallet state on a read-heavy path creates cache-coherency and semantic-coupling problems. Use GET /api/credit when you need a fresh balance outside the write-op flow.
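The reconciliation identity above (balance_before − Σ(new rounds' debits) = balance_after) can be checked client-side. The sketch below is one way to do it in Python, assuming balance_before and balance_after are effective_balance_usd values read from GET /api/credit around the rounds in question, and that no monthly reset or other spend landed in between. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Any, Iterable


def total_debited(rounds: Iterable[dict[str, Any]]) -> Decimal:
    """Sum debits[].amount_usd across the given round objects."""
    total = Decimal("0")
    for rnd in rounds:
        for debit in rnd.get("debits", []):
            # Convert via str() so a JSON float such as 0.11 stays exactly 0.11.
            total += Decimal(str(debit["amount_usd"]))
    return total


def reconciles(balance_before: float, balance_after: float, new_rounds: Iterable[dict[str, Any]]) -> bool:
    """True when balance_before minus the new rounds' debits equals balance_after."""
    expected = Decimal(str(balance_before)) - total_debited(new_rounds)
    return expected == Decimal(str(balance_after))
```

Decimal is used rather than float so that small dollar amounts compare exactly.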
Session mode: remote#
POST /api/deliberation creates a remote-mode session — you drive each round, steering between them with typed snippets.
MCP note. MCP is agent-moderated only: create_deliberation starts one appendable round, then the agent calls wait_for_round, reads the responses and claim map, and decides whether to call append_round.
Historical: an autonomous (AI-moderated) mode was supported via a moderator_model parameter until 2026-05, when it was retired. Requests that send moderator_model are rejected by the request schema.
You drive the rounds#
The endpoint commits round 1 and returns a 202 ack immediately; model execution runs in the background. Poll the returned progress_url until the round is terminal, read the claim_map, then call POST /api/sessions/:id/rounds with steering snippets to add the next round. Repeat as long as you want — there's no preset cap on remote sessions.
{ "prompt": "..." }
Single-call use case: if you only want one round of three opinions and no follow-up, just don't call append_round. The session ends after round 1.
Platform Improvement Consent#
POST /api/deliberation accepts optional improvement_consent for the new session. This controls whether session data may be used for platform improvement, not billing, credit consumption, model routing, or response visibility.
- Omit it to use the account's effective default.
- Paid users may send true or false.
- Free-tier users are consent-inclusive under accepted terms. Sending false returns 403 consent_exclusion_unavailable.
- improvement_consent is session-level. POST /api/sessions/:id/rounds rejects it rather than changing consent mid-session.
Session responses expose the resolved decision:
"improvement_consent": {
"enabled": true,
"reason": "free_tier_terms",
"requested": null,
"disclosure": null
}
Existing sessions created before this field shipped may report reason: "default_include".
Async by default#
As of 2026-04, POST /api/deliberation and POST /api/sessions/:id/rounds return a 202 Accepted ack in <500ms after the round row commits and budget is debited — they do not wait for model execution. Model work runs in the background; callers poll progress_url for terminal state.
What the ack looks like:
{
"session_id": "abc-123",
"round_id": "round-456",
"round_index": 0,
"status": "processing",
"idempotency_key": "auto-gen-uuid-if-omitted",
"client_request_id": null,
"poll_after_ms": 5000,
"progress_url": "/api/sessions/abc-123/progress",
"progress_version": 0
}
The ack confirms that the round committed and tells the caller where to poll progress. The canonical wallet resource with bucket breakdown and reset timing lives at GET /api/credit. Write operations still perform affordability preflight and return credit_exhausted before committing if the wallet cannot cover the requested models.
Why: a client timeout no longer produces server/client state divergence. If the ack reached you, the round committed; if it didn't, your idempotency key replay resolves the uncertainty.
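The ack-then-poll pattern could be wrapped roughly as follows. This is a sketch, not a reference client: fetch_progress is a hypothetical callable you supply that performs the GET on progress_url and returns the parsed JSON, and the terminal check uses the moderation_status field documented under GET /api/sessions/:id/progress. The timeout default is an arbitrary choice.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"complete", "failed"}


def poll_until_terminal(
    fetch_progress: Callable[[], dict[str, Any]],
    round_index: int,
    poll_after_ms: int = 5000,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll progress until the round is terminal, then return its progress entry."""
    deadline = time.monotonic() + timeout_s
    while True:
        progress = fetch_progress()
        rnd = next((r for r in progress["rounds"] if r["index"] == round_index), None)
        if rnd is not None and rnd["moderation_status"] in TERMINAL_STATUSES:
            return rnd
        if time.monotonic() >= deadline:
            raise TimeoutError(f"round {round_index} not terminal after {timeout_s}s")
        # Wait for the interval the ack suggested before asking again.
        sleep(poll_after_ms / 1000)
```

Passing the ack's poll_after_ms and round_index straight through keeps the client aligned with the server's suggested cadence; the sleep parameter exists only so the loop can be exercised without real waiting.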
?wait=true compatibility shim. REST-only. Appending ?wait=true to the create/append URL makes the endpoint block until terminal state or 290s (whichever first), preserving the pre-2026-04 response shape. This is a migration aid, deprecated (see "Deprecation timeline" below). MCP does not use ?wait=true; use the wait_for_round tool after the write-op ack.
Idempotency#
- Scope: (account, endpoint, Idempotency-Key). The same key can be used on both create_deliberation and append_round without colliding.
- Request fingerprint: the server hashes a canonical subset of your body (prompt + snippets[] order-sensitive + model set sorted + moderator_name + reference + improvement_consent), NFC-normalized. Same key + same semantic body = replay; same key + different semantic body = 409 idempotency_conflict with original_request_fingerprint in the response for caller-side diff.
- Auto-generation: MCP always sends one (its adapter derives a stable key from tool+args). REST: if you omit the Idempotency-Key header, the server generates a UUID and echoes it in the ack's idempotency_key field. Persist the echoed key if you want retry safety across transport failures.
- Retry during refund: if a round's refund_status='pending' at replay time, the response includes status: "processing_refund" with retry_after_ms instead of the original terminal. Once the refund credits, replays reflect the terminal failure.
- TTL: 24h rolling window.
- Novel error patterns: unclassified failures (classifier fell through to internal_error) do NOT auto-credit refunds. They write the failure fact and sit at refund_status='pending' for admin review. Only canonical codes (see "Failure codes" below) auto-credit.
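To make the fingerprint semantics concrete, the sketch below shows what "same semantic body" could mean in practice: model order is ignored, snippet order matters, and strings are NFC-normalized. The server's actual hash function, serialization, and field handling are not documented here, so treat SHA-256 over canonical JSON as an assumption made for illustration; the output will not match original_request_fingerprint.

```python
import hashlib
import json
import unicodedata
from typing import Any


def _nfc(value: Any) -> Any:
    """Recursively NFC-normalize every string in a JSON-like value."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [_nfc(v) for v in value]
    if isinstance(value, dict):
        return {k: _nfc(v) for k, v in value.items()}
    return value


def request_fingerprint(body: dict[str, Any]) -> str:
    """Illustrative fingerprint over the documented canonical subset (assumed encoding)."""
    canonical = {
        "prompt": body.get("prompt"),
        "snippets": body.get("snippets", []),      # order-sensitive: kept as sent
        "models": sorted(body.get("models", [])),  # order-insensitive: sorted
        "moderator_name": body.get("moderator_name"),
        "reference": body.get("reference"),
        "improvement_consent": body.get("improvement_consent"),
    }
    encoded = json.dumps(
        _nfc(canonical), sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

A local check like this can help predict whether a retry with a reused key is likely to replay or to hit 409 idempotency_conflict before sending it.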
Refund lifecycle#
When a round fails catastrophically (all providers error), the ledger emits a failure fact + credits back your budget atomically. The rounds[].refund_status lifecycle you'll see on the progress endpoint:
- none: happy path, no failure.
- pending: worker emitted the failure fact (failure_event_at set); ledger is about to credit.
- credited: refund is in; refund_credited_at stamped. Budget restored.
- not_applicable: partial-success round (≥1 model returned output); round counted as "delivered."
SLO: p99 latency from failure_event_at to refund_credited_at is < 60s.
Failure codes#
Canonical failure_code values surfaced on the progress endpoint and in refund-conflict responses:
| Code | Meaning |
|---|---|
| model_provider_rate_limit | Provider returned 429 / rate-limit |
| model_provider_outage | Provider returned 5xx or was unreachable |
| model_provider_oom | Provider returned out-of-memory |
| model_output_malformed | Response couldn't be parsed |
| model_timeout | Provider call exceeded the deadline |
| all_providers_failed | Every participant failed (composite); fires refund |
| dependency_timeout | Upstream (e.g. Brave Search) timed out |
| dependency_outage | Upstream dependency 5xx |
| dependency_malformed_response | Upstream returned unparseable data |
| stuck_reconciled | Reconciliation cron detected a stuck round and emitted synthetic failure |
| internal_error | Unclassified; held for admin review, does NOT auto-credit |
| test_forced_failure | Admin-only test hook |
Per-model error codes (failed_models[].error_code and /progress models[].error_code)#
Distinct from the round-level failure_code above. Round-level codes describe why the round as a whole failed (typically all-providers composite). Per-model error_code describes why an individual model's call terminated. A round can carry partial_failure completion state with some failed_models[] entries that each have their own error_code, while the round-level failure_code stays null (no refund fires on partial success).
| Code | Meaning | Carries partial_text? |
|---|---|---|
| provider_error | Provider returned an error mid- or post-stream (after first byte). Partial text preserved. | Yes |
| pre_stream_provider_error | Provider returned a 4xx/5xx HTTP response before the stream opened. Covers auth/malformed/pre-stream rate-limit; treat as non-transient for retry decisions. | No |
| stream_ended_without_final_marker | Stream yielded ≥1 delta but never emitted a done event before EOF. Partial text preserved. | Yes |
| internal_deadline_reached | The 150s in-band deadline fired while the model was still producing output. Partial text preserved when bytes were already flushed. | Maybe |
| deadline_expired | The 1-min sweep cron found a row past its deadline_at without terminal stream_status and wrote this terminal. Out-of-band counterpart to internal_deadline_reached. | No |
| stream_interrupted | Worker restarted while the row was mid-stream. | Maybe |
| max_retries_exceeded | Pre-first-byte retry cap hit. Provider was unreachable long enough that no bytes ever rendered. | No |
| provider_auth_failure | Provider rejected the request for auth reasons (typically configuration error). Non-transient. | No |
| pre_stream_failure | Error thrown before provider.stream() was even called (prompt build, factory, etc.). Non-transient. | No |
| rate_limit | Provider rate-limited (explicit code path; distinct from a generic 429 routed via pre_stream_provider_error). | No |
| canceled | User-initiated abort. | No |
Retry/abandon classifier (used by MCP wait_for_round's recommended_client_action and a useful default for REST callers too):
- Transient (retry-eligible): rate_limit, provider_error, internal_deadline_reached, deadline_expired, stream_interrupted, stream_ended_without_final_marker.
- Abandon (or escalate): pre_stream_provider_error, provider_auth_failure, pre_stream_failure, max_retries_exceeded, canceled. These don't usually clear on retry under the same conditions.
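A REST caller could encode the classifier above as a small lookup. In this sketch the two code sets come directly from the lists above; the function name, the returned strings ("retry", "abandon", "escalate"), and the choice to escalate on unknown or null codes are assumptions for illustration, not the values MCP's recommended_client_action emits.

```python
from typing import Optional

TRANSIENT_CODES = frozenset({
    "rate_limit",
    "provider_error",
    "internal_deadline_reached",
    "deadline_expired",
    "stream_interrupted",
    "stream_ended_without_final_marker",
})

ABANDON_CODES = frozenset({
    "pre_stream_provider_error",
    "provider_auth_failure",
    "pre_stream_failure",
    "max_retries_exceeded",
    "canceled",
})


def recommended_action(error_code: Optional[str]) -> str:
    """Map a per-model error_code to a coarse client action."""
    if error_code in TRANSIENT_CODES:
        return "retry"
    if error_code in ABANDON_CODES:
        return "abandon"
    # Unknown codes and null (legacy rows) are not classified here; surface them.
    return "escalate"
```

Switching on error_code rather than the free-text error field keeps the branching stable if error wording changes.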
Deprecation timeline for ?wait=true#
- Day 0–90 from GA: full support, with Deprecation: true + Sunset + Link response headers on every ?wait=true response.
- Day 90–120: advisory window; headers remain.
- Day 120+: returns 410 Gone with a pointer to the async-polling pattern. Enforced via the WAIT_ENFORCE_410 env flag.
Recap artifacts (opt-in, per-round)#
Two optional, independent booleans opt rounds in to recap generation. POST /api/deliberation (the create path) accepts only recap_round; POST /api/sessions/:id/rounds (append) accepts both. The asymmetry is deliberate: a session synthesis only carries information beyond a round recap when there are ≥ 2 rounds to synthesize over, so on round 0 the two artifacts would be the same thing in different framing. Sending recap_session on the create path is rejected with a 400 to surface that intent mismatch; set recap_session=true on a later append call instead.
| Field | Accepted on | Default | Notes |
|---|---|---|---|
| recap_round | create + append | false | Generate a round_recap artifact when this round completes: a structured per-round summary with title, tldr, agenda, and sections. Surfaces on GET /api/sessions/:id once written. |
| recap_session | append only | false | Generate the session-level synthesis (title, tldr, origin, arcs) over the in-flight round-recap set when this round completes. Cascade behavior: triggers round_recap generation for any prior rounds that don't already have one, because round recaps are a precursor dependency for session synthesis. The cascade runs at the caller's expense (see pricing below). Setting recap_session implicitly covers recap_round for that round; you don't also need to set recap_round=true. |
Pricing. Recap and synthesis bill via the standard credit wallet but with 0 bps markup (at-cost passthrough). A typical 3-round cascade lands around $0.04 in Kimi inference cost; the per-session breakdown at /settings/sessions surfaces a dedicated "Recap" line item with the bucket scope and at-cost marker so you can reconcile what was charged.
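Because the cascade bills for every prior round that lacks a recap, it can help to count those rounds before setting recap_session=true. A minimal sketch, assuming the rounds[] array from GET /api/sessions/:id and treating a null or missing round_recap as "would be backfilled". The helper name is hypothetical, and a recap that was requested but is still generating also reads as null, so this can overcount.

```python
from typing import Any


def rounds_missing_recap(rounds: list[dict[str, Any]]) -> list[int]:
    """Indexes of existing rounds whose round_recap is null or absent."""
    return [r["index"] for r in rounds if r.get("round_recap") is None]
```

The length of the returned list, plus one for the round being appended, gives a rough upper bound on the number of recaps the cascade would generate.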
Artifacts on the session response. When recap or synthesis artifacts exist, they surface on the session response:
- rounds[].round_recap: populated for any round whose recap_round_requested (or recap_session_requested, via cascade) was true and whose recap generation has completed.
- session_synthesis: populated when the cascade has produced a session-level synthesis. Until synthesis lands, this field is absent.
Legacy distill. The distill parameter is accepted by the schema for back-compat but no longer triggers any artifact generation — legacy distill is disabled. New sessions should use recap_round / recap_session.
The session response#
Every GET /api/sessions/:id response carries these top-level fields:
| Field | Notes |
|---|---|
| id | Session UUID. |
| status | One of streaming, processing, ready, failed. See "Status flow" below. |
| mode | remote. REST metadata describing how the session was created. (Historical sessions may report autonomous; that mode was retired in 2026-05.) |
| active_models | Model IDs participating in this session. |
| moderator_model | Null for all new sessions. Historical sessions may carry a value. |
| moderator_name, application | Optional identity metadata. |
| model_metadata | { [model_id]: { display_name, provider } }. |
| created_at, estimated_ready_at | Timestamps. |
| total_usage | Aggregated tokens_in / tokens_out across the session. |
| total_cost_usd | Ground-truth ledger cost (USD) for the entire session. Sums every billable bucket: deliberation + moderator + recap (round_recap + session_synthesis) + snippet extraction + editorial + search. Markup-exclusive, and so distinct from wallet debits, which are markup-included. 0 for sessions with no ledger rows yet. |
| rounds | Array of round objects (see below). |
| summary | Session-level editorial. Null until generated for multi-round sessions. |
| confidence_disclaimer | Verbatim advisory string. Surface alongside any displayed confidence scores. |
Each round in rounds[] carries: index, prompt, completion_state, responses, failed_models, in_progress_models, claim_map, round_recap, cost_usd, debits. Pre-cutover sessions also carry the legacy distill field; new sessions do not (legacy distill is disabled — see Recap artifacts above). round_recap is null unless the round opted in via recap_round=true (or was backfilled via the recap_session=true cascade).
completion_state (per-round; distinct from session-level status) is 4-way:
| Value | Meaning |
|---|---|
| complete | Every target model produced a final response. |
| partial_failure | All target models reached terminal state; at least one final AND at least one errored. Round is usable but degraded. |
| failed | All target models reached terminal state; every one errored, zero finals. Round produced no usable output. |
| in_progress | At least one target model is still queued, streaming, or expected-but-absent. Round not yet settled; keep polling /progress. |
responses[] is the success-only collection: each entry has the canonical content plus two fields for downstream branching:
- is_partial (boolean): true when the response is a successful-but-truncated stream (the model produced output and the call reached done, but the provider signaled truncation via finish_reason). Treat the text as a partial answer; consider asking the user whether to extend.
- finish_reason (string | null): provider-native stop reason, surfaced as-is rather than normalized (Anthropic: end_turn / max_tokens / stop_sequence; OpenAI: stop / length / content_filter; Gemini: STOP / MAX_TOKENS / SAFETY). null when the stream did not complete naturally (error, abort, deadline).
failed_models[] is the error-attribution collection. Each entry:
| Field | Notes |
|---|---|
| model | Model ID that failed. |
| error | Free-text error description from the row's error column. Stable but not safe to pattern-match; switch on error_code for branching. |
| message | Human-readable error message. |
| error_code | Canonical STREAM_ERROR_CODES value (provider_error, stream_ended_without_final_marker, internal_deadline_reached, …). null on legacy rows. See "Per-model error codes" above. |
| partial_text | Optional. Present when the failed stream emitted bytes before terminating (post-first-byte provider_error, stream_ended_without_final_marker, internal_deadline_reached). Diagnostic value; sometimes usable as a partial answer. |
| partial_text_length | Optional. Character count of partial_text when present. |
in_progress_models[] is the "still working" collection — present when completion_state === "in_progress". Each entry:
| Field | Notes |
|---|---|
| model | Target model ID. |
| state | queued (row pre-inserted, provider call not yet started), streaming (≥1 delta observed, no terminal yet), or absent (rare; backstop window or race against pre-insert). |
| deadline_at | ISO 8601 timestamp at which the sweep cron will write a terminal error if the row hasn't transitioned. null on legacy rows. |
cost_usd is the per-round counterpart of session-level total_cost_usd — same ledger source, same markup-exclusive semantics. It is useful after a round completes; during an in-flight round it may be 0 or incomplete because ledger rows settle as model/finalizer calls finish. The relationship: sum(rounds[].cost_usd) ≤ total_cost_usd — session-scoped buckets (session title generation, editorial summary) appear in total_cost_usd only.
The debits[] array is one entry per model call, shape:
{
"transaction_id": "txn_01h9x2p7k...",
"model": "claude-opus-4-6",
"amount_usd": 0.11,
"settled_at": "2026-04-23T21:15:32Z"
}
amount_usd is markup-included (what the user paid). transaction_id is stable per-debit and safe to reference for reconciliation. See Credit wallet for the contract rule that all user-visible USD amounts are markup-included.
The per-round artifacts:
- responses[].text: raw prose from each model.
- responses[].snippets[]: model-emitted reactions (typed KEEP / CHALLENGE / etc., with verbatim quotes from peers and optional commentary).
- claim_map.claims[]: verbatim claims that ≥2 models reacted to, with each reactor's position (type + commentary). The highest-signal artifact for understanding agreement and disagreement.
Legacy distill field: pre-cutover sessions may carry a distill object with the structured fields key_finding, agreements, disagreements, impactful_quote, open_questions, narrative, and continuation. New sessions do not — legacy distill is disabled (see Recap artifacts section). Agents driving remote-mode deliberations should use claim_map to decide whether to continue or stop, and opt in to recap_round / recap_session when they want structured per-round summaries or a session-level synthesis.
The session-level summary field carries the final editorial across the whole session (surface, agreed, split, open blocks plus anchor_quote and og_quote). It's only populated for completed multi-round legacy sessions. Distill v2 sessions surface the session-level synthesis under the separate session_synthesis field — see Recap artifacts.
Endpoints#
POST /api/deliberation#
Create a session. Body:
| Field | Type | Notes |
|---|---|---|
| prompt | string | The question or topic. Required. |
| reference | string | Optional spec, doc, or design injected as shared context. |
| models | string[] | 2–3 model IDs. Defaults to platform selection. Call GET /api/models to enumerate. |
| moderator_name | string | Display name for the steering identity (≤100 chars). Surfaces in the published transcript. |
| application | string | Display name of your client (≤100 chars). Surfaces in the session info panel. |
| improvement_consent | boolean | Optional. Omit to use the account's effective default. See Platform Improvement Consent. |
| recap_round | boolean | Opt round 0 in to per-round recap generation. Default false. recap_session is NOT accepted on this endpoint: synthesis requires ≥ 2 rounds, so it would degenerate to a round-recap-only behavior here. Set recap_session=true on a later POST /api/sessions/:id/rounds call when you want session-level synthesis (the cascade backfills earlier rounds' recaps). See Recap artifacts. |
Returns a session object (see Quickstart for an example shape).
Idempotency: pass Idempotency-Key: <stable-string> to make retries safe. Same key + same body returns the cached response; same key + different body returns 409 idempotency_conflict.
POST /api/sessions/:id/rounds#
Append a round to a remote-mode session. Body:
{
"prompt": "Focus on the pricing mechanism, not positioning.",
"snippets": [
{
"type": "CHALLENGE",
"quote": "Per-seat pricing assumes teams of >10.",
"quoted_model": "gpt-5.4",
"comment": "Most enterprise pilots start at 3–5."
},
{
"type": "KEEP",
"quote": "Usage-based pricing aligns incentives.",
"quoted_model": "claude-opus-4-6"
}
],
"recap_round": false,
"recap_session": false
}
snippets is optional but high-signal — it's how you steer attention round-to-round.
recap_round and recap_session are optional booleans (both default false). See Recap artifacts for the cascade semantics and pricing.
Snippet types:
- KEEP — this point is strong; preserve it
- EXPLORE — dig deeper here
- CHALLENGE — push back on this claim
- CORE — load-bearing; build on it
- SHIFT — this reframes the question
Quotes must be verbatim from a prior round's response. quoted_model is the model ID that originated the quote.
Idempotency-Key is strongly recommended on this endpoint. Round-append duplication corrupts deliberation history.
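One way to assemble an append body is to build snippets directly from claim_map entries, which already carry a verbatim quote and its originator. The sketch below assumes that a claim's quote field is the full verbatim text (not a truncated preview) and that originator is the right value for quoted_model; both function names are hypothetical.

```python
from typing import Any, Optional

SNIPPET_TYPES = frozenset({"KEEP", "EXPLORE", "CHALLENGE", "CORE", "SHIFT"})


def snippet_from_claim(
    claim: dict[str, Any], snippet_type: str, comment: Optional[str] = None
) -> dict[str, Any]:
    """Turn one claim_map claim into a steering snippet for the next round."""
    if snippet_type not in SNIPPET_TYPES:
        raise ValueError(f"unknown snippet type: {snippet_type!r}")
    snippet = {
        "type": snippet_type,
        "quote": claim["quote"],             # must stay verbatim
        "quoted_model": claim["originator"],
    }
    if comment:
        snippet["comment"] = comment
    return snippet


def append_round_body(
    prompt: str,
    snippets: list[dict[str, Any]],
    recap_round: bool = False,
    recap_session: bool = False,
) -> dict[str, Any]:
    """Assemble the JSON body for POST /api/sessions/:id/rounds."""
    return {
        "prompt": prompt,
        "snippets": snippets,
        "recap_round": recap_round,
        "recap_session": recap_session,
    }
```

Validating the snippet type locally avoids spending a request on a body the schema would reject.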
Errors:
- 409 session_busy: a round is currently streaming or processing. Retry after a short delay with the same Idempotency-Key.
- 403 credit_exhausted: your wallet balance can't cover the next round's per-model minimum. Body includes effective_balance_usd, free_usd, subscription_usd, refill_usd, per_model_minimum_usd, next_reset_at. Free-tier balance resets on the 1st of each month (UTC).
GET /api/sessions/:id#
Fetch the full state of a session — all rounds, responses, snippets, claim maps, and the editorial summary if present. Pre-cutover sessions also carry legacy distill objects on rounds (new sessions do not — see Recap artifacts).
The response is fresh on every call.
GET /api/sessions/:id/progress#
Lightweight poll endpoint for round state without fetching the full session body. Two consumers:
- Async REST/MCP callers: after a create_deliberation / append_round ack, poll this to learn whether the round reached terminal state (and whether the refund lifecycle moved). The terminal check is moderation_status === "complete" || moderation_status === "failed". Both are end-states; stop polling either way. "failed" can mean an all-models error OR a catastrophic round-level failure (e.g., pre-insert / internal pipeline error); to distinguish, read failure_code and the per-model state / error_code entries in models[]. Once terminal, fetch the full content via GET /api/sessions/:id, whose round objects carry failed_models[] for end-of-poll attribution.
- Real-time UIs: surface per-model state (queued / streaming / final / error / absent) and a heartbeat without re-rendering the whole transcript.
The response is intentionally compact (no joins on responses text, snippets, or claim map):
{
"session_id": "abc-123",
"is_ai_moderated": false,
"auto_moderation_completed_at": null,
"rounds": [
{
"id": "round-456",
"index": 0,
"moderation_status": "in_progress",
"refund_status": "none",
"failure_code": null,
"failure_event_at": null,
"refund_credited_at": null,
"refund_deadline_at": null,
"progress_version": 3,
"models": [
{
"model": "claude-opus-4-7",
"state": "streaming",
"deadline_at": "2026-05-13T03:02:30Z",
"error_code": null,
"expired_at_read": false,
"provider": "anthropic",
"inference_provider": "anthropic",
"partial_text_length": 1840,
"last_chunk_at": "2026-05-13T03:00:07Z",
"since_last_chunk_ms": 3000
}
]
}
]
}
Per-model fields:
| Field | Notes |
|---|---|
| model | Model ID. |
| state | One of final, error, streaming, queued, absent. Same state machine as in_progress_models[].state plus terminal values. |
| deadline_at | ISO 8601 timestamp at which the row will be swept to a terminal error if it hasn't transitioned. null on legacy rows. |
| error_code | STREAM_ERROR_CODES value when state === "error". null otherwise. |
| expired_at_read | true when state is queued or streaming AND deadline_at is already past at read time. The sweep cron's ~60s cadence means a row can be expired up to that long before its terminal write lands; this field exposes the derived state immediately so callers can render "this model is past its deadline; result expected within a minute." Always false on terminal states. |
| provider | Model family (anthropic, openai, google, xai, moonshot, zai, alibaba). null when the registry lookup fails. |
| inference_provider | Inference endpoint family; distinct from provider when the model routes through a third party (Kimi via Fireworks, Qwen via Together AI, etc.). Matches provider when no cross-provider route is active. |
| partial_text_length | Character count of the response's accumulated text. null for queued / absent rows; 0 for streaming rows that haven't flushed yet. Use null-vs-0 to distinguish "not producing yet" from "observed zero-length partial." For terminal final / error rows, this is the final character count. |
| last_chunk_at | ISO 8601 timestamp of the most recent SDK delta. Updated on the streaming producer's ~2s text flush. null for queued, absent, or non-streaming code paths. |
| since_last_chunk_ms | Read-time computed: now - last_chunk_at. Only present when state === "streaming"; null otherwise. Combined with partial_text_length this renders as "claude: streaming, 1840 chars, last chunk 3s ago." |
ETag / If-None-Match support. Every /progress response carries a weak ETag. Send it back on the next poll via If-None-Match to short-circuit no-change polls with a 304 Not Modified. The validator covers every body-derived signal — round-level state, per-model state, registry attribution, and (when any model is non-terminal) a 5-second wall-clock bucket so since_last_chunk_ms and expired_at_read can't go stale past one bucket boundary. Terminal-only payloads keep their state-based ETag stable across reads, so cache hits on completed rounds are long-lived.
cache-control: no-store on every response — clients shouldn't share-cache, but their own If-None-Match re-poll still works.
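A conditional-polling client might look like the sketch below. It is transport-agnostic on purpose: http_get is a hypothetical callable you provide (wrapping whatever HTTP library you use) that takes a URL and headers and returns the status code, the ETag response header, and the parsed body. Only the If-None-Match / 304 behavior comes from the description above; the class and its names are assumptions.

```python
from typing import Any, Callable, Optional

# (status_code, etag_header_or_None, parsed_json_body_or_None)
HttpResult = tuple[int, Optional[str], Optional[dict[str, Any]]]
HttpGet = Callable[[str, dict[str, str]], HttpResult]


class ProgressPoller:
    """Remembers the last ETag and body so unchanged polls cost a 304."""

    def __init__(self, http_get: HttpGet, url: str, api_key: str) -> None:
        self._http_get = http_get
        self._url = url
        self._base_headers = {"Authorization": f"Bearer {api_key}"}
        self._etag: Optional[str] = None
        self._body: Optional[dict[str, Any]] = None

    def poll(self) -> tuple[dict[str, Any], bool]:
        """Return (body, changed); changed is False when the server answered 304."""
        headers = dict(self._base_headers)
        if self._etag is not None:
            headers["If-None-Match"] = self._etag
        status, etag, body = self._http_get(self._url, headers)
        if status == 304 and self._body is not None:
            return self._body, False
        if status != 200 or body is None:
            raise RuntimeError(f"unexpected progress response: HTTP {status}")
        self._etag, self._body = etag, body
        return body, True
```

A UI can use the changed flag to skip re-rendering when nothing moved since the previous poll.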
GET /api/sessions#
List your sessions.
| Query | Values |
|---|---|
| mode | remote (historical sessions may also report autonomous) |
| status | ready or streaming |
| limit | 1–200 (default 7) |
| offset | Pagination offset |
Returns a lightweight list (no response bodies). Use GET /api/sessions/:id for full content.
GET /api/models#
List the models available to your account tier. Each entry:
{
"id": "claude-opus-4-6",
"provider": "anthropic",
"display_name": "Claude Opus 4.6",
"available": true,
"unavailable_reason": null,
"min_user_tier": 1,
"context_window": 200000,
"max_output_tokens": 16384,
"pricing": {
"input_per_million": 15,
"output_per_million": 75,
"cached_input_per_million": 1.5,
"minimum_usd": 0.05,
"cache_write_per_million": 30
},
"sort_order": 10
}
- available: true if the caller can actually use this model right now. For registered callers this reduces to a credit-balance check (effective_balance_usd > pricing.minimum_usd); for anonymous callers, tier-0 models filter into the response and tier-1+ models are omitted. false means a call that uses this model will fail.
- unavailable_reason: "credit_exhausted" when wallet balance is below the model's minimum. null otherwise. Models outside the caller's tier are filtered from the response entirely, not returned with an unavailable_reason.
- min_user_tier: visibility tier. 0 = anonymous-visible, 1/2 = registered-visible. Registered users see every tier; anonymous users see only min_user_tier: 0. Informational only, not a runtime gate. Credit balance is the sole runtime constraint for registered users.
- pricing.minimum_usd: the credit-gate threshold for this model. Your effective balance must be strictly greater than this value (after FIFO debit of any prior in-flight round) for the request to pass preflight.
- Per-million pricing fields reflect raw provider cost (platform COGS), not markup-inclusive user-paid amounts. They're informational: for "will this work?" preflight, use available + pricing.minimum_usd. For "what will it cost me?" read the actual debit on the round's debits[].amount_usd after completion.
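A client-side preflight over the model list could be sketched as below. It combines the server-reported available flag with the strictly-greater-than comparison against pricing.minimum_usd, which is useful when checking against a balance other than the current one (for example, a projected balance after a planned round). The function name and the projected-balance use case are assumptions.

```python
from decimal import Decimal
from typing import Any


def affordable_models(models: list[dict[str, Any]], balance_usd: float) -> list[str]:
    """IDs of models that are available and whose minimum is strictly below the balance."""
    balance = Decimal(str(balance_usd))
    usable = []
    for model in models:
        minimum = Decimal(str(model["pricing"]["minimum_usd"]))
        # Strictly greater: a balance equal to the minimum does not pass the gate.
        if model.get("available") and balance > minimum:
            usable.append(model["id"])
    return usable
```

This is only a local estimate; the server's own preflight on the write call remains the authority.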
GET /api/credit#
Canonical wallet resource. Returns the caller's full credit state.
{
"effective_balance_usd": 1.42,
"buckets": {
"free": {
"balance_usd": 1.42,
"monthly_grant_usd": 1.50,
"resets_at": "2026-05-01T00:00:00Z"
},
"subscription": {
"balance_usd": 0,
"rollover_cap_usd": 30.00,
"subscription_status": null
},
"refill": {
"balance_usd": 0,
"auto_refill_enabled": false
}
},
"per_model_minimum_usd_default": 0.05,
"debit_order": ["free", "subscription", "refill"]
}
- effective_balance_usd: sum across all three buckets. The number to compare against per-model minimums for affordability preflight. Markup-included: reflects what you have left to spend, not raw LLM-cost headroom.
- buckets.free: monthly free-tier credit. monthly_grant_usd is the platform grant each cycle; resets_at is the next UTC 1st-of-month boundary when the bucket refills.
- buckets.subscription: Stripe-granted credit (paid tier). rollover_cap_usd is the max unused balance that carries forward into a new cycle. subscription_status is one of "active" | "past_due" | "cancelled" | "expired" | null (null = no subscription).
- buckets.refill: auto-refill top-ups. auto_refill_enabled is the user's current setting. When enabled, two additional fields appear: auto_refill_threshold_usd (trigger level) and auto_refill_amount_usd (top-up size). The fields are absent when auto-refill is off.
- per_model_minimum_usd_default: platform fallback minimum used when a model's registry row has no explicit pricing_minimum_usd. For per-model values, read pricing.minimum_usd from GET /api/models.
- debit_order: FIFO debit sequence. Settlement drains each bucket in this order; surfaced so dashboards and reconciliation tooling don't need to read server code.
Anonymous callers receive the same shape with all balances at 0, subscription_status: null, and auto_refill_enabled: false. Anonymous usage is gated by guest-round budget elsewhere, not the credit wallet.
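To see how debit_order plays out, the sketch below drains a wallet bucket by bucket in the listed order. It is an illustration for dashboards or reconciliation scripts, not the server's settlement code. The function name simulate_debit and the example balances are hypothetical.

```python
from decimal import Decimal


def simulate_debit(credit: dict, amount_usd: str) -> dict:
    """Return per-bucket balances after draining `amount_usd` in debit_order.

    Illustrative only: real settlement happens server-side.
    """
    remaining = Decimal(amount_usd)
    balances = {}
    for name in credit["debit_order"]:
        balance = Decimal(str(credit["buckets"][name]["balance_usd"]))
        taken = min(balance, remaining)  # take what this bucket can cover
        balances[name] = balance - taken
        remaining -= taken
    if remaining > 0:
        raise ValueError(f"wallet is short by {remaining} USD")
    return balances


# Hypothetical wallet, shaped like a GET /api/credit response.
credit = {
    "buckets": {
        "free": {"balance_usd": 0.30},
        "subscription": {"balance_usd": 2.00},
        "refill": {"balance_usd": 5.00},
    },
    "debit_order": ["free", "subscription", "refill"],
}

after = simulate_debit(credit, "1.00")
print(after["free"], after["subscription"], after["refill"])  # 0.0 1.30 5.0
```

The free bucket empties first, the subscription bucket covers the remaining 0.70, and the refill bucket is untouched.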
GET /api/defaults#
Discover platform defaults. Wallet state is not here — use GET /api/credit.
{
"models": ["…", "…", "…"],
"daily_budget": { "limit": 200, "used": 4, "resets_at": "…" }
}
GET /api/health#
Returns { "status": "ok", "version": "v1" }. No auth required.
Status flow#
Two distinct state machines surface in the response.
Session-level status — high-level rollup; what to switch on in REST polling loops:
streaming → processing → ready
↘
failed
- streaming: at least one model in the most recent round is actively responding.
- processing: all model responses landed; post-processing (snippet extraction, claim map, optionally recap) is in flight.
- ready: fully complete; safe to call append_round or read final artifacts.
- failed: terminal error.
Per-round completion_state — finer-grained signal that lives on each rounds[] entry; the canonical "is this round usable yet?" check:
in_progress → complete | partial_failure | failed
- in_progress — ≥1 target model still queued / streaming / absent.
- complete — every target model produced a final response.
- partial_failure — every target model is terminal; ≥1 succeeded, ≥1 errored. Round is usable.
- failed — every target model is terminal; zero succeeded. No usable output.
Note: a session can be status: "processing" (post-execution finalizer running) while its latest round is already completion_state: "complete". The session moves to ready once the post-processing pipeline finishes. For "is the round content available?" — check completion_state. For "is the session settled (including editorial/summary)?" — check status.
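The two checks can be kept separate in a polling loop, as in the sketch below. It assumes a caller-supplied fetch_session callable (hypothetical) that returns the current session object, and it assumes the last entry in rounds[] is the most recent round. The interval and poll budget are arbitrary choices.

```python
import time
from typing import Callable

USABLE_ROUND_STATES = {"complete", "partial_failure"}
TERMINAL_SESSION_STATES = {"ready", "failed"}


def latest_round_usable(session: dict) -> bool:
    """True once the latest round's content can be read."""
    rounds = session.get("rounds") or []
    return bool(rounds) and rounds[-1].get("completion_state") in USABLE_ROUND_STATES


def session_settled(session: dict) -> bool:
    """True once the session-level status is terminal (ready or failed)."""
    return session.get("status") in TERMINAL_SESSION_STATES


def poll_until_settled(fetch_session: Callable[[], dict],
                       interval_s: float = 2.0,
                       max_polls: int = 150) -> dict:
    """Poll until the session is ready or failed, or the poll budget runs out."""
    for _ in range(max_polls):
        session = fetch_session()
        if session_settled(session):
            return session
        time.sleep(interval_s)
    raise TimeoutError("session did not settle within the polling budget")


# Canned snapshots stand in for real HTTP responses.
snapshots = iter([
    {"status": "streaming", "rounds": [{"completion_state": "in_progress"}]},
    {"status": "processing", "rounds": [{"completion_state": "complete"}]},
    {"status": "ready", "rounds": [{"completion_state": "complete"}]},
])
final = poll_until_settled(lambda: next(snapshots), interval_s=0)
print(final["status"])  # ready
```

A caller that only needs the model responses could stop at the second snapshot, where latest_round_usable is already true while the session is still processing.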
Errors#
All non-2xx responses return JSON:
{
"error": "session_busy",
"message": "Session has a round in progress — poll and retry with same Idempotency-Key",
"retryable": true
}
| Code | HTTP | Retryable | What to do |
|---|---|---|---|
| unauthorized | 401 | no | Bad or missing bearer token. |
| unknown_models | 400 | no | One or more model IDs in the request don't exist in the registry. Body includes unknown_models: string[] and models_requested: string[]. Call GET /api/models to enumerate valid IDs. |
| ineligible_models | 400 | no | One or more model IDs were disabled by the calling account in /settings/models. Body includes ineligible_models: string[]. Either re-enable them in the dashboard or omit them from your request. |
| insufficient_active_models | 400 | no | The caller omitted models and the curated default panel couldn't produce ≥2 picks against the account's enabled set. Body includes collapsed_buckets: number[] (zero-indexed). Enable more models at /settings/models. |
| credit_exhausted | 403 | no | Wallet balance below the request's per-model minimum. Body includes bucket breakdown + reset timing (see below). Top up (paid) or wait for the 1st-of-month free-tier reset. |
| forbidden | 403 | no | Feature not available on your account. |
| not_found | 404 | no | Session ID doesn't exist or isn't yours. |
| idempotency_conflict | 409 | no | Same key reused with a different body. Use a new key. |
| session_busy | 409 | yes | Another round is in flight. Retry with the same Idempotency-Key. |
| internal_error | 500 | yes | Transient. Retry with the same Idempotency-Key. |
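A client can drive retries off the retryable flag rather than hard-coding error names. The sketch below assumes a caller-supplied send callable (hypothetical) that performs the HTTP request with the given Idempotency-Key and returns the status code and parsed JSON body. The exponential backoff schedule is an assumption, not a documented requirement.

```python
import time
from typing import Callable, Tuple


class ApiError(Exception):
    """Raised for a non-2xx response that should not be retried further."""

    def __init__(self, http_status: int, body: dict):
        super().__init__(f"{http_status} {body.get('error')}: {body.get('message')}")
        self.http_status = http_status
        self.body = body


def call_with_retry(send: Callable[[str], Tuple[int, dict]],
                    idempotency_key: str,
                    max_attempts: int = 5,
                    base_delay_s: float = 1.0) -> dict:
    """Call `send`, retrying only retryable errors and reusing the same key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        http_status, body = send(idempotency_key)
        if 200 <= http_status < 300:
            return body
        last_attempt = attempt == max_attempts - 1
        if not body.get("retryable", False) or last_attempt:
            raise ApiError(http_status, body)
        time.sleep(base_delay_s * (2 ** attempt))  # assumed backoff schedule
    raise AssertionError("loop always returns or raises")


# Canned responses stand in for the real endpoint.
responses = iter([
    (409, {"error": "session_busy", "message": "...", "retryable": True}),
    (200, {"id": "example", "status": "ready"}),
])
keys_seen = []


def fake_send(key: str) -> Tuple[int, dict]:
    keys_seen.append(key)
    return next(responses)


result = call_with_retry(fake_send, "key-123", base_delay_s=0)
print(result["status"])  # ready
print(keys_seen)         # ['key-123', 'key-123']
```

Non-retryable errors such as idempotency_conflict surface immediately as ApiError, with the full body available for inspection.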
credit_exhausted body#
{
"error": "credit_exhausted",
"message": "Wallet balance below per-model minimum.",
"retryable": false,
"effective_balance_usd": 0.03,
"free_usd": 0.03,
"subscription_usd": 0,
"refill_usd": 0,
"per_model_minimum_usd": 0.05,
"next_reset_at": "2026-05-01T00:00:00Z"
}
Unlike session_busy / internal_error, this is not transient — retrying the same request won't resolve it. Either top up (when paid tier ships) or wait for next_reset_at. The bucket breakdown is included here because an exhausted caller needs to know which bucket is empty and when funds return.
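One way to turn this body into something a caller can act on is sketched below. The function name summarize_exhaustion and the output keys are hypothetical. The gap it reports is the distance to the minimum, so a balance slightly above that figure may still be needed given the strictly-greater rule.

```python
from datetime import datetime, timezone


def summarize_exhaustion(body: dict, now: datetime) -> dict:
    """Summarize which buckets are empty and how long until the next reset."""
    buckets = {
        "free": body["free_usd"],
        "subscription": body["subscription_usd"],
        "refill": body["refill_usd"],
    }
    # Normalize the trailing "Z" so fromisoformat accepts it on older Pythons.
    reset_at = datetime.fromisoformat(body["next_reset_at"].replace("Z", "+00:00"))
    gap = body["per_model_minimum_usd"] - body["effective_balance_usd"]
    return {
        "empty_buckets": [name for name, usd in buckets.items() if usd <= 0],
        "gap_to_minimum_usd": round(gap, 2),
        "seconds_until_reset": max(0.0, (reset_at - now).total_seconds()),
    }


body = {
    "error": "credit_exhausted",
    "retryable": False,
    "effective_balance_usd": 0.03,
    "free_usd": 0.03,
    "subscription_usd": 0,
    "refill_usd": 0,
    "per_model_minimum_usd": 0.05,
    "next_reset_at": "2026-05-01T00:00:00Z",
}

# A fixed "now" keeps the example deterministic.
summary = summarize_exhaustion(body, now=datetime(2026, 4, 30, tzinfo=timezone.utc))
print(summary["empty_buckets"])        # ['subscription', 'refill']
print(summary["gap_to_minimum_usd"])   # 0.02
print(summary["seconds_until_reset"])  # 86400.0
```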
unknown_models body#
{
"error": "unknown_models",
"message": "One or more model IDs are not in the registry.",
"retryable": false,
"unknown_models": ["gpt-typo"],
"models_requested": ["claude-opus-4-6", "gpt-typo"]
}
This is a preflight check: it fails before any provider call is made, so no credit is debited.
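Because the body echoes both lists, a caller can compute the surviving model IDs without another round trip. A minimal sketch follows, with the hypothetical helper name drop_unknown_models. Whether the reduced panel is still worth sending is left to the caller.

```python
def drop_unknown_models(error_body: dict) -> list[str]:
    """Return the requested model IDs minus the ones the registry rejected."""
    unknown = set(error_body["unknown_models"])
    return [m for m in error_body["models_requested"] if m not in unknown]


error_body = {
    "error": "unknown_models",
    "retryable": False,
    "unknown_models": ["gpt-typo"],
    "models_requested": ["claude-opus-4-6", "gpt-typo"],
}

print(drop_unknown_models(error_body))  # ['claude-opus-4-6']
```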
Confidence scores#
When models emit self-reported confidence (via {{C=0.8}}…{{/C}} tags in their prose, or on snippet commentary), those scores surface on responses:
- responses[].claim_confidence: [{ claim_text, confidence_score }]. Per-claim scores extracted from prose. Tags are stripped from text before return.
- responses[].snippets[].comment_confidence: number | null
- confidence_disclaimer: string. Verbatim advisory at the top level of every session response.
These are self-reported and only meaningful relative to the same model's other claims. They are not calibrated across models. If you display them, surface the disclaimer too.
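Since the scores are only comparable within one model, a display layer might rank each model's claims separately instead of merging them into one list. The sketch below assumes responses shaped as described above. The function name and the sample data are hypothetical.

```python
def rank_claims_per_model(responses: list[dict]) -> dict[str, list[tuple[str, float]]]:
    """Order each model's claims by its own confidence, highest first.

    Scores are never compared across models, only within one response.
    """
    ranked = {}
    for response in responses:
        claims = response.get("claim_confidence") or []
        ordered = sorted(claims, key=lambda c: c["confidence_score"], reverse=True)
        ranked[response["model"]] = [
            (c["claim_text"], c["confidence_score"]) for c in ordered
        ]
    return ranked


# Hypothetical responses; model names and claims are placeholders.
sample = [
    {"model": "model-a", "claim_confidence": [
        {"claim_text": "A1", "confidence_score": 0.6},
        {"claim_text": "A2", "confidence_score": 0.9},
    ]},
    {"model": "model-b", "claim_confidence": []},
]

ranked = rank_claims_per_model(sample)
print(ranked["model-a"])  # [('A2', 0.9), ('A1', 0.6)]
print(ranked["model-b"])  # []
```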
Naming philosophy#
Field names you see in API requests and responses are the canonical contract — they're the names mumo guarantees to consumers. Internal type names and DB columns may differ; the serializer maps between them. We follow a contract-first principle and one-way mapping (internal → API), with the full mapping documented in docs/CONVENTIONS.md.
One convention worth knowing up front: snippet types (type field) are always UPPERCASE at the API boundary — KEEP, EXPLORE, CHALLENGE, CORE, SHIFT. They're lowercase only in internal storage.
We don't rename API fields casually. Any boundary rename comes with a deprecation period that accepts the old name as alias.