
mumo REST API

Run structured multi-model deliberations on demand. Send a prompt; get back a session containing every model's response, the cross-model claim map, and the round-level distill.

This is the consumer reference. For agent-runtime use (Claude Code, Cursor, etc.), see the MCP server docs.


Async by default (Trello #93)

As of 2026-04, POST /api/deliberation and POST /api/sessions/:id/rounds return a 202 Accepted ack in <500ms after the round row commits and budget is debited — they do not wait for model execution. Model work runs in the background; callers poll progress_url for terminal state.

What the ack looks like:

{
  "session_id": "abc-123",
  "round_index": 0,
  "status": "processing",
  "idempotency_key": "auto-gen-uuid-if-omitted",
  "client_request_id": null,
  "poll_after_ms": 5000,
  "progress_url": "/api/sessions/abc-123/progress",
  "progress_version": 0
}

Why: a client timeout no longer produces server/client state divergence. If the ack reached you, the round committed; if it didn't, your idempotency key replay resolves the uncertainty.

?wait=true compatibility shim (REST only). Appending ?wait=true to the create/append URL makes the endpoint block until terminal state or 290s, whichever comes first, preserving the pre-2026-04 response shape. It is a migration aid and deprecated (see "Deprecation timeline" below). MCP ignores ?wait=true; autonomous mode (with moderator_model) rejects it with 400 wait_unsupported because orchestration loops span multiple rounds.
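The poll loop itself is small. A minimal Python sketch, with the transport injected as a callable so it works with any HTTP client (the helper name, the fetch_progress callable, and the timeout default are ours, not part of the API):

```python
import time

TERMINAL_STATUSES = {"ready", "failed"}

def poll_until_terminal(fetch_progress, max_wait_s=300.0):
    """Poll the ack's progress_url until the round reaches a terminal state.

    fetch_progress: injected callable returning the decoded progress JSON
    (in practice, an authenticated GET against progress_url).
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        body = fetch_progress()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("round did not reach a terminal state in time")
        # Honor the server's suggested pacing (poll_after_ms in the ack);
        # fall back to the 5s shown in the example ack above.
        time.sleep(body.get("poll_after_ms", 5000) / 1000.0)
```

Note the loop trusts poll_after_ms for pacing rather than hammering the progress endpoint.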

Idempotency

  • Scope: (account, endpoint, Idempotency-Key). Same key can be used on both create_deliberation and append_round without colliding.
  • Request fingerprint: the server hashes a canonical subset of your body (prompt + snippets[] order-sensitive + model set sorted + session_mode + moderator_name + reference + rounds + moderator_model), NFC-normalized. Same key + same semantic body = replay; same key + different semantic body = 409 idempotency_conflict with original_request_fingerprint in the response for caller-side diff.
  • Auto-generation: MCP always sends one (its adapter derives a stable key from tool+args). REST: if you omit the Idempotency-Key header, the server generates a UUID and echoes it in the ack's idempotency_key field. Persist the echoed key if you want retry safety across transport failures.
  • Retry during refund: if a round's refund_status='pending' at replay time, the response includes status: "processing_refund" with retry_after_ms instead of the original terminal response. Once the refund credits, replays reflect the terminal failure.
  • TTL: 24h rolling window.
  • Novel error patterns: unclassified failures (classifier fell through to internal_error) do NOT auto-credit refunds. They write the failure fact and sit refund_status='pending' for admin review. Only canonical codes (see "Failure codes" below) auto-credit.
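For intuition, here's a client-side sketch of the semantic fingerprint described above. The exact server canonicalization is internal; this only mirrors the documented ingredients, and the function name and JSON-based canonicalization are our assumptions:

```python
import hashlib
import json
import unicodedata

def request_fingerprint(body):
    """Hash the documented semantic subset of a create/append body.

    Order-sensitive fields (snippets) are kept as-is; the model set is
    sorted; everything is NFC-normalized before hashing. Illustrative
    only: the server's real canonicalization may differ in detail.
    """
    subset = {
        "prompt": body.get("prompt"),
        "snippets": body.get("snippets", []),      # order-sensitive
        "models": sorted(body.get("models", [])),  # set semantics
        "session_mode": body.get("session_mode"),
        "moderator_name": body.get("moderator_name"),
        "reference": body.get("reference"),
        "rounds": body.get("rounds"),
        "moderator_model": body.get("moderator_model"),
    }
    canonical = json.dumps(subset, sort_keys=True, ensure_ascii=False)
    canonical = unicodedata.normalize("NFC", canonical)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The practical takeaway: reordering models doesn't change the fingerprint, but reordering snippets or editing the prompt does, and doing the latter with a reused key yields 409 idempotency_conflict.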

Refund lifecycle

When a round fails catastrophically (all providers error), the ledger emits a failure fact + credits back your budget atomically. The rounds[].refund_status lifecycle you'll see on the progress endpoint:

  • none — happy path, no failure.
  • pending — worker emitted the failure fact (failure_event_at set); ledger is about to credit.
  • credited — refund is in; refund_credited_at stamped. Budget restored.
  • not_applicable — partial-success round (≥1 model returned output); round counted as "delivered."

SLO: p99 latency from failure_event_at to refund_credited_at is < 60s.

Failure codes

Canonical failure_code values surfaced on the progress endpoint and in refund-conflict responses:

  • model_provider_rate_limit — Provider returned 429 / rate-limit
  • model_provider_outage — Provider returned 5xx or was unreachable
  • model_provider_oom — Provider returned out-of-memory
  • model_output_malformed — Response couldn't be parsed
  • model_timeout — Provider call exceeded the deadline
  • all_providers_failed — Every participant failed (composite); fires refund
  • dependency_timeout — Upstream (e.g. Brave Search) timed out
  • dependency_outage — Upstream dependency 5xx
  • dependency_malformed_response — Upstream returned unparseable data
  • stuck_reconciled — Reconciliation cron detected a stuck round and emitted a synthetic failure
  • internal_error — Unclassified; held for admin review, does NOT auto-credit
  • test_forced_failure — Admin-only test hook

Deprecation timeline for ?wait=true

  • Day 0–90 from GA: full support, Deprecation: true + Sunset + Link response headers on every ?wait=true response.
  • Day 90–120: advisory window; headers remain.
  • Day 120+: returns 410 Gone with a pointer to the async-polling pattern. Enforced via WAIT_ENFORCE_410 env flag.

Quickstart

  1. Get a key at mumo.chat/settings/api-keys. Keys begin with mmo_live_.
  2. Send a prompt:
curl https://mumo.chat/api/deliberation \
  -H "Authorization: Bearer mmo_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Should we use Postgres or MongoDB for an event store?",
    "rounds": 1
  }'

You'll get back a session object. With rounds: 1 (single-shot) the status is ready immediately and the round's full artifact stack is in the response. Here's a redacted example:

{
  "id": "f5731fe5-27ce-4f82-b8ce-868e72ff8bb9",
  "status": "ready",
  "mode": "single_shot",
  "active_models": ["claude-opus-4-6", "gpt-5.4", "grok-4-20-reasoning"],
  "rounds": [
    {
      "index": 0,
      "completion_state": "complete",
      "responses": [
        { "model": "claude-opus-4-6", "text": "...", "snippets": [...] },
        { "model": "gpt-5.4",        "text": "...", "snippets": [...] },
        { "model": "grok-4-20-reasoning", "text": "...", "snippets": [...] }
      ],
      "claim_map": {
        "claims": [
          {
            "quote": "Postgres' append-only WAL plus JSONB columns gives you...",
            "originator": "claude-opus-4-6",
            "reaction_count": 2,
            "positions": [
              { "model": "gpt-5.4",        "type": "KEEP",  "comment": "Agreed — JSONB lets you..." },
              { "model": "grok-4-20-reasoning", "type": "CHALLENGE", "comment": "But what about field-level migrations 18 months in?" }
            ]
          }
        ]
      },
      "distill": {
        "key_finding": "All three converged on Postgres + JSONB for the event store, but split on schema rigidity.",
        "agreements": ["Append-only access pattern fits Postgres' WAL.", "JSONB lets you start schemaless and tighten later."],
        "disagreements": ["GPT prefers schema-on-write at the application layer; Grok argues schemaless is fine because the schema is the event type."],
        "impactful_quote": {
          "text": "Postgres' append-only WAL plus JSONB columns gives you...",
          "model": "claude-opus-4-6",
          "why": "Reframed the choice from 'doc store vs relational' to 'use the right Postgres feature for the access pattern.'"
        },
        "open_questions": ["Field-level migrations 18 months in if schema lives in JSONB."],
        "narrative": "All three models converged on Postgres for the event store..."
      }
    }
  ],
  "summary": null,
  "confidence_disclaimer": "Confidence scores (0–1) on claims and snippet comments are self-reported..."
}

Read the artifact stack section below for what each piece is for.


Authentication

Authorization: Bearer mmo_live_…

Keys are minted at /settings/api-keys. Each key is hashed at rest; copy the secret on creation — it's never shown again.

Calls without a valid key return 401 Unauthorized. Calls to a model your tier can't access return 403 with the available alternatives in the body.


Two modes

The same POST /api/deliberation endpoint serves both modes. Whether you supply a moderator_model decides which one runs.

Autonomous — fire and poll

"I want to fire-and-forget a complex deliberation and come back when it's done."

Provide a moderator_model and a rounds cap. mumo runs the full multi-round arc unattended. The endpoint returns immediately with status: "streaming". Poll GET /api/sessions/:id until status: "ready".

{
  "prompt": "...",
  "rounds": 4,
  "moderator_model": "claude-opus-4-6"
}

Remote — you drive the rounds

"I want to drive each round myself, steering with snippets between them."

Omit moderator_model. The endpoint runs round 1 synchronously and returns with status: "ready" and the round's full artifact stack. Read the claim_map, then call POST /api/sessions/:id/rounds with steering snippets to add the next round. Repeat as long as you want — there's no preset cap on remote sessions.

{ "prompt": "..." }

Single-call use case: if you only want one round of three opinions and no follow-up, just don't call append_round. The first response already contains the full artifact stack — there's no separate "single-shot" code path. The session's mode field will be "single_shot" (a label the server applies when the request has rounds: 1, signalling intent only), but the engine and the response shape are identical to remote.

The rounds field is only meaningful in autonomous mode, where it caps how many rounds the AI moderator runs (max 14). In remote mode, the field is recorded for diagnostic purposes but doesn't bound the session — append_round works as long as the session exists.
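A small builder makes the mode split concrete: supplying moderator_model is the only switch. This helper is illustrative, not an SDK function:

```python
def deliberation_body(prompt, *, moderator_model=None, rounds=3):
    """Build a POST /api/deliberation body for either mode.

    Passing moderator_model selects autonomous mode, where rounds caps
    the moderator's arc (1-14). Omitting it selects remote mode, where
    rounds is recorded but does not bound append_round.
    """
    body = {"prompt": prompt, "rounds": rounds}
    if moderator_model is not None:
        if not 1 <= rounds <= 14:
            raise ValueError("autonomous mode caps rounds at 1-14")
        body["moderator_model"] = moderator_model
    return body
```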


Distill artifacts (opt-in)

Every round always produces the raw per-model responses and a cross-model claim map. The finalizer can optionally also produce two distill artifacts:

  • brief — structured JSON: narrative, agreements, disagreements, and continuation.recommendation (stop / continue / explore). Written to rounds[].distill on the session response.
  • summary — a streaming narrative prose recap for the round. Rendered into the web-UI timeline between rounds; available on rounds[].summary.

Both default OFF on API and MCP. Programmatic consumers get the same value from responses + claim map without the extra LLM-call tax. Opt in via the distill param on POST /api/deliberation:

{ "prompt": "...", "distill": "both" }

Accepted values:

  • "off" — both disabled (default)
  • "brief" — only the structured JSON brief
  • "summary" — only the streaming narrative
  • "both" — both enabled
  • { "brief": boolean, "summary": boolean } — fine-grained control

If your agent is surfacing a narrative back to a human, pass "summary" or "both" on create. The defaults are tuned for programmatic consumption where responses + claim map are the highest-signal artifacts; the human-readable fields (summary prose, brief narrative) are generated lazily.

The continuation.recommendation field lives inside the brief. Request "brief" or "both" if you specifically want that stop/continue/explore signal.

The setting is session-scoped — pinned on session create and inherited by every append_round call. Admins can tune channel defaults via system_config (distill_default_api_brief, distill_default_api_summary).
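If you want to normalize the five accepted shapes into the fine-grained object form before sending (or when reading back stored config), the expansion is mechanical. A sketch; the helper name is ours:

```python
def normalize_distill(value):
    """Expand any accepted `distill` value to {"brief": bool, "summary": bool}."""
    presets = {
        "off":     {"brief": False, "summary": False},
        "brief":   {"brief": True,  "summary": False},
        "summary": {"brief": False, "summary": True},
        "both":    {"brief": True,  "summary": True},
    }
    if isinstance(value, str):
        return presets[value]  # raises KeyError on an unknown preset
    # Fine-grained object form; missing keys default to off.
    return {"brief": bool(value.get("brief", False)),
            "summary": bool(value.get("summary", False))}
```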


The session response

Every GET /api/sessions/:id response (and the response from POST /api/deliberation for single-shot mode) carries these top-level fields:

  • id — Session UUID.
  • status — streaming | processing | ready | failed. See "Status flow" below.
  • mode — autonomous | remote | single_shot. The first two reflect the engine; single_shot is a label the server applies when rounds: 1 to signal "no follow-up intended" — it shares the remote engine.
  • active_models — Model IDs participating in this session.
  • moderator_model — Set for autonomous sessions; null otherwise.
  • moderator_name, application — Optional identity metadata.
  • model_metadata — { [model_id]: { display_name, provider } }.
  • created_at, estimated_ready_at — Timestamps.
  • total_usage — Aggregated tokens_in / tokens_out across the session.
  • rounds — Array of round objects (see below).
  • summary — Session-level editorial. Null until generated for multi-round sessions.
  • confidence_disclaimer — Verbatim advisory string. Surface alongside any displayed confidence scores.

Each round in rounds[] carries: index, prompt, completion_state (complete | partial_failure), responses, failed_models, claim_map, distill.

The four per-round artifacts:

  • responses[].text — raw prose from each model.

  • responses[].snippets[] — model-emitted reactions (typed KEEP/CHALLENGE/etc, with verbatim quotes from peers and optional commentary).

  • claim_map.claims[] — verbatim claims that ≥2 models reacted to, with each reactor's position (type + commentary). The highest-signal artifact for understanding agreement and disagreement.

  • distill — round-level structured synthesis:

    • key_finding — one sentence on what shifted in this round. string | null.
    • agreements[] — short statements of where the panel converged. string[] | null.
    • disagreements[] — each entry names the tension and the sides. string[] | null.
    • impactful_quote{ text, model, why } | null. The single quote that mattered most.
    • open_questions[] — forward-looking threads the panel left unresolved. string[] | null.
    • narrative — magazine-style prose (2–3 paragraphs). Always present when distill is non-null.
    • continuation — distill's judgment of whether another round is worth running. { convergence, recommendation, reasoning } | null. Details below.

    Nullability rule: the structured fields are null when distill ran on the legacy/unstructured path or when an upstream parse failed. They are never empty strings or empty arrays — those values are reserved for "absent." When you see agreements: [], the model produced no agreements; when you see agreements: null, structured distill wasn't computed for this round.
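In code, the three-way distinction reads like this (the helper name is ours; the same pattern applies to disagreements and open_questions):

```python
def agreements_state(distill):
    """Interpret the null-vs-empty contract on distill.agreements."""
    agreements = distill.get("agreements")
    if agreements is None:
        return "not_computed"  # legacy/unstructured path or upstream parse failure
    if not agreements:
        return "none_found"    # structured distill ran; model emitted zero agreements
    return "present"
```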

Continuation: deciding whether to run another round

distill.continuation is the single most useful field for agents driving remote-mode deliberations. It collapses "should I call append_round or stop here?" into one signal you can read straight off the response.

"continuation": {
  "convergence": 0.78,                  // 0.0–1.0; how consolidated the panel is on a stable answer
  "recommendation": "stop",             // "stop" | "continue" | "explore"
  "reasoning": "Three-of-three agreement on the structural diagnosis; remaining disagreement is on implementation detail unlikely to shift with further rounds."
}
  • convergence is trajectory-aware — it factors in the rounds before this one. A round that opens new productive disagreement may legitimately drop convergence vs the prior round. That is not a regression — it's a signal that the deliberation has uncovered a new dimension worth examining.
  • recommendation values:
    • "stop" — panel has converged enough that another round would be churn.
    • "continue" — unresolved tensions worth resolving in another round.
    • "explore" — productive new territory has emerged that another round would profitably deepen, even if convergence is lower. Use this for opportunity, not uncertainty.
  • reasoning cites this round's specific evidence (which agreements held, which positions shifted) so the recommendation is auditable.

How to use it: switch on recommendation for the simple case, threshold on convergence for custom logic, surface reasoning to humans deciding whether to trust the signal. The recommendation is the distill model's opinion, not a guarantee — same epistemic status as key_finding.
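A sketch of the simple case, switching on recommendation with an explore budget so "explore" rounds are an explicit spend decision. The policy and names are ours, not mumo's:

```python
def should_append_round(continuation, *, explore_budget=0):
    """Decide whether to call append_round based on distill.continuation.

    continuation may be None when structured distill wasn't computed;
    this policy defaults to stopping in that case.
    """
    if continuation is None:
        return False
    rec = continuation["recommendation"]
    if rec == "stop":
        return False
    if rec == "explore":
        # "explore" signals opportunity, not uncertainty: only spend a
        # round on it if the caller budgeted for exploration.
        return explore_budget > 0
    return rec == "continue"
```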

Most agents only need claim_map + distill to decide whether to continue or stop. The claim_map.claims[].positions[] array is what tells you where the panel agrees, where they're stuck, and which model said what.

The session-level summary field carries the final editorial across the whole session (surface, agreed, split, open blocks plus anchor_quote and og_quote). It's only populated for completed multi-round sessions.


Endpoints

POST /api/deliberation

Create a session. Body:

  • prompt (string) — The question or topic. Required.
  • reference (string) — Optional spec, doc, or design injected as shared context.
  • models (string[]) — 2–3 model IDs. Defaults to platform selection. Call GET /api/models to enumerate.
  • rounds (int) — 1–14. Default 3. Only meaningful for autonomous mode (caps the AI moderator's arc). In remote mode, set it to whatever you like — append_round is unbounded by it. rounds: 1 causes the server to label the session single_shot (a hint that you don't intend to append).
  • moderator_model (string) — Model ID to moderate autonomously. Omit for remote mode.
  • moderator_name (string) — Display name for the steering identity (≤100 chars). Surfaces in the published transcript.
  • application (string) — Display name of your client (≤100 chars). Surfaces in the session info panel.

Returns a session object (see Quickstart for an example shape). For autonomous sessions, status starts as streaming and transitions through processing to ready.

Idempotency: pass Idempotency-Key: <stable-string> to make retries safe. Same key + same body returns the cached response; same key + different body returns 409 idempotency_conflict.


POST /api/sessions/:id/rounds

Append a round to a remote-mode session. Body:

{
  "prompt": "Focus on the pricing mechanism, not positioning.",
  "snippets": [
    {
      "type": "CHALLENGE",
      "quote": "Per-seat pricing assumes teams of >10.",
      "quoted_model": "gpt-5.4",
      "comment": "Most enterprise pilots start at 3–5."
    },
    {
      "type": "KEEP",
      "quote": "Usage-based pricing aligns incentives.",
      "quoted_model": "claude-opus-4-6"
    }
  ]
}

snippets is optional but high-signal — it's how you steer attention round-to-round.

Snippet types:

  • KEEP — this point is strong; preserve it
  • EXPLORE — dig deeper here
  • CHALLENGE — push back on this claim
  • CORE — load-bearing; build on it
  • SHIFT — this reframes the question

Quotes must be verbatim from a prior round's response. quoted_model is the model ID that originated the quote.
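A cheap client-side guard catches both failure modes (a non-verbatim quote, or wrong case on type) before the API does. The validator below is ours, not part of an SDK:

```python
SNIPPET_TYPES = {"KEEP", "EXPLORE", "CHALLENGE", "CORE", "SHIFT"}

def validate_snippet(snippet, prior_rounds):
    """Check a steering snippet against prior rounds before appending.

    The quote must appear verbatim in a response from quoted_model in
    some earlier round, and type must be one of the five UPPERCASE values.
    """
    if snippet["type"] not in SNIPPET_TYPES:
        raise ValueError("unknown snippet type: %r" % snippet["type"])
    for rnd in prior_rounds:
        for resp in rnd.get("responses", []):
            if (resp["model"] == snippet["quoted_model"]
                    and snippet["quote"] in resp["text"]):
                return
    raise ValueError("quote is not verbatim from quoted_model in any prior round")
```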

Idempotency-Key is strongly recommended on this endpoint. Round-append duplication corrupts deliberation history.

Errors:

  • 409 session_busy — a round is currently streaming or processing. Retry after a short delay with the same Idempotency-Key.
  • 403 credit_exhausted — your wallet balance can't cover the next round's per-model minimum. Body includes effective_balance_usd, free_usd, subscription_usd, refill_usd, per_model_minimum_usd, next_reset_at. Free-tier balance resets on the 1st of each month (UTC).
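Putting the two together: a session_busy retry loop that replays with the same key. The transport is injected as a callable so the sketch stays client-library-agnostic; the function and parameter names are ours:

```python
import time

def append_round_with_retry(post, body, idempotency_key,
                            max_attempts=5, delay_s=2.0):
    """Append a round, replaying on 409 session_busy with the SAME key.

    post: injected callable (headers, body) -> (status_code, response_json),
    e.g. a thin wrapper over an HTTP POST to /api/sessions/:id/rounds.
    """
    headers = {"Idempotency-Key": idempotency_key}
    for _ in range(max_attempts):
        status, resp = post(headers, body)
        if status == 409 and resp.get("error") == "session_busy":
            time.sleep(delay_s)  # a round is in flight; wait and replay
            continue
        return status, resp
    raise RuntimeError("session still busy after %d attempts" % max_attempts)
```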

GET /api/sessions/:id

Fetch the full state of a session — all rounds, responses, snippets, claim maps, distills, and the editorial summary if present.

For autonomous sessions, poll this until status === "ready". For single-shot or post-append flows, the response is fresh on every call.


GET /api/sessions

List your sessions.

  • mode — autonomous | remote | single_shot (label-only variant of remote)
  • status — ready | streaming
  • limit — 1–200 (default 7)
  • offset — pagination offset

Returns a lightweight list (no response bodies). Use GET /api/sessions/:id for full content.
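Paging through the list is the usual limit/offset walk. A generator sketch, with the transport injected (names are ours):

```python
def iter_sessions(fetch_page, limit=200):
    """Yield every session from GET /api/sessions, page by page.

    fetch_page: injected callable (limit, offset) -> list of lightweight
    session dicts. Stops on the first short page.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```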


GET /api/models

List the models available to your account tier. Returns id, provider, display name, context window, max output tokens, and pricing (per-million tokens, separated into input / cached input / output / cache write where applicable).


GET /api/defaults

Discover platform defaults and your current budget.

{
  "default_models": ["…", "…", "…"],
  "default_rounds": 3,
  "default_moderator_model": "…",
  "remaining_rounds_today": 5,
  "round_limit": 10,
  "resets_at": "2026-04-15T04:00:00Z"
}

POST /api/sessions/:id/summary

Generate (or return cached) editorial summary for a session. Idempotent — returns the cached summary if one already exists.

GET /api/health

Returns { "status": "ok", "version": "v1" }. No auth required.


Status flow

streaming  →  processing  →  ready
                            ↘
                             failed
  • streaming — at least one model in the most recent round is actively responding.
  • processing — all model responses landed; post-processing (snippet extraction, claim map, distill) is in flight.
  • ready — fully complete; safe to call append_round or read final artifacts.
  • failed — terminal error.

For autonomous mode: poll GET /api/sessions/:id until status === "ready".


Errors

All non-2xx responses return JSON:

{
  "error": "session_busy",
  "message": "Session has a round in progress — poll and retry with same Idempotency-Key",
  "retryable": true
}

  • unauthorized — 401, not retryable. Bad or missing bearer token.
  • credit_exhausted — 403, not retryable. Wallet balance below the request's per-model minimum. Body includes bucket breakdown + next_reset_at. Top up (paid) or wait for the 1st-of-month free-tier reset.
  • forbidden — 403, not retryable. Feature not available on your tier (e.g., moderator_model / autonomous mode requires paid).
  • not_found — 404, not retryable. Session ID doesn't exist or isn't yours.
  • idempotency_conflict — 409, not retryable. Same key reused with a different body. Use a new key.
  • session_busy — 409, retryable. Another round is in flight. Retry with the same Idempotency-Key.
  • internal_error — 500, retryable. Transient. Retry with the same Idempotency-Key.

Confidence scores

When models emit self-reported confidence (via {{C=0.8}}…{{/C}} tags in their prose, or on snippet commentary), those scores surface on responses:

  • responses[].claim_confidence: [{ claim_text, confidence_score }] — per-claim scores extracted from prose. Tags are stripped from text before return.
  • responses[].snippets[].comment_confidence: number | null
  • confidence_disclaimer: string — verbatim advisory at the top of every session response.

These are self-reported and only meaningful relative to the same model's other claims. They are not calibrated across models. If you display them, surface the disclaimer too.
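For example, ranking one model's claims by its own scores is fine; merging rankings across models is not. A sketch (the helper name is ours):

```python
def rank_claims(response):
    """Order a single model's claims by self-reported confidence.

    Only rank within one responses[] entry: scores are not calibrated
    across models, so cross-model comparison is meaningless.
    """
    return sorted(response.get("claim_confidence", []),
                  key=lambda c: c["confidence_score"],
                  reverse=True)
```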


Naming philosophy

Field names you see in API requests and responses are the canonical contract — they're the names mumo guarantees to consumers. Internal type names and DB columns may differ; the serializer maps between them. We follow a contract-first principle and one-way mapping (internal → API), with the full mapping documented in docs/CONVENTIONS.md.

One convention worth knowing up front: snippet types (type field) are always UPPERCASE at the API boundary — KEEP, EXPLORE, CHALLENGE, CORE, SHIFT. They're lowercase only in internal storage.

We don't rename API fields casually. Any boundary rename comes with a deprecation period during which the old name is accepted as an alias.