Multi-model deliberation for agents

Your agent's best answer
is just one perspective.

Q: Does my agent need to manage session IDs?

create_deliberation returns a session_id and round_id; pass both to wait_for_round. If another round is warranted, append using the same session ID, and participating models get full context plus cache utilization when possible on follow-up rounds.

Q: Can I pick specific models?

Yes. Pass a models array with 2-3 model IDs. Call list_models first to see what's available on your tier. If you don't specify models, mumo uses a platform-selected default roster tuned for panel dynamics.

Q: What happens if a model fails mid-session?

Participants fail independently. The session continues with whichever models succeed, and the failed model's slot is marked accordingly in the response with the rest of the panel's work intact.

Q: Are confidence scores reliable?

Confidence scores are self-reported and not calibrated across models. Treat them as rough signals within a model's own output rather than cross-model comparable; responses include a confidence_disclaimer string when scores are present.

mumo sends your prompt to multiple models and they stress-test each other. You and your agent see what holds up — and what doesn't.

Connect your agent→Try the web app

See the claim map

Claim Map

See the models react to each other.

Any model can give you an answer; mumo gives you cross-examination. Claims are championed or challenged before your agent sees them.

ClaudeGPTGeminiGrokKimiGLMQwen

PromptShould I use Prisma or raw SQL for the analytics queries in this service?

Claude

“Use an ORM here. The query complexity doesn’t justify raw SQL, and you’ll get type safety, migrations, and protection against SQL injection for free.”

GPT: This table will hit 10M rows within a year. ORMs generate inefficient joins at that scale — you’ll end up bypassing it anyway. Write the raw query now and save the rewrite later.

Gemini: GPT’s scale argument changed my mind. At this data volume, the ORM becomes a liability you’ll have to work around. Raw SQL with a query builder for type safety is the better middle ground.

PromptHow should I handle auth state — cache in Redis or read from DB on every request?

GPT

“Cache the user object in Redis after login and read from cache on every request. Database reads on every auth check are unnecessary latency.”

Claude: This creates a stale-permission window. If you revoke a user’s access, they keep acting on cached credentials until TTL expires. For auth data, the latency cost of a DB read is worth the consistency guarantee.

Grok: The staleness risk is real but manageable — invalidate on permission change, not on TTL. The 50ms you save per request compounds across thousands of concurrent users.

PromptWe're at 3 engineers. Should we split into microservices or stay monolithic?

Gemini

“Split this into microservices now. The auth module, the billing module, and the notification system have completely different scaling profiles and deployment cadences.”

Claude: You have three engineers and no traffic. Microservices at this stage means you’ll spend more time on infrastructure than product. A modular monolith gives you the same separation without the operational overhead.

GPT: Claude is right. The scaling argument assumes traffic you don’t have yet. Ship the monolith, draw clean module boundaries, and extract services when you actually hit the bottleneck.

PromptDo I need a migration framework or can I just version my SQL files manually?

Grok

“Skip the migration framework. Just write SQL files numbered sequentially and run them in order. Less tooling, fewer abstractions, and you always know exactly what’s running against your database.”

Gemini: This works until two developers write migration 047 on different branches. A proper migration tool handles conflicts, rollbacks, and environment tracking. The abstraction pays for itself the moment you have a team.

Claude: The concern about conflicts is valid for larger teams, but for a solo developer or a pair, numbered SQL files are simpler to reason about and debug than any migration framework’s DSL.

PromptSSR, SSG, or client-side rendering for the product changelog page?

Claude

“Use server-side rendering for this page. The data changes rarely, the page needs good SEO, and client-side hydration adds unnecessary bundle size for what’s essentially a read-only view.”

GPT: Agreed. SSR with a 60-second revalidation gives you fresh data without the client JS cost. This is the obvious call for a content-heavy page.

Grok: I was going to suggest static generation, but the ‘changes rarely’ part means SSR with revalidation is actually better — you get freshness without a full rebuild on every edit.

The Loop

Call. Read the room. Decide.

Self-contained, not self-limiting. When one round isn't enough, just append another. Context carries automatically.

01

Start a deliberation

Call create_deliberation with a prompt. Pick 2–3 models or let mumo choose. Optionally pass a reference doc.

02

Wait for the round

The write call returns an ack immediately. Call wait_for_round with the returned session_id and round_id; mumo polls progress cheaply and returns the full session once the round is done.

03

Read the room

rounds[0].responses[] has each model's full prose. rounds[0].claim_map.claims[]groups the quotes that drew cross-model reactions, with each model's commentary.

04

Steer with typed snippets — or stop

If you want more, call append_roundwith typed snippets quoting the claims you want kept, explored, challenged, or carried forward. Models see the type — a CHALLENGE makes them defend or revise; a neutral forward doesn't. Stop when you have what you need.

// Agent calls create_deliberation
{
  "prompt": "Postgres or MongoDB for the event store?",
  "application": "Claude Code"
}

// ↓ ack returns immediately
{
  "session_id": "abc-123",
  "round_id": "round-456",
  "round_index": 0,
  "status": "processing"
}

// Agent calls wait_for_round
{
  "session_id": "abc-123",
  "round_id": "round-456"
}

// ↓ completed session returns (abridged)
{
  "id": "abc-123",
  "status": "ready",
  "rounds": [{
    "responses": [
      { "model": "claude", "prose": "..." },
      { "model": "gpt", "prose": "..." }
    ],
    "claim_map": {
      "claims": [ /* quotes with typed reactions */ ]
    }
  }]
}

// Agent calls append_round with snippets
{
  "session_id": "abc-123",
  "prompt": "Resolve the storage choice.",
  "snippets": [
    { "type": "CHALLENGE", "quoted_model": "gpt",
      "quote": "Mongo handles the schema flux better." }
  ]
}

Things you should know

What agents ask about mumo

A few points worth calling out before you get started

What's the latency on a single round?⌄

Typical single-round deliberations take 30–60s depending on models chosen and prompt complexity. The write call returns an ack immediately; call wait_for_round to wait for the completed responses and claim map.

Does my agent need to manage session IDs?⌄

Manage is probably overstating it. create_deliberation returns a session_id and round_id; pass both to wait_for_round. If you think another round is warranted, append using the same session ID, and the participating models will get full context plus the benefit of cache utilization when possible on follow-up rounds.

Can I pick specific models?⌄

Yes — pass a models array with 2–3 model IDs. Call list_modelsfirst to see what's available on your tier. If you don't specify models, mumo uses a platform-selected default roster tuned for panel dynamics.

What happens if a model fails mid-session?⌄

Participants fail independently — the session continues with whichever models succeed. You'll see the failed model's slot marked accordingly in the response, with the rest of the panel's work intact. For critical deliberations, 3-model panels give you more resilience than 2.

Are confidence scores reliable?⌄

They're self-reported and not calibrated across models — treat them as rough signals within a model's own output rather than cross-model comparable. A confidence_disclaimer string is included in responses when scores are present; surface it if you display scores to users.

Install

Connect any MCP client.

Give your agent multi-model insights. Stop shipping your best guess.

Get Started→