Pipecat is an open-source Python toolkit for stitching together streaming speech, language models, and playback behind a single real-time loop—so product teams can focus on dialog design instead of rewiring transports for every vendor swap.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
A["Caller audio"] --> B["Transport WebRTC or WebSocket"]
B --> C["Optional VAD"]
C --> D["Speech to text"]
D --> E["LLM or speech to speech"]
E --> F["Text to speech or native audio"]
F --> G["Return stream"]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
classDef decision fill:#444,color:#fff
class A,G agent
class B,C,D,E,F hook
### Why teams reach for Pipecat

| Property | What it buys you |
| --- | --- |
| Voice-first layout | Processors for streaming STT, LLM calls, and TTS sit on one timeline with explicit frame flow control. |
| Pluggable services | Swap Deepgram for AssemblyAI, Cartesia for ElevenLabs, or route LLM traffic across OpenAI, Anthropic, Gemini, Groq, Ollama, and others without rewriting orchestration glue. |
| Composable pipelines | Small units snap together for branching dialogs, tool calls, and guardrails—a mental model similar to media pipelines. |
| Real-time transports | First-class adapters for WebSockets, WebRTC providers such as Daily or LiveKit, telephony serializers, and local debug paths. |
### Coverage snapshot

| Plane | Examples supported upstream |
| --- | --- |
| Speech-to-text | Deepgram, AssemblyAI, Whisper-class routes via OpenAI or Groq, cloud STT from the major hyperscalers, plus specialist vendors in the docs matrix. |
| LLMs | OpenAI, Anthropic Claude, Gemini, Grok, Mistral, Cerebras, SambaNova, Together, OpenRouter, Ollama for local inference, and more. |
| Text-to-speech | OpenAI, ElevenLabs, Cartesia, Deepgram, Azure, AWS, Google, Fish, Rime, Piper, and additional engines listed in the service catalogue. |
| Speech-to-speech | OpenAI Realtime, Gemini Multimodal Live, AWS Nova Sonic, Ultravox, Grok Voice Agent—useful when you want audio-native reasoning without discrete TTS. |
| Observability | OpenTelemetry hooks and Sentry integration ship for production tracing. |
### Quick start (from upstream docs)

Python 3.11 is the minimum supported release; maintainers recommend 3.12+. The core install stays lean—pull optional extras only for the providers you wire in.

```bash
# Scaffold a project (requires uv per the README)
pipecat init quickstart

# Or add to an existing uv project
uv add pipecat-ai

# Provider-specific wheels (example pattern)
uv add "pipecat-ai[deepgram,cartesia,openai]"
```
### Adjacent ecosystem

- Pipecat Flows for structured state machines when a single linear pipeline is not enough.
- Subagents repository for multi-agent buses when conversations need specialist hand-offs.
- Voice UI Kit for client-side React and native SDKs that pair with the same transports.
- CLI and Pipecat Cloud helpers for packaging agents once they leave a laptop.
### At a glance

| Signal | Takeaway |
| --- | --- |
| Community traction | The core repository is past 12k GitHub stars with active CI badges and Discord support—expect rapid surface-area growth. |
| Operational stance | Treat Pipecat as orchestration: you still own latency budgets, key management, and content moderation for customer-facing voice. |
| Learning path | Start with the quickstart, then clone the examples tree for incremental complexity before jumping to full demo apps. |
When two different coding agents share the same goal primitive, a messaging-first orchestrator can route long-running implementation to one stack and structured critique to another—without you babysitting shells from a desk.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart TD
U["User message from chat"] --> G["Gateway such as Telegram"]
G --> H["Hermes agent session"]
H --> P["Pick next worker for goal primitive"]
P --> B["Codex build pass"]
B --> S["Work completes"]
S --> R["Claude Code review pass"]
R --> K["Kanban or board state refresh"]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
classDef decision fill:#444,color:#fff
class U,G,K agent
class H,P,B,S,R hook
### What each layer contributes

| Layer | Role in the loop | Where to read more |
| --- | --- | --- |
| Messaging gateway | Accepts short natural-language goals while you are away from the IDE; pairs with Hermes security guidance on DM allowlists. | Hermes messaging docs cover Telegram, Discord, Slack, and related controls. |
| Hermes Agent | MIT-licensed gateway and tool loop from Nous Research: cron, skills, MCP, and multi-terminal backends so work can live on a small VPS instead of a laptop. | Upstream README and docs site. |
| Codex /goal | OpenAI Codex can hold a durable objective across many tool turns—useful for autonomous implementation when checkpoints are clear. | OpenAI “Follow a goal” use case guide. |
| Claude Code /goal | Anthropic Claude Code runs successive turns until a stated completion test passes, with a fast gate between turns. | Claude Code goal documentation. |
### Why split builder and reviewer stacks

| Idea | Engineering angle |
| --- | --- |
| Decorrelated blind spots | One model family may repeat the same mistaken assumption in both generation and critique; handing review to another toolchain pressures the plan with different priors. |
| Operational clarity | Kanban-style cards map cleanly to “building” versus “awaiting review”, which keeps async agents from colliding silently. |
| Closer analogue inside Claude Code | OpenAI’s codex-plugin-cc already wraps local Codex reviews inside Claude Code for users who want that separation without leaving the editor. |
### Controls operators still need

- Budget caps: background goals on two providers can burn tokens quickly—set spend alerts and pause hooks before enabling always-on queues.
- Deterministic gates: community feedback on the original thread highlights lint, type-check, and CI scripts as non-negotiable backstops once models disagree (see the sketch after this list).
- Secrets hygiene: remote chat triggers should stay behind paired accounts, command allowlists, and isolated working directories, as described in Hermes security guidance.
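One concrete shape for a deterministic gate: a small script that runs a fixed battery of checks and refuses to advance a card until every one passes. The commands below are placeholders for whatever linter, type checker, and test runner your repository actually uses.

```python
# deterministic_gate.py -- illustrative only; swap in your project's
# real lint, type-check, and test commands.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # placeholder linter
    ["mypy", "."],           # placeholder type checker
    ["pytest", "-q"],        # placeholder test suite
]


def main() -> int:
    for cmd in CHECKS:
        print(f"gate: running {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"gate: FAILED at {cmd[0]} -- card stays in 'building'")
            return 1
    print("gate: all checks passed -- safe to move card to 'awaiting review'")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```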
### At a glance

| Signal | Takeaway |
| --- | --- |
| Primitive parity | Matching /goal-style autonomy on Codex and Claude Code lets an orchestrator swap workers without rewriting the playbook. |
| Human value | You still define stopping conditions, tests, and review criteria—agents accelerate execution, they do not replace product judgement. |
| Deployment shape | Keep the gateway on infrastructure you control (for example, a small VPS) so mobile triggers never expose a laptop sleep/wake race. |
Floci is an MIT-licensed, AWS-shaped local emulator you run beside your laptop stack: point the normal SDK or CLI at localhost:4566 and exercise a broad slice of control-plane behaviour without touching a live cloud account.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
A["App or test suite"] --> B["AWS SDK or CLI"]
B --> C["Endpoint override localhost:4566"]
C --> D["Floci router"]
D --> E["In-process services"]
D --> F["Docker-backed runtimes"]
F --> G["Docker Engine on host"]
E --> H["In-memory or disk-backed state"]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
classDef decision fill:#444,color:#fff
class A,B agent
class C,D,E,F,G hook
class H decision
### What changed in the landscape

| Signal | Practical meaning |
| --- | --- |
| LocalStack Community policy | The upstream blog announces auth tokens required from March 2026 and frozen security updates on the community path—teams are re-evaluating local emulators. |
| Floci positioning | Marketed as a no-token drop-in on port 4566, with environment-variable translation from common LocalStack settings. |
| Social summary vs README | Viral posts emphasise a tiny footprint; the project README quotes about 13 MiB idle RAM and a ~90 MB image—still far lighter than pulling a multi-GB stack onto every developer machine. |
### Architecture at a glance

| Layer | Role | Notes from upstream |
| --- | --- | --- |
| HTTP front door | JAX-RS on Vert.x | Single port 4566 mirrors the LocalStack-style workflow. |
| In-process plane | S3, DynamoDB, IAM, STS, KMS, Cognito, Step Functions, EventBridge, API Gateway, and dozens more | Stateful modes include memory, hybrid, WAL, and persistent storage profiles. |

- Treat Floci as a fast feedback loop for integration tests and demos—not a contractual guarantee of full AWS parity.
- Community threads already flag API conformance gaps on complex services such as DynamoDB; pair emulator runs with selective tests against real sandboxes when behaviour must be exact.
- Anything that shells out to Docker inherits host resource limits, socket permissions, and image pull policies—budget CI time accordingly.
### Takeaways

| Topic | Decision hint |
| --- | --- |
| Developer experience | Reuse existing boto3, AWS CLI v2, or SDK v3 clients with `endpoint_url` or `AWS_ENDPOINT_URL`—no rewrites (see the sketch after this table). |
| Migration | Swap the container image from `localstack/localstack` to `floci/floci:latest`; upstream documents the env var translation. |
| Risk | Validate critical paths against production-shaped data, because emulators always truncate edge cases. |
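As a concrete example of the endpoint override, the sketch below points a stock boto3 client at the emulator. The credentials and bucket name are placeholders; local emulators generally accept any key material.

```python
import boto3

# Point the standard AWS SDK at the local emulator instead of AWS.
# Credentials are placeholders; local emulators typically accept any value.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

s3.create_bucket(Bucket="demo-bucket")  # hypothetical bucket name
print(s3.list_buckets()["Buckets"])
```

Exporting `AWS_ENDPOINT_URL=http://localhost:4566` achieves the same redirect for the AWS CLI v2 without touching code.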
A practical split for interactive science demos—especially 3D biology explorers—is to let a GPT Image-class model shape the interface while a strong coding model such as Gemini 3.1 Pro owns orchestration, scene logic, and tool calls.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
B[Science brief] --> U["GPT Image style pass"]
U --> M[UI frames and assets]
B --> C["Gemini 3.1 Pro coding"]
C --> A[App logic and 3D wiring]
M --> P[Interactive web app]
A --> P
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
class B,P agent
class U,M,C,A hook
Remotion ships an official Agent Skills bundle so coding agents such as Claude Code learn Remotion timing, media, and composition rules—installable with one CLI line alongside new `bun create video` scaffolding.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
P[Natural language brief] --> I["npx skills add remotion-dev/skills"]
I --> A[Agent with Remotion skills]
A --> R[React Remotion project]
R --> V[Programmatic video output]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
class P,V agent
class I,A,R hook
### Install path

| Command | Purpose |
| --- | --- |
| `npx skills add remotion-dev/skills` | Pull the maintained Remotion Agent Skills package into your workspace |
| `bun create video` | New Remotion project flow that can add the same skills during bootstrap |
### What the skills are for
They encode best practices for Remotion’s React-based, programmatic video model so agents avoid common timing, asset, and composition mistakes. Remotion links the catalogue to the open Agent Skills standard and hosts the source beside the main repository.
### At a glance

| Topic | Takeaway |
| --- | --- |
| Goal | Let Claude Code, Codex, or Cursor ship Remotion code that matches framework conventions |
| Install | `npx skills add remotion-dev/skills` per the official docs |
| Source tree | Skills live under `packages/skills` in the Remotion monorepo |
Gemini 3.1 Flash-Lite is now positioned as Google’s fastest, lowest-cost Gemini 3 tier for high-volume work—agentic pipelines, translation, and light extraction—while keeping a stable API model id for production calls.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
V[High volume traffic] --> M["Gemini 3.1 Flash-Lite"]
M --> T[Text and structured outputs]
M --> A[Tool calling and routing]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
class V agent
class M,T,A hook
### What shipped

| Surface | Status |
| --- | --- |
| Gemini API / AI Studio | Stable model code `gemini-3.1-flash-lite` with multimodal inputs and text output |
| Google Cloud | Generally available on Gemini Enterprise Agent Platform as of 7 May 2026 |
### Why teams pick Flash-Lite

| Theme | Detail |
| --- | --- |
| Cost and latency | Built for ultra-low latency and large batch spend; Google cites strong price/performance versus prior Flash tiers on external speed benchmarks |
| Agentic fit | Function calling, structured outputs, caching, batch, and “thinking” controls for depth when needed |
| Typical workloads | Translation, moderation-scale text, transcription-style audio-to-text, PDF triage, and lightweight routers classifying traffic to heavier models |
### API snapshot

| Item | Value |
| --- | --- |
| Model string | `gemini-3.1-flash-lite` |
| Context window | Up to 1,048,576 input tokens and 65,536 output tokens (per the Gemini API model card) |
| Public pricing (API blog) | About $0.25 per million input tokens and $1.50 per million output tokens at the announced preview price point |
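Calling the tier from the google-genai Python SDK is then a short script. A minimal sketch: the prompt is illustrative, and the client reads `GEMINI_API_KEY` from the environment.

```python
from google import genai

# Client picks up GEMINI_API_KEY from the environment.
client = genai.Client()

# Model string comes from the table above; the prompt is illustrative.
response = client.models.generate_content(
    model="gemini-3.1-flash-lite",
    contents="Classify this ticket as billing, technical, or other: ...",
)
print(response.text)
```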
### At a glance

| Topic | Takeaway |
| --- | --- |
| Positioning | Most cost-efficient Gemini 3 model for scale-heavy, latency-sensitive jobs |
Cursor announced /orchestrate, a skill for the Cursor SDK that recursively spawns agents for large jobs—backed by internal figures on token savings and cold-start latency.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart TD
U[User goal] --> O["/orchestrate skill"]
O --> A1[Sub-agent A]
O --> A2[Sub-agent B]
O --> A3[Sub-agent N]
A1 --> M[Merge results]
A2 --> M
A3 --> M
M --> R[SDK outcome]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
class U,R agent
class O,A1,A2,A3,M hook
### What Cursor said

| Item | Detail |
| --- | --- |
| Skill | `/orchestrate` — recursive agent spawning for ambitious tasks via the Cursor SDK |
| Internal example | Autoresearch of internal skills: about 20% lower token use with better evals |
| Internal example | Internal backend cold start: about 80% reduction (as reported by Cursor) |
### How it fits the wider Cursor 3.3 drop

The same-day Cursor 3.3 notes emphasise parallel work with async subagents—Build in Parallel on plans, editor `/multitask`, and pinned skills as quick actions—so teams can mix UI-driven parallelism with skill-driven orchestration patterns.
### Skills mechanics (context)

Cursor loads project and user skill folders (for example `.cursor/skills/`), exposes them to Agent, and supports explicit `/skill-name` invocation; skills can bundle scripts and references for progressive loading—useful when an orchestration skill coordinates long-running SDK sessions.
### At a glance

| Topic | Takeaway |
| --- | --- |
| Intent | Recursive multi-agent runs for large SDK tasks |
| Evidence | Cursor on X (7 May 2026) plus same-day changelog and skills documentation |
| Try next | Invoke `/orchestrate` from Agent chat once the skill is available in your workspace |
OpenAI shipped GPT-Realtime-2 in the Realtime API so voice agents can keep a live conversation moving while applying GPT-5-class reasoning, parallel tools, and larger session context—alongside new streaming translation and transcription models.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
U[User audio] --> R[GPT-Realtime-2 session]
R --> T[Tools and retrieval]
T --> R
R --> O[Spoken reply]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
class U,O agent
class R,T hook
### Three models in one drop

| Model | Role |
| --- | --- |
| GPT-Realtime-2 | Speech-to-speech with configurable reasoning, stronger tool use, and longer sessions |
| GPT-Realtime-Translate | Live speech translation (70+ input languages to 13 output languages) |
| GPT-Realtime-Whisper | Streaming speech-to-text as the user talks |
### What GPT-Realtime-2 adds for builders

| Capability | Detail |
| --- | --- |
| Context | 128K context window (up from 32K) for longer agent flows |
| Reasoning | Adjustable effort from minimal through xhigh (low by default) |
| Tools | Parallel tool calls with short spoken preambles that cover tool-call latency |
| Recovery | More explicit spoken fallbacks instead of silent failure |
| Delivery | Better tone control for calm, empathetic, or upbeat responses |
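For orientation, a session against the Realtime API's WebSocket transport might bootstrap as sketched below. Only the model id comes from the announcement; the session fields, especially the reasoning-effort knob, are assumptions to verify against the Realtime docs before use.

```python
# Exploratory sketch of a Realtime session bootstrap over WebSocket.
# Only the model id comes from the announcement; the session fields,
# especially the reasoning-effort field name, are assumptions.
import asyncio
import json
import os

import websockets  # pip install websockets


async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

    # 'additional_headers' is the websockets>=14 name; older releases use 'extra_headers'.
    async with websockets.connect(url, additional_headers=headers) as ws:
        # Configure the session before streaming caller audio.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": "Answer travel questions calmly and briefly.",
                "reasoning_effort": "low",  # assumed name for the minimal..xhigh dial
            },
        }))
        first_event = json.loads(await ws.recv())
        print(first_event["type"])  # expect a session-created style event


asyncio.run(main())
```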
### Pricing snapshot (API)

| Model | Billing basis |
| --- | --- |
| GPT-Realtime-2 | About $32 per million audio input tokens ($0.40 per million cached) and $64 per million audio output tokens |
| GPT-Realtime-Translate | About $0.034 per minute |
| GPT-Realtime-Whisper | About $0.017 per minute |
All three run through the Realtime API; the announcement pairs them with customer examples (travel, telecom, property search) where low-latency speech, translation, or live captions must stay aligned with changing user intent.
### At a glance

| Topic | Takeaway |
| --- | --- |
| Positioning | First Realtime voice model with GPT-5-class reasoning in the API |
| Companion surfaces | Live translation plus streaming transcription in the same release wave |
| Where to try | Realtime Playground and Realtime docs for session setup |
Anthropic’s SpaceX compute agreement adds Colossus 1 capacity so Claude Code and Claude API limits can move up the same day—here is what changed for subscribers and API callers.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
S[SpaceX Colossus 1 capacity] --> A[Anthropic training and inference]
A --> C[Claude Code higher ceilings]
A --> P[Claude API Opus limits]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
class S,A agent
class C,P hook
### What went live on 6 May 2026

| Area | Change |
| --- | --- |
| Claude Code (Pro, Max, Team, seat-based Enterprise) | Five-hour rate limits doubled |
| Claude Code (Pro and Max) | Peak-hours reduction removed on those plans |
| Claude API | Higher rate limits for Claude Opus models (see vendor rate-limit docs for numbers) |
### What the SpaceX slice adds

| Item | Stated scope |
| --- | --- |
| Facility | All compute capacity at SpaceX Colossus 1 |
| Power | More than 300 megawatts of new capacity |
| Accelerators | Over 220,000 NVIDIA GPUs (within the month) |
| Downstream focus | Capacity called out for Claude Pro and Claude Max subscribers |
The same announcement frames wider infrastructure work (other hyperscaler and infrastructure partners) and notes interest in future orbital compute at gigawatt scale; the day-one user impact is the limit increases for Claude Code and Opus API traffic listed above.
### At a glance

| Topic | Takeaway |
| --- | --- |
| Trigger | New SpaceX Colossus 1 supply plus other recent compute deals |
| Product impact | Higher Claude Code ceilings and raised Opus API limits effective immediately |
Claude Managed Agents now pair always-on work sessions with an explicit success rubric, asynchronous memory curation, and HTTPS webhooks so long-running agent jobs can finish, self-correct, and notify your stack without constant polling.
%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
graph TD
A[Managed agent session] --> B[Outcome rubric and iterations]
B --> C[Separate grader context]
C -->|needs revision| B
C -->|satisfied| D[Idle with deliverables]
A --> E[(Memory store)]
E --> F[Dream job reviews transcripts]
F --> E
A --> G[HTTPS webhooks on milestones]
classDef agent fill:#8B0000,color:#fff
classDef hook fill:#189AB4,color:#fff
classDef decision fill:#444,color:#fff
class A agent
class E,F,G hook
class C decision
### What changed on 6 May 2026
The live Code with Claude stream introduced dreaming as a research preview inside Managed Agents, while outcomes, multi-agent orchestration, and webhooks moved to public beta alongside the existing memory features.
| Surface area | Availability | What it gives you |
| --- | --- | --- |
| Dreaming | Research preview (access request) | Scheduled consolidation of memory stores plus optional mining of up to 100 past sessions |
| Outcomes | Public beta | Rubric-backed iterations with an isolated grader and webhook completion signals |
| Multi-agent orchestration | Public beta | Coordinator agents that delegate to specialists with isolated threads on a shared filesystem |
| Webhooks | Public beta | Small signed HTTPS callbacks instead of polling for session, thread, outcome, and vault events |
### Dreaming: curate memory without mutating the source store
Dreaming runs as an asynchronous job that reads a memory store and up to one hundred session transcripts, then writes a brand-new store containing merged facts, removed contradictions, and freshly surfaced patterns. The input store stays read-only until you adopt the output, which keeps experiments reversible.
| Item | Detail |
| --- | --- |
| Beta headers | `managed-agents-2026-04-01` plus `dreaming-2026-04-21` on dream calls |
| Models supported today | `claude-opus-4-7` and `claude-sonnet-4-6` |
| Instruction budget | 4,096 characters of extra guidance per dream |
| Runtime | Typically minutes to tens of minutes, depending on transcript volume |
| Billing | Standard token metering on the selected dream model |
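As orientation for how those beta headers attach to a call, here is a raw HTTP sketch. Only the header names, header values, and model id come from the table above; the endpoint path, body fields, and ids are hypothetical, so consult the Managed Agents reference for the real shape.

```python
import os

import httpx

# Hypothetical endpoint and body: only the beta header values and the
# model id are taken from the docs above; everything else is a guess.
response = httpx.post(
    "https://api.anthropic.com/v1/managed_agents/dreams",  # hypothetical path
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-beta": "managed-agents-2026-04-01,dreaming-2026-04-21",
    },
    json={
        "model": "claude-opus-4-7",
        "memory_store_id": "store_abc123",  # hypothetical id
        "max_sessions": 100,                # mine up to 100 transcripts
        "instructions": "Prefer merging duplicate facts over deleting them.",
    },
)
response.raise_for_status()
print(response.json())
```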
### Outcomes: rubric-backed grading loops

When you emit `user.define_outcome`, the harness spins up a grader that scores each criterion independently in its own context window, then returns either a pass or a precise gap list that feeds the next agent revision. You can supply the rubric inline or via the Files API, cap the loop with `max_iterations` (default three, hard maximum twenty), and subscribe to `session.outcome_evaluation_ended` webhooks when grading rounds finish. The grader lands in one of five states, listed below; a hypothetical payload sketch follows the table.
| Grader result | What happens next |
| --- | --- |
| `satisfied` | Session returns to idle with deliverables under `/mnt/session/outputs/` |
| `needs_revision` | Agent takes another pass using the supplied critique |
| `max_iterations_reached` | Loop halts after the configured ceiling |
| `failed` | Rubric and task description were incompatible |
| `interrupted` | Operator paused the outcome via `user.interrupt` |
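For orientation, an outcome definition assembled from the names in the prose above might look like the sketch below. Apart from `user.define_outcome` and `max_iterations`, the field names are illustrative guesses rather than a documented schema.

```python
# Hypothetical user.define_outcome event body; field names other than
# "max_iterations" and the event type are illustrative guesses.
define_outcome_event = {
    "type": "user.define_outcome",
    "outcome": {
        "rubric": (
            "1. All new endpoints have passing integration tests.\n"
            "2. README documents the new configuration flags."
        ),
        "max_iterations": 5,  # default 3, hard maximum 20 per the docs
    },
}

# A grader verdict then arrives as one of the five states in the table:
# satisfied | needs_revision | max_iterations_reached | failed | interrupted
```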
### Multi-agent orchestration: coordinators, threads, and limits
Coordinators declare a roster of delegate agents (maximum twenty unique IDs, single hop only) and spawn isolated session threads that keep their own transcripts while sharing the container filesystem. Up to twenty-five threads may run concurrently; the primary session stream stays a condensed feed while per-thread streams expose full tool traces when you need forensic detail.
### Webhooks: verify signatures, then hydrate objects yourself

Endpoints must be public HTTPS on port 443. Each delivery includes a signing secret (prefixed `whsec_`) and an `X-Webhook-Signature` header; Anthropic’s SDK `unwrap()` helper validates the payload and rejects anything older than five minutes, which also gives you a safe retry discriminator because duplicate retries reuse the same `event.id`.
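In production you should call the SDK’s `unwrap()` helper rather than hand-rolling verification; as context for what that helper is doing, here is a generic HMAC-SHA256 check. The exact byte string Anthropic signs, including how the timestamp and event id enter the MAC, is defined by the SDK, so treat this as a sketch only.

```python
import hashlib
import hmac
import time


def verify_webhook(payload: bytes, signature: str, timestamp: str, secret: str) -> bool:
    """Generic HMAC-SHA256 check; the real signed-message format is
    defined by Anthropic's SDK -- prefer its unwrap() helper."""
    # Reject anything older than five minutes, mirroring unwrap().
    if abs(time.time() - float(timestamp)) > 300:
        return False
    # Assumed message layout "<timestamp>.<payload>"; verify against the SDK.
    mac = hmac.new(secret.encode(), f"{timestamp}.".encode() + payload, hashlib.sha256)
    return hmac.compare_digest(mac.hexdigest(), signature)
```

Because duplicate retries reuse the same `event.id`, keeping a short-lived set of seen ids (or a unique database constraint) makes delivery handling idempotent.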