Categories
News

Agents CLI: scaffolding, evals, and deploy for ADK on Google Cloud

Google’s Agents CLI packages the lifecycle around the open-source Agent Development Kit (ADK): scaffold an ADK Python project, wire tools and orchestration, run evaluation harnesses, and push builds to managed Google Cloud targets—whilst optional “skills” teach coding agents the same workflows end-to-end.

Where Agents CLI sits in the stack

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart TB
    subgraph dev [Developer machine]
        A[Coding agent with skills] --> B[agents-cli]
        B --> C[ADK Python project]
    end
    C --> D[Local run / Dev UI]
    C --> E[Eval sets + judges]
    E --> F{Ship}
    F --> G[Agent Runtime / Cloud Run / GKE]

    classDef agent fill:#8B0000,color:#fff
    classDef hook fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff

    class A agent
    class B hook
    class C agent
    class D hook
    class E decision
    class F decision
    class G agent

Install and prerequisites

| Requirement | Notes |
| --- | --- |
| Python | 3.11+ |
| uv | Recommended runner for uvx installs |
| Node.js | Required for skills installation path |
| Optional for deploy | Google Cloud SDK, Terraform |
| Platforms | macOS, Linux, Windows via WSL 2; native Windows not officially supported |
# One-shot setup (installs CLI + skills bundle for coding agents)
uvx google-agents-cli setup

# Alternatives from upstream docs
# pipx install google-agents-cli && agents-cli setup
# pip install google-agents-cli && agents-cli setup
# Skills only: npx skills add google/agents-cli

Existing gcloud application-default credentials are picked up automatically when present—useful for iterative deploy loops without pasting keys into the agent transcript.

Bundled skills (what the coding agent learns)

| Skill id | Scope |
| --- | --- |
| google-agents-cli-workflow | End-to-end lifecycle, model choice guardrails, code preservation rules |
| google-agents-cli-adk-code | ADK Python patterns—agents, tools, orchestration, callbacks, state |
| google-agents-cli-scaffold | Project create / enhance / upgrade templates |
| google-agents-cli-eval | Metrics, eval sets, trajectory scoring, LLM-as-judge configuration |
| google-agents-cli-deploy | Agent Runtime, Cloud Run, GKE, CI/CD, secrets handling |
| google-agents-cli-publish | Gemini Enterprise registration flow |
| google-agents-cli-observability | Cloud Trace, structured logging, third-party telemetry hooks |

Core commands

| Command | Purpose |
| --- | --- |
| agents-cli setup | Install CLI plus skills into supported coding agents |
| agents-cli scaffold … | Generate or mutate an ADK project tree |
| agents-cli eval run | Execute configured evaluation passes |
| agents-cli deploy | Ship to selected Google Cloud runtime |
| agents-cli publish gemini-enterprise | Surface the agent inside Gemini Enterprise |
| agents-cli login / --status | Auth against Google Cloud or AI Studio |

Relationship to ADK

ADK remains the framework: code-first agents, multi-agent graphs, rich tool surfaces (functions, OpenAPI tools, Google Cloud connectors), tracing, and “deploy anywhere” containers. Agents CLI is deliberately not a replacement for Gemini CLI, Claude Code, or Codex—it is the factory line those tools drive when they need opinionated Google Cloud paths for scaffolding, evaluation, and promotion to production.

Cloud vs local

| Phase | Cloud account? |
| --- | --- |
| Create, run locally, author eval assets | No—an AI Studio API key suffices for Gemini-backed ADK loops |
| Deploy, central observability, enterprise registry | Yes—project billing, IAM, and runtime choice (Agent Runtime, Cloud Run, GKE) |

Operational checklist

| Risk | Mitigation |
| --- | --- |
| Agent-generated code ownership | Enforce human review on agents-cli scaffold enhance diffs; pin dependency versions. |
| Eval gap before prod | Require agents-cli eval run in CI with frozen eval sets, not only ad-hoc LLM judging. |
| Secret sprawl | Use workload identity + Secret Manager patterns bundled in deploy skill rather than literals in prompts. |
| Token burn | Lean on skills for repetitive ADK API lookup instead of re-explaining the framework each session. |

For teams already standardising on ADK, Agents CLI is best read as glue automation plus teaching artefacts: it shortens the distance between a natural-language brief and a repo that is structured the way Google’s own agent engineers expect—provided you still treat production gates as human-owned.

Categories
News

MiMo-V2-Flash: Xiaomi’s open MoE bet on agents and long context

Xiaomi’s MiMo-V2-Flash is an open-weight mixture-of-experts language model pitched for reasoning, software engineering, and agentic tool use—released with permissive licensing and same-day open inference code so teams can self-host at throughput-oriented budgets.

Scale and routing

| Dimension | Publicly quoted profile |
| --- | --- |
| Parameter budget | On the order of 300B+ total parameters with roughly 15B active per token—typical of large sparse MoE stacks built for quality per flop. |
| Experts | Order of 250+ routed experts with a small subset fired each step—keeps memory bandwidth closer to the active footprint than a dense twin would demand. |
| Licensing | MIT open weights, lowering friction for commercial fine-tunes and downstream redistribution. |
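
The quoted sparsity ratio is worth making concrete. A back-of-envelope sketch, using only the rough public figures (not official specs):

```python
# Back-of-envelope sketch of the quoted sparsity: ~300B total parameters with
# ~15B active per token. These are the publicly quoted orders of magnitude,
# not official specs.
total_params = 300e9
active_params = 15e9

# Only this fraction of weights is touched on each forward pass.
active_ratio = active_params / total_params
print(f"active fraction per token: {active_ratio:.1%}")  # 5.0%

# Decode cost per token scales with roughly 2 * active parameters
# (one multiply-add per weight), so per-token compute is closer to a
# 15B dense model than a 300B one.
flops_per_token = 2 * active_params
print(f"approx decode FLOPs per token: {flops_per_token:.2e}")
```

This 5% active fraction is what "quality per flop" refers to: total capacity grows 20× beyond the per-token compute budget.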

Architecture: hybrid attention for long jobs

The stack interleaves sliding-window attention with periodic global attention blocks—roughly a five-to-one cadence in public diagrams—so most layers attend locally for speed whilst global layers refresh cross-segment context. Reported training targets sit in the tens of thousands of tokens natively, with product messaging extending usable context toward six-figure token lengths for retrieval-heavy agents.
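
The cadence can be sketched as a simple layer schedule. Depth and the exact 5:1 ratio here are illustrative, read off public diagrams rather than a published config:

```python
# Illustrative layer schedule: five sliding-window layers, then one global
# attention layer, repeated. The 5:1 cadence mirrors public diagrams; real
# depth and window sizes are not assumed here.
def hybrid_schedule(n_layers: int, global_every: int = 6) -> list[str]:
    return [
        "global" if (i + 1) % global_every == 0 else "sliding"
        for i in range(n_layers)
    ]

layers = hybrid_schedule(12)
print(layers.count("sliding"), layers.count("global"))  # 10 2
```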

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart TB
    subgraph layer [Representative hybrid block]
        L1[Sliding-window layers] --> L2[Global attention layer]
    end
    T[Token stream] --> layer
    layer --> R[Router → expert MLPs]
    R --> O[Hidden state out]

    classDef agent fill:#8B0000,color:#fff
    classDef hook fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff

    class T agent
    class L1 hook
    class L2 decision
    class R hook
    class O agent

Post-training: multi-teacher on-policy distillation

The team highlights Multi-Teacher On-Policy Distillation (MOPD)—a recipe meant to dampen the classic post-training seesaw where gains on mathematics or coding can erode safety or vice versa. The idea is to blend specialised teacher signals whilst staying on the student’s own sampling distribution, preserving behaviours that matter for tool-grounded agents.

Serving story

| Topic | What to expect |
| --- | --- |
| Inference stack | Reference kernels and configs shipped for SGLang on launch day—signals that Xiaomi expects the model to run in vLLM-class servers, not only proprietary clouds. |
| Throughput optics | Community benchmarks on modern accelerators quote very high prefill tokens per second when multi-layer multi-token prediction is enabled—treat numbers as hardware-specific until you replicate on your cluster. |
| Marketplaces | Third-party API routers added the model quickly—useful for A/B tests before you commit GPU capital. |

Who should adopt first

MiMo-V2-Flash is aimed at teams that want frontier-shaped capability with open weights and a story centred on agentic coding and long-context retrieval. If you only need lightweight instruction models, the operational cost of hosting a 300B-class MoE will be overkill—benchmark on a slice of your real traces before replatforming.

Categories
News

Qwen3.6-27B: dense hybrid attention and thinking preservation

Alibaba’s Qwen team has shipped Qwen3.6-27B—the first dense open-weight entry in the Qwen 3.6 line—combining a hybrid linear-attention backbone, optional preservation of prior chain-of-thought across turns, and a 262K native context window stretchable into the million-token regime with YaRN.

Weights, licence, and runtimes

| Artifact | Detail |
| --- | --- |
| Hub names | Qwen/Qwen3.6-27B (BF16) and Qwen/Qwen3.6-27B-FP8 fine-grained quantisation (block size 128) |
| Licence | Apache 2.0 |
| Documented stacks | SGLang ≥0.5.10, vLLM ≥0.19.0, KTransformers, Hugging Face Transformers |

Layer geometry

The transformer stacks 64 layers with a repeating 3×(Gated DeltaNet → FFN) + 1×(Gated Attention → FFN) rhythm. Three quarters of sublayers use Gated DeltaNet linear attention (48 value heads / 16 QK heads in the public card), whilst every fourth sublayer uses conventional gated multi-head attention with a reduced KV head count to shrink cache footprint. Feed-forward blocks expand to an intermediate width of 17,408 dimensions.
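
The stated proportions follow directly from the macro-block rhythm, which can be tiled out in a few lines (layer-type labels here are just for counting, not a real model config):

```python
# Sketch of the published rhythm: 3×(Gated DeltaNet → FFN) then
# 1×(Gated Attention → FFN), tiled across the 64-layer stack.
def qwen36_layer_types(n_layers: int = 64) -> list[str]:
    macro = ["deltanet", "deltanet", "deltanet", "attention"]
    return [macro[i % len(macro)] for i in range(n_layers)]

types = qwen36_layer_types()
print(types.count("deltanet"), types.count("attention"))  # 48 16
```

So 48 of 64 sublayer groups are linear attention, matching the "three quarters" figure in the card.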

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
    subgraph block [One macro-block]
        D1[DeltaNet] --> D2[DeltaNet]
        D2 --> D3[DeltaNet]
        D3 --> A[Gated attention]
        A --> F[FFN]
    end
    IN[Tokens + optional vision] --> block
    block --> MTP[Multi-token prediction head]

    classDef agent fill:#8B0000,color:#fff
    classDef hook fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff

    class IN agent
    class D1 hook
    class D2 hook
    class D3 hook
    class A decision
    class F hook
    class MTP agent

Context and “thinking preservation”

Native context is 262,144 tokens; YaRN scaling is advertised up to 1,010,000 tokens for experimental long-document jobs. For multi-turn agents, the release introduces thinking preservation—an API/template flag that keeps earlier chain-of-thought blocks in the visible history so the model does not pay to re-derive the same scratch work each tool round.
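
Conceptually, the flag changes what gets re-sent each round. A heavily simplified sketch, where the message shape and the keep_thinking flag are assumptions for illustration, not the official Qwen chat template:

```python
# Illustrative only: what "thinking preservation" changes in a multi-turn
# history. Message shape and the keep_thinking flag are assumptions for this
# sketch, not the official chat template.
def build_history(turns: list[dict], keep_thinking: bool) -> list[dict]:
    history = []
    for t in turns:
        history.append({"role": "user", "content": t["user"]})
        reply = {"role": "assistant", "content": t["answer"]}
        if keep_thinking:
            reply["thinking"] = t["thinking"]  # retained, so not re-derived
        history.append(reply)
    return history

turns = [
    {"user": "Scan the repo", "thinking": "walk tree, note 3 pkgs", "answer": "3 packages"},
    {"user": "Patch the failing test", "thinking": "reuse earlier scan", "answer": "done"},
]
kept = build_history(turns, keep_thinking=True)
dropped = build_history(turns, keep_thinking=False)
print(sum("thinking" in m for m in kept), sum("thinking" in m for m in dropped))  # 2 0
```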

Reported benchmark snapshots

| Benchmark | Score cited in release materials |
| --- | --- |
| SWE-bench Verified | 77.2 |
| SWE-bench Pro | 53.5 (above a 397B-parameter MoE from the prior generation in the same table) |
| Terminal-Bench 2.0 | 59.3 |
| QwenWebBench | 1487 |
| NL2Repo | 36.2 |
| GPQA Diamond / AIME26 / LiveCodeBench v6 | 87.8 / 94.1 / 83.9 |

Deployment notes

Treat the FP8 checkpoint as the default path when VRAM is tight; validate perplexity and tool-call accuracy on your own eval harness because quantisation interacts badly with brittle JSON tool grammars. Pair the model with a sandbox that mirrors Terminal-Bench-style constraints if you plan to expose shell access—benchmark scores do not substitute for hardened ops reviews.

Categories
News

Open models: MiMo-V2-Flash scale meets Qwen3.6-27B agentic density

Two heavyweight open-model lines moved in the same news cycle: a Chinese phone OEM shipped a very large sparse “flash” language model aimed at reasoning, coding, and agentic workloads, whilst Alibaba’s Qwen team landed a dense 27B multimodal stack tuned for repository-scale agents—with both camps emphasising serving efficiency and day-zero inference tooling.

Two release archetypes

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart TB
    subgraph flash [Sparse flash LLM]
        A1[MoE backbone] --> A2[Hybrid attention]
        A2 --> A3[Post-train distill]
        A3 --> A4[Open weights + SGLang]
    end
    subgraph qwen [Dense hybrid LLM]
        B1[DeltaNet + full attention] --> B2[MTP speculative decode]
        B2 --> B3[Thinking preservation]
        B3 --> B4[HF + vLLM / SGLang]
    end

    classDef agent fill:#8B0000,color:#fff
    classDef hook fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff

    class A1 agent
    class A2 hook
    class A3 decision
    class A4 hook
    class B1 hook
    class B2 decision
    class B3 agent
    class B4 hook

Sparse “flash” foundation model

| Dimension | Reported shape |
| --- | --- |
| Scale | Order of 300B+ total parameters with on the order of 15B active per token—MoE routing with hundreds of experts and a handful activated per step. |
| Attention | Hybrid layout mixing sliding-window blocks with periodic full-attention blocks; long-context training in the tens of thousands of tokens with extension into six-figure context in product messaging. |
| Training recipe | Multi-teacher on-policy distillation used to reduce the classic post-training trade-off between maths, coding, and safety. |
| Licensing & delivery | Open weights under a permissive licence; inference stacks published alongside launch for high-throughput prefill and multi-layer multi-token prediction decode paths. |
| Ecosystem | Same-day integration in major open inference engines and third-party API marketplaces—signals that vendors expect agent builders to adopt quickly. |

Qwen3.6-27B: dense hybrid for agents

| Dimension | Detail |
| --- | --- |
| Parameters & licence | 27B dense causal LM with vision encoder; Apache 2.0 open weights. |
| Weights on hub | BF16 and fine-grained FP8 (block size 128) variants with near-parity quality. |
| Layer pattern | 64 layers: repeating 3×(Gated DeltaNet → FFN) + 1×(Gated Attention → FFN)—three quarters linear attention, one quarter standard attention for KV-memory savings on long jobs. |
| Decoding | Multi-token prediction trained for speculative decoding at serve time. |
| Context | 262,144 tokens native; YaRN extension advertised up to 1,010,000 tokens—team guidance to keep ≥128K when relying on extended “thinking” behaviour. |
| Thinking preservation | Optional template flag to retain prior chain-of-thought across turns—aimed at fewer redundant reasoning tokens and better KV reuse in tool loops. |
| Runtime | Documented compatibility floors include SGLang ≥0.5.10, vLLM ≥0.19.0, plus KTransformers and Transformers. |

Reported benchmark highlights (Qwen3.6-27B)

| Benchmark | Reported score | Notes |
| --- | --- | --- |
| SWE-bench Verified | 77.2 | Community autonomous coding bar; team positions it near top proprietary coding models. |
| SWE-bench Pro | 53.5 | Above a 397B-parameter MoE from the prior generation on the same table. |
| Terminal-Bench 2.0 | 59.3 | Heavy sandbox runtime; score quoted on par with a flagship closed coding model in the same write-up. |
| QwenWebBench | 1487 | Internal bilingual web/front-end generation suite—large jump vs earlier 27B baselines. |
| NL2Repo | 36.2 | Repository-level generation metric. |
| Reasoning samples | GPQA Diamond 87.8; AIME26 94.1; LiveCodeBench v6 83.9 | Illustrative reasoning and code competition proxies. |

Why this pairing matters

One line doubles down on extreme MoE scale + throughput optics for frontier-style workloads; the other shows a mid-size dense hybrid can still punch above much larger MoE predecessors on agentic coding tables while staying deployable on commodity GPU farms. Together they reinforce that 2026 competition is as much about inference economics and tooling as about raw parameter counts.

Categories
AI

Gemini Embedding 2: multimodal vectors for unified retrieval

Gemini Embedding 2 maps text, images, video, audio, and PDFs into one shared vector space so retrieval, clustering, and recommendations can run cross-modally without maintaining separate encoders per modality.

Flow from content to similarity search

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
    A[Multimodal input] --> B[Embedding model]
    B --> C[Float vector]
    C --> D[Index / ANN]
    Q[Query embedding] --> D
    D --> R[Ranked matches]

    classDef agent fill:#8B0000,color:#fff
    classDef hook fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff

    class A agent
    class B hook
    class C decision
    class D hook
    class Q agent
    class R agent

Model identifiers

| Surface | Typical model string | Notes |
| --- | --- | --- |
| Gemini API | gemini-embedding-2-preview | Preview track; check current naming in your SDK. |
| Vertex AI | gemini-embedding-2 | Managed endpoint ID may differ by region—verify in console docs. |

Inputs and practical limits

| Modality | What to expect |
| --- | --- |
| Text | Longer context than the prior text-only embedding family—on the order of 8k tokens for a single embed request. |
| Images | Multiple still images per request (common cap around half a dozen); raster formats such as PNG and JPEG. |
| Video | Short clips (on the order of two minutes) in widely used container formats. |
| Audio | Native audio embedding without forcing an intermediate transcript. |
| Documents | Direct PDF ingestion for small multi-page documents in one call. |

Vector size and Matryoshka (MRL) truncation

Default output length is 3072 floats. The family is trained with Matryoshka Representation Learning: early prefix dimensions remain meaningful, so you can request a smaller output_dimensionality (or equivalent in your client) to cut storage and dot-product cost. Typical choices called out in documentation are 768, 1536, and 3072; supported range is roughly 128–3072.

Normalisation for cosine similarity

Full 3072-dimensional vectors are already L2-normalised. If you truncate to other sizes, apply the same normalisation yourself before comparing directions with cosine similarity.
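
A minimal sketch of that truncate-then-renormalise step, in plain Python (the 4-dimensional vector stands in for a real 3072-dimensional embedding):

```python
import math

# Minimal sketch: keep a Matryoshka prefix of the embedding, then re-apply
# L2 normalisation before cosine comparisons. Only the full-length vector
# is guaranteed to arrive pre-normalised.
def truncate_and_normalise(vec: list[float], k: int) -> list[float]:
    prefix = vec[:k]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

def cosine(a: list[float], b: list[float]) -> float:
    # A plain dot product is a valid cosine once both inputs are unit length.
    return sum(x * y for x, y in zip(a, b))

v = [0.5, 0.3, -0.2, 0.1]                 # stand-in for a 3072-dim embedding
u = truncate_and_normalise(v, 2)
print(round(cosine(u, u), 6))  # 1.0
```

Skipping the renormalisation after truncation silently deflates every similarity score, which is why it belongs in the indexing pipeline, not left to callers.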

Versus the earlier text-only embedding model

| Aspect | Prior text embedding (stable family) | Gemini Embedding 2 |
| --- | --- | --- |
| Modalities | Text in, dense vector out. | Text, image, video, audio, PDF in unified space. |
| Typical text token budget | Shorter (on the order of 2k tokens). | Larger (on the order of 8k tokens). |
| MRL sizing | Supported. | Supported with the same dimension trade-off mindset. |
| Best fit | Text-only RAG and classification. | Cross-modal search, mixed media catalogues, multimodal deduplication. |

Integration sketch

# Pseudocode — align field names with your official client (REST or SDK).
request = {
  "model": "gemini-embedding-2-preview",
  "contents": multimodal_parts,  # text + optional image/video/audio/pdf parts
  "config": {
    "output_dimensionality": 768,   # optional; omit for full 3072
    "task_type": "RETRIEVAL_DOCUMENT"  # optional hint where supported
  }
}
vector = embed(request).values

Operational summary

| Check | Action |
| --- | --- |
| Preview drift | Pin model version strings and re-embed corpora when Google promotes a stable ID. |
| Index schema | Store dimensionality and normalisation flag per collection; do not mix truncated and full vectors. |
| Latency cost | Large video or multi-image batches increase wall-clock time—batch asynchronously for backfills. |
| Evaluation | Benchmark recall@k on your own queries; public leaderboards do not replace domain-specific retrieval tests. |
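
The asynchronous-backfill advice can be sketched with stdlib asyncio; embed_batch below is a hypothetical coroutine standing in for the real client call:

```python
import asyncio

# Sketch of asynchronous batching for an embedding backfill. embed_batch is a
# hypothetical stand-in for the real API call; the semaphore caps in-flight
# requests so large media batches do not stampede the endpoint.
async def embed_batch(batch: list[str]) -> list[list[float]]:
    await asyncio.sleep(0)                 # placeholder for the network call
    return [[0.0] * 8 for _ in batch]

async def backfill(items: list[str], batch_size: int = 4, concurrency: int = 2):
    sem = asyncio.Semaphore(concurrency)

    async def run(batch: list[str]):
        async with sem:
            return await embed_batch(batch)

    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    results = await asyncio.gather(*(run(b) for b in batches))
    return [vec for chunk in results for vec in chunk]

vectors = asyncio.run(backfill([f"doc-{i}" for i in range(10)]))
print(len(vectors))  # 10
```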

Use Gemini Embedding 2 when a single vector index must answer text-over-image, image-over-text, or document-plus-audio style queries; keep the prior text-only model when your pipeline is strictly linguistic and you want the smallest integration surface.

Categories
AI

LiteLLM AI Gateway: beta memory API for agents

Beta gateway memory keeps durable, caller-scoped key–value context behind the same AI gateway you already route through—so agent runtimes can stay thin while persistence, visibility, and admin boundaries live in one place.

Where memory sits

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
flowchart LR
    A[Agent / client] --> B[AI gateway]
    B --> C[(Memory store)]
    B --> D[Model providers]
    C --> B

    classDef agent fill:#8B0000,color:#fff
    classDef hook fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff

    class A agent
    class B hook
    class C decision
    class D hook

HTTP surface

| Verb | Path | Purpose |
| --- | --- | --- |
| POST | /v1/memory | Create an entry |
| GET | /v1/memory | List entries visible to the caller |
| GET | /v1/memory/{key} | Fetch one key |
| PUT | /v1/memory/{key} | Upsert by key |
| DELETE | /v1/memory/{key} | Remove by key |

Scoping and roles

| Concern | Behaviour |
| --- | --- |
| Default scope | New rows attach to the authenticated caller’s user and team context. |
| Visibility | Rows match when the caller’s user or team aligns with the row; elevated admin roles can bypass and list broadly. |
| Cross-tenant writes | Non-admin callers should not be able to pin memory to someone else’s user or team identifiers. |
| Conflicts | Duplicates for the same logical key under one scope should surface as a client error rather than silent overwrite. |

Payload shape

Each record is a key with a string value the model can read directly, plus optional JSON metadata for structured tags, provenance, or version hints without forcing a rigid schema up front.
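
One plausible record shape implied by that description; the field names are illustrative rather than the gateway's documented schema:

```python
import json

# Illustrative record shape: a string value the model reads directly, plus
# free-form JSON metadata. Field names are assumptions for this sketch.
record = {
    "key": "user_timezone",
    "value": "User works in UTC+1 and prefers 24-hour times.",
    "metadata": {"source": "onboarding-agent", "version": 1},
}

blob = json.dumps(record)          # what would travel over the wire
assert json.loads(blob) == record  # structured metadata survives a round trip
print(blob)
```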

Operational checklist

| Item | Notes |
| --- | --- |
| Database migration | Apply the backing table migration on a fresh or staging database before enabling in production. |
| Smoke tests | Exercise create → list → get → put → delete under both member and admin credentials. |
| NULL / composite keys | Validate uniqueness semantics for nullable team identifiers—PostgreSQL treats NULLs distinctly inside unique indexes. |
| Ambiguous keys | If a principal can see multiple rows for the same key via overlapping scopes, define which row wins before relying on updates. |

Treat this capability as experimental: ship behind feature flags, monitor error rates on 409/403/404, and keep a rollback path until your tenancy and uniqueness rules are proven under load.

Categories
PraisonAI

PraisonAI Managed Agents Compute Providers: Local, Docker, E2B, Modal, Daytona, Fly.io

PraisonAI Native Managed Agents run the agent loop locally, but the tool layer can be sandboxed in any compute environment — a Docker container, an E2B cloud sandbox, a Modal function, a Daytona workspace, or a Fly.io Machine — via a single compute= argument. Same API, swappable infrastructure.

Architecture

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
graph TD
    A[PraisonAI Agent] --> B[LocalManagedAgent]
    B --> C{compute=}
    C -->|local| D[Host subprocess]
    C -->|docker| E[Docker container]
    C -->|e2b| F[E2B sandbox]
    C -->|modal| G[Modal function]
    C -->|daytona| H[Daytona workspace]
    C -->|flyio| I[Fly Machine]
    D --> J[Tool execution + results]
    E --> J
    F --> J
    G --> J
    H --> J
    I --> J
    J --> A
    classDef agent fill:#8B0000,color:#fff
    classDef tool fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff
    class A,J agent
    class D,E,F,G,H,I tool
    class B,C decision

Providers at a Glance

| compute= | Where tools run | Requires | Best for |
| --- | --- | --- | --- |
| None / "local" | Host subprocess | Nothing | Dev, quick experiments |
| "docker" | Local Docker container | Docker running + pip install docker | Reproducible, isolated |
| "e2b" | E2B cloud sandbox | E2B_API_KEY + pip install e2b | Fast-boot cloud sandbox |
| "modal" | Modal serverless | Modal token + pip install modal | Auto-scaling, GPUs |
| "daytona" | Daytona workspace | DAYTONA_API_KEY + default region | Dev workspaces |
| "flyio" | Fly Machine VM | FLY_API_TOKEN + app | Edge deployment |

Install

pip install praisonai
export OPENAI_API_KEY=your-api-key  # or any other LLM provider key

Local (no infra)

No compute provider attached — tools run on the host. Ideal for quick prototyping.

"""Local provider — basic managed agent with gpt-4o-mini.

No external infrastructure needed. Runs the agent loop locally.
"""
from praisonai import Agent, ManagedAgent, LocalManagedConfig

# Create a local managed agent (auto-detects local when no ANTHROPIC_API_KEY)
managed = ManagedAgent(
    provider="local",
    config=LocalManagedConfig(
        model="gpt-4o-mini",
        system="You are a helpful assistant. Be concise.",
        name="LocalAgent",
    ),
)

agent = Agent(name="local-basic", backend=managed)

# 1. Basic execution
print("[1] Basic execution...")
result = agent.start("What is the capital of France? One word.")
print(f"    Result: {result}")

# 2. Agent metadata
print(f"\n[2] Agent ID: {managed.agent_id}")
print(f"    Version: {managed.agent_version}")
print(f"    Env ID:  {managed.environment_id}")
print(f"    Session: {managed.session_id}")

# 3. Multi-turn (same session keeps context)
print("\n[3] Multi-turn...")
result = agent.start("What country is that city in?")
print(f"    Result: {result}")

# 4. Usage tracking
info = managed.retrieve_session()
print(f"\n[4] Usage: in={info['usage']['input_tokens']}, out={info['usage']['output_tokens']}")

# 5. List sessions
sessions = managed.list_sessions()
print(f"\n[5] Sessions: {len(sessions)}")
for s in sessions:
    print(f"    {s['id']} | {s['status']}")

print("\nDone!")

Docker

Each tool call executes inside a Docker container. provision_compute() creates the container, execute_in_compute() runs commands inside it, shutdown_compute() tears it down.

"""Docker compute provider — run agent tools inside a Docker container.

Requires: Docker running locally.
"""
import asyncio
from praisonai import Agent, ManagedAgent, LocalManagedConfig

# ── 1. Basic agent with Docker compute ──
managed = ManagedAgent(
    provider="local",
    compute="docker",
    config=LocalManagedConfig(
        model="gpt-4o-mini",
        system="You are a helpful coding assistant. Be concise.",
        name="DockerAgent",
    ),
)

agent = Agent(name="docker-agent", backend=managed)

print("[1] Agent created with Docker compute")
print(f"    Agent ID: {managed.agent_id or '(lazy — created on first call)'}")
print(f"    Compute:  {managed.compute_provider.provider_name}")

# ── 2. Provision Docker container ──
print("\n[2] Provisioning Docker container...")
info = asyncio.run(managed.provision_compute(image="python:3.12-slim"))
print(f"    Instance: {info.instance_id}")
print(f"    Status:   {info.status}")

# ── 3. Execute commands inside the container ──
print("\n[3] Executing commands in Docker...")
result = asyncio.run(managed.execute_in_compute("python3 -c 'import sys; print(sys.version)'"))
print(f"    Python version: {result['stdout'].strip()}")
print(f"    Exit code: {result['exit_code']}")

result = asyncio.run(managed.execute_in_compute("echo 'Hello from Docker!'"))
print(f"    Echo: {result['stdout'].strip()}")

# ── 4. Install packages in the container ──
print("\n[4] Installing packages...")
result = asyncio.run(managed.execute_in_compute("pip install requests -q"))
print(f"    pip exit code: {result['exit_code']}")

result = asyncio.run(managed.execute_in_compute("python3 -c 'import requests; print(requests.__version__)'"))
print(f"    requests version: {result['stdout'].strip()}")

# ── 5. Use agent with LLM (runs locally, compute is for tool sandboxing) ──
print("\n[5] Agent LLM execution...")
result = agent.start("What is 15 * 23? Just the number.")
print(f"    Result: {result}")

# ── 6. Multi-turn ──
print("\n[6] Multi-turn...")
result = agent.start("Double that number.")
print(f"    Result: {result}")

# ── 7. Usage tracking ──
info = managed.retrieve_session()
print(f"\n[7] Usage: in={info['usage']['input_tokens']}, out={info['usage']['output_tokens']}")

# ── 8. Shutdown Docker container ──
print("\n[8] Shutting down Docker container...")
asyncio.run(managed.shutdown_compute())
print("    Shutdown complete.")

print("\nDone!")

E2B (cloud sandbox)

E2B provisions a secure cloud sandbox in under a second. Set E2B_API_KEY first.

"""E2B compute provider — run agent tools inside an E2B cloud sandbox.

Requires: E2B_API_KEY environment variable set.
Install:  pip install e2b
"""
import asyncio
from praisonai import Agent, ManagedAgent, LocalManagedConfig

# ── 1. Create agent with E2B compute ──
managed = ManagedAgent(
    provider="local",
    compute="e2b",
    config=LocalManagedConfig(
        model="gpt-4o-mini",
        system="You are a helpful coding assistant. Be concise.",
        name="E2BAgent",
    ),
)

agent = Agent(name="e2b-agent", backend=managed)

print("[1] Agent created with E2B compute")
print(f"    Compute: {managed.compute_provider.provider_name}")

# ── 2. Provision E2B sandbox ──
print("\n[2] Provisioning E2B sandbox...")
info = asyncio.run(managed.provision_compute(idle_timeout_s=120))
print(f"    Instance: {info.instance_id}")
print(f"    Status:   {info.status}")

# ── 3. Execute commands in the sandbox ──
print("\n[3] Executing commands in E2B...")
result = asyncio.run(managed.execute_in_compute("python3 -c 'import sys; print(sys.version)'"))
print(f"    Python version: {result['stdout'].strip()}")
print(f"    Exit code: {result['exit_code']}")

result = asyncio.run(managed.execute_in_compute("echo 'Hello from E2B!'"))
print(f"    Echo: {result['stdout'].strip()}")

# ── 4. Run a Python script ──
print("\n[4] Running Python script...")
result = asyncio.run(managed.execute_in_compute(
    "python3 -c 'for i in range(5): print(f\"Line {i}\")'"
))
print(f"    Output:\n{result['stdout']}")

# ── 5. Agent LLM execution ──
print("[5] Agent LLM execution...")
result = agent.start("What is the square root of 144? Just the number.")
print(f"    Result: {result}")

# ── 6. Multi-turn ──
print("\n[6] Multi-turn...")
result = agent.start("Now cube that number.")
print(f"    Result: {result}")

# ── 7. Usage ──
info = managed.retrieve_session()
print(f"\n[7] Usage: in={info['usage']['input_tokens']}, out={info['usage']['output_tokens']}")

# ── 8. Shutdown ──
print("\n[8] Shutting down E2B sandbox...")
asyncio.run(managed.shutdown_compute())
print("    Shutdown complete.")

print("\nDone!")

Modal

Modal gives you serverless containers with GPU support. Authenticate first by running modal token new.

"""Modal compute provider — run agent tools inside a Modal cloud sandbox.

Requires: modal CLI configured (modal token set) or MODAL_TOKEN_ID + MODAL_TOKEN_SECRET.
Install:  pip install modal
"""
import asyncio
from praisonai import Agent, ManagedAgent, LocalManagedConfig

# ── 1. Create agent with Modal compute ──
managed = ManagedAgent(
    provider="local",
    compute="modal",
    config=LocalManagedConfig(
        model="gpt-4o-mini",
        system="You are a helpful coding assistant. Be concise.",
        name="ModalAgent",
    ),
)

agent = Agent(name="modal-agent", backend=managed)

print("[1] Agent created with Modal compute")
print(f"    Compute: {managed.compute_provider.provider_name}")

# ── 2. Provision Modal sandbox ──
print("\n[2] Provisioning Modal sandbox...")
info = asyncio.run(managed.provision_compute(idle_timeout_s=120))
print(f"    Instance: {info.instance_id}")
print(f"    Status:   {info.status}")

# ── 3. Execute commands in the sandbox ──
print("\n[3] Executing commands in Modal...")
result = asyncio.run(managed.execute_in_compute("python3 -c 'import sys; print(sys.version)'"))
print(f"    Python version: {result['stdout'].strip()}")
print(f"    Exit code: {result['exit_code']}")

result = asyncio.run(managed.execute_in_compute("echo 'Hello from Modal!'"))
print(f"    Echo: {result['stdout'].strip()}")

# ── 4. Run a computation ──
print("\n[4] Running computation...")
result = asyncio.run(managed.execute_in_compute(
    "python3 -c 'print(sum(range(1, 101)))'"
))
print(f"    Sum 1..100 = {result['stdout'].strip()}")

# ── 5. Agent LLM execution ──
print("\n[5] Agent LLM execution...")
result = agent.start("What is 7 factorial? Just the number.")
print(f"    Result: {result}")

# ── 6. Multi-turn ──
print("\n[6] Multi-turn...")
result = agent.start("Is that number even or odd?")
print(f"    Result: {result}")

# ── 7. Usage ──
info = managed.retrieve_session()
print(f"\n[7] Usage: in={info['usage']['input_tokens']}, out={info['usage']['output_tokens']}")

# ── 8. Shutdown ──
print("\n[8] Shutting down Modal sandbox...")
asyncio.run(managed.shutdown_compute())
print("    Shutdown complete.")

print("\nDone!")

Daytona

Daytona provides managed developer workspaces. Set DAYTONA_API_KEY and make sure your org has a default region configured in the Daytona dashboard.

"""Daytona compute provider — run agent tools inside a Daytona cloud sandbox.

Requires: DAYTONA_API_KEY environment variable set.
          Organization must have a default region configured in Daytona Dashboard.
Install:  pip install daytona-sdk
"""
import asyncio
from praisonai import Agent, ManagedAgent, LocalManagedConfig

# ── 1. Create agent with Daytona compute ──
managed = ManagedAgent(
    provider="local",
    compute="daytona",
    config=LocalManagedConfig(
        model="gpt-4o-mini",
        system="You are a helpful coding assistant. Be concise.",
        name="DaytonaAgent",
    ),
)

agent = Agent(name="daytona-agent", backend=managed)

print("[1] Agent created with Daytona compute")
print(f"    Compute: {managed.compute_provider.provider_name}")

# ── 2. Provision Daytona sandbox ──
print("\n[2] Provisioning Daytona sandbox...")
info = asyncio.run(managed.provision_compute(idle_timeout_s=120))
print(f"    Instance: {info.instance_id}")
print(f"    Status:   {info.status}")

# ── 3. Execute commands ──
print("\n[3] Executing commands in Daytona...")
result = asyncio.run(managed.execute_in_compute("python3 -c 'import sys; print(sys.version)'"))
print(f"    Python version: {result['stdout'].strip()}")
print(f"    Exit code: {result['exit_code']}")

result = asyncio.run(managed.execute_in_compute("echo 'Hello from Daytona!'"))
print(f"    Echo: {result['stdout'].strip()}")

# ── 4. Agent LLM execution ──
print("\n[4] Agent LLM execution...")
result = agent.start("What is 11 * 13? Just the number.")
print(f"    Result: {result}")

# ── 5. Usage ──
info = managed.retrieve_session()
print(f"\n[5] Usage: in={info['usage']['input_tokens']}, out={info['usage']['output_tokens']}")

# ── 6. Shutdown ──
print("\n[6] Shutting down Daytona sandbox...")
asyncio.run(managed.shutdown_compute())
print("    Shutdown complete.")

print("\nDone!")

Fly.io

Fly.io Machines give you edge-deployed VMs. Set FLY_API_TOKEN and FLY_APP first.

"""Fly.io compute provider — run agent tools inside a Fly Machines VM.

Requires: FLY_API_TOKEN env var + an existing Fly.io app (set FLY_APP).
Install:  pip install httpx  (already a praisonai dependency)
"""
import asyncio
from praisonai import Agent, ManagedAgent, LocalManagedConfig

# ── 1. Create agent with Fly.io compute ──
managed = ManagedAgent(
    provider="local",
    compute="flyio",
    config=LocalManagedConfig(
        model="gpt-4o-mini",
        system="You are a helpful coding assistant. Be concise.",
        name="FlyioAgent",
    ),
)

agent = Agent(name="flyio-agent", backend=managed)

print("[1] Agent created with Fly.io compute")
print(f"    Compute: {managed.compute_provider.provider_name}")

# ── 2. Provision Fly Machine ──
print("\n[2] Provisioning Fly Machine...")
info = asyncio.run(managed.provision_compute(image="python:3.12-slim"))
print(f"    Instance: {info.instance_id}")
print(f"    Status:   {info.status}")

# ── 3. Execute commands in the machine ──
print("\n[3] Executing commands on Fly.io...")
result = asyncio.run(managed.execute_in_compute("python3 -c 'import sys; print(sys.version)'"))
print(f"    Python version: {result['stdout'].strip()}")
print(f"    Exit code: {result['exit_code']}")

result = asyncio.run(managed.execute_in_compute("echo 'Hello from Fly.io!'"))
print(f"    Echo: {result['stdout'].strip()}")

# ── 4. Agent LLM execution ──
print("\n[4] Agent LLM execution...")
result = agent.start("What is the population of Tokyo? One number.")
print(f"    Result: {result}")

# ── 5. Usage ──
info = managed.retrieve_session()
print(f"\n[5] Usage: in={info['usage']['input_tokens']}, out={info['usage']['output_tokens']}")

# ── 6. Shutdown ──
print("\n[6] Shutting down Fly Machine...")
asyncio.run(managed.shutdown_compute())
print("    Shutdown complete.")

print("\nDone!")

All Providers — Comparison Runner

Run every provider end-to-end in a single script and print a token-usage summary.

"""All compute providers — comprehensive test across Local, Docker, E2B, and Modal.

This example mirrors the Anthropic app.py but uses the local provider with
various compute backends instead of Anthropic's managed infrastructure.

Requires:
  - Docker running locally
  - E2B_API_KEY set
  - modal CLI configured (modal token set)
"""
import asyncio
from praisonai import Agent, ManagedAgent, LocalManagedConfig


async def test_provider(name, compute, extra_provision_kwargs=None):
    """Test a single compute provider end-to-end."""
    print(f"\n{'='*60}")
    print(f"  PROVIDER: {name}")
    print(f"{'='*60}")

    managed = ManagedAgent(
        provider="local",
        compute=compute,
        config=LocalManagedConfig(
            model="gpt-4o-mini",
            system="You are a helpful assistant. Be concise.",
            name=f"{name}Agent",
        ),
    )
    agent = Agent(name=f"{name}-test", backend=managed)

    # 1. Provision
    print(f"\n  [1] Provisioning {name}...")
    provision_kwargs = extra_provision_kwargs or {}
    info = await managed.provision_compute(**provision_kwargs)
    print(f"      Instance: {info.instance_id}")
    print(f"      Status:   {info.status}")

    # 2. Execute command
    print(f"\n  [2] Execute in {name}...")
    result = await managed.execute_in_compute("python3 -c 'print(42 * 13)'")
    stdout = result["stdout"].strip()
    assert "546" in stdout, f"Expected 546, got: {stdout}"
    print(f"      42 * 13 = {stdout} ✓")

    # 3. Execute echo
    result = await managed.execute_in_compute(f"echo 'Hello from {name}'")
    print(f"      Echo: {result['stdout'].strip()}")

    # 4. Agent LLM
    print(f"\n  [3] Agent LLM via {name}...")
    llm_result = agent.start("What is 9 * 8? Just the number.")
    print(f"      LLM result: {llm_result}")

    # 5. Multi-turn
    print("\n  [4] Multi-turn...")
    llm_result = agent.start("Add 10 to that. Just the number.")
    print(f"      Follow-up: {llm_result}")

    # 6. Usage
    session = managed.retrieve_session()
    usage = session["usage"]
    print(f"\n  [5] Usage: in={usage['input_tokens']}, out={usage['output_tokens']}")

    # 7. Shutdown
    print(f"\n  [6] Shutdown {name}...")
    await managed.shutdown_compute()
    print("      Done ✓")

    return {
        "name": name,
        "input_tokens": usage["input_tokens"],
        "output_tokens": usage["output_tokens"],
    }


async def main():
    results = []

    # Docker
    try:
        r = await test_provider("Docker", "docker", {"image": "python:3.12-slim"})
        results.append(r)
    except Exception as e:
        print(f"\n  Docker FAILED: {e}")

    # E2B
    try:
        r = await test_provider("E2B", "e2b", {"idle_timeout_s": 120})
        results.append(r)
    except Exception as e:
        print(f"\n  E2B FAILED: {e}")

    # Modal
    try:
        r = await test_provider("Modal", "modal", {"idle_timeout_s": 120})
        results.append(r)
    except Exception as e:
        print(f"\n  Modal FAILED: {e}")

    # Summary
    print(f"\n\n{'='*60}")
    print("  SUMMARY")
    print(f"{'='*60}")
    total_in = 0
    total_out = 0
    for r in results:
        total_in += r["input_tokens"]
        total_out += r["output_tokens"]
        print(f"  {r['name']:15s} | in: {r['input_tokens']:6d} | out: {r['output_tokens']:6d}")
    print(f"  {'TOTAL':15s} | in: {total_in:6d} | out: {total_out:6d}")
    print(f"{'='*60}")
    print(f"  {len(results)}/3 providers passed")


if __name__ == "__main__":
    asyncio.run(main())

API Surface

Call | Purpose
LocalManagedAgent(compute="docker") | Attach a compute provider — accepts a string name or an instance
await managed.provision_compute(image=..., cpu=..., memory_mb=...) | Boot the sandbox
await managed.execute_in_compute(cmd, timeout=300) | Run a shell command inside the sandbox
await managed.shutdown_compute() | Tear down the sandbox
managed.compute_provider | The underlying provider instance

When a compute provider is attached, the built-in execute_command, read_file, write_file, and list_files tools are automatically bridged so the LLM-driven agent loop executes them inside the sandbox instead of on the host. This gives you secure, reproducible tool execution with no change to the agent code itself.
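The bridging described above can be pictured as a thin dispatch layer in front of the built-in tools. The sketch below is a conceptual illustration only (`SandboxBridge` and `FakeSandbox` are hypothetical names, not part of the praisonai API): with a compute adapter attached, a tool call is forwarded to it; without one, it falls back to running on the host.

```python
import subprocess

# Hypothetical illustration of the bridging idea -- not the praisonai API.
class SandboxBridge:
    """Routes built-in tool calls to a compute provider when one is attached."""

    def __init__(self, compute=None):
        self.compute = compute  # e.g. a Docker/E2B/Modal adapter, or None

    def execute_command(self, cmd: str) -> str:
        if self.compute is not None:
            # With a provider attached, the real API would route this through
            # something like `await managed.execute_in_compute(cmd)`.
            return self.compute.run(cmd)
        # No provider attached: fall back to running on the host.
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return out.stdout.strip()


class FakeSandbox:
    """Stand-in for a real compute adapter."""

    def run(self, cmd: str) -> str:
        return f"[sandbox] {cmd}"


host = SandboxBridge()                # no provider: runs on the host
boxed = SandboxBridge(FakeSandbox())  # provider attached: runs "inside the sandbox"
print(host.execute_command("echo hi"))   # hi
print(boxed.execute_command("echo hi"))  # [sandbox] echo hi
```

In the real API this routing happens inside the backend itself, which is why the same agent code runs unchanged whether or not a compute provider is set.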

Categories
PraisonAI

PraisonAI Native Managed Agents Full App

A single script that demonstrates the full PraisonAI Native Managed Agent lifecycle — automatic agent, environment, and session management with any LLM (OpenAI, Gemini, Ollama, local). No Anthropic key required. Covers streaming, tool selection, custom tool callbacks, web search, multi-provider switching, and usage tracking in one file.

Architecture

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
graph TD
    A[Agent + LocalManagedAgent] --> B[Auto Setup]
    B --> C[agent.start]
    C --> D[Response / Result]
    subgraph Auto Setup
      E[Create Agent] --> F[Create Environment]
      F --> G[Create Session]
    end
    B --> E
    F --> H[Any LLM via litellm]
    classDef agent fill:#8B0000,color:#fff
    classDef tool fill:#189AB4,color:#fff
    class A,D agent
    class B,C,E,F,G,H tool

How to Run

Follow these steps to run the full demo:

Step 1: Install PraisonAI

pip install praisonai

Step 2: Set your API key

export OPENAI_API_KEY=your-api-key

Prefer Gemini? export GEMINI_API_KEY=... and change provider="openai" to provider="gemini". Prefer zero cloud? Run ollama serve and switch to provider="ollama" with model="llama3.2".

Step 3: Run it

python native_full_app.py

Full Code

"""Native Managed Agents — Full App (no Anthropic required).

Demonstrates the full LocalManagedAgent lifecycle with any LLM (OpenAI/Ollama/
Gemini) via litellm. Covers:

    1. Create agent         6. Multi-turn conversation
    2. Update agent         7. Track usage
    3. Environment          8. List sessions
    4. Session              9. Select tools
    5. Stream response     10. Disable tools
                           11. Custom tools
                           12. Web search
                           13. Switch provider (Ollama)
                           14. Interrupt

Prerequisites:
    export OPENAI_API_KEY=sk-...
    pip install praisonai
"""
import json
from praisonai import Agent, LocalManagedAgent, LocalManagedConfig


# 1. Create an agent (use gpt-4o-mini for this project)
managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(model="gpt-4o-mini"),
)
agent = Agent(name="teacher", backend=managed)
result = agent.start("Say hello briefly.")
print(f"[1] Agent created: {managed.agent_id} (v{managed.agent_version})")

# 2. Update the agent
managed.update_agent(
    name="Teaching Agent v2",
    system="You are a senior Python developer. Write clean, production-quality code.",
)
print(f"[2] Agent updated to v{managed.agent_version}")

# 3-4. Environment + Session are created automatically
print(f"[3] Environment: {managed.environment_id}")
print(f"[4] Session:     {managed.session_id}")

# 5. Send a response
print("\n[5] Response...")
result = agent.start("Write a Python one-liner that prints 'Hello from Native Managed Agents!'")
print(result)

# 6. Multi-turn conversation (same session remembers context)
print("\n[6] Multi-turn follow-up...")
result = agent.start("Now modify it to accept a name argument.")
print(result)

# 7. Track usage
info = managed.retrieve_session()
print("\n[7] Usage:")
print(f"   Input tokens:  {info['usage']['input_tokens']}")
print(f"   Output tokens: {info['usage']['output_tokens']}")

# 8. List sessions
sessions = managed.list_sessions()
print(f"\n[8] Sessions: {len(sessions)}")
for s in sessions[:3]:
    print(f"   {s['id']} | {s['status']} | {s['title']}")

# 9. Selective tools (only read + write + list)
files_managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Files Only Agent",
        model="gpt-4o-mini",
        system="You can only read/write/list files.",
        tools=["read_file", "write_file", "list_files"],
    ),
)
files_agent = Agent(name="files-only", backend=files_managed)
print("\n[9] Files-only agent...")
result = files_agent.start("List files in the current directory (ls .).")

# 10. Disable web/shell (omit them from tools)
no_web_managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="No Web Agent",
        model="gpt-4o-mini",
        system="You are a coding assistant with no web or shell access.",
        tools=["read_file", "write_file"],
    ),
)
no_web_agent = Agent(name="no-web", backend=no_web_managed)
print("\n[10] No-web agent...")
result = no_web_agent.start("Compute 2**100 mentally and return just the number.")

# 11. Custom tools
def handle_weather(tool_name, tool_input):
    print(f"   [custom tool: {tool_name} | {json.dumps(tool_input)}]")
    return "Tokyo: 22 C, sunny, humidity 55 percent"

custom_managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Weather Agent",
        model="gpt-4o-mini",
        system="Use get_weather to answer any weather question.",
        tools=[
            {
                "type": "custom",
                "name": "get_weather",
                "description": "Get current weather for a location",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                    },
                    "required": ["location"],
                },
            },
        ],
    ),
    on_custom_tool=handle_weather,
)
custom_agent = Agent(name="weather", backend=custom_managed)
print("\n[11] Custom-tool agent...")
result = custom_agent.start("What is the weather in Tokyo?")

# 12. Web search agent
search_managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Search Agent",
        model="gpt-4o-mini",
        system="You are a research assistant. Use search_web to answer.",
        tools=["search_web"],
    ),
)
search_agent = Agent(name="searcher", backend=search_managed)
print("\n[12] Web search agent...")
result = search_agent.start("Find 3 bullet points about Python 3.13 new features.")

# 13. Switch provider — Ollama (only runs if Ollama is available)
print("\n[13] Ollama provider (skipped if Ollama not running)...")
try:
    ollama_managed = LocalManagedAgent(
        provider="ollama",
        config=LocalManagedConfig(model="llama3.2", system="Be concise."),
    )
    ollama_agent = Agent(name="local-llama", backend=ollama_managed)
    print(ollama_agent.start("Say hi in 5 words."))
except Exception as e:
    print(f"   (Ollama skipped: {e})")

# 14. Interrupt a session (best-effort for local backend)
interrupt_managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(model="gpt-4o-mini"),
)
interrupt_managed.interrupt()
print("\n[14] Interrupt signal sent.")

# Final usage summary
print("\n" + "=" * 60)
print("FINAL USAGE SUMMARY")
print("=" * 60)
all_backends = [
    ("Teaching Agent v2", managed),
    ("Files Only Agent",  files_managed),
    ("No Web Agent",      no_web_managed),
    ("Weather Agent",     custom_managed),
    ("Search Agent",      search_managed),
]
total_in = total_out = 0
for name, backend in all_backends:
    info = backend.retrieve_session()
    inp = info["usage"]["input_tokens"]
    out = info["usage"]["output_tokens"]
    total_in += inp
    total_out += out
    print(f"  {name:20s} | in: {inp:6d} | out: {out:6d}")
print(f"  {'TOTAL':20s} | in: {total_in:6d} | out: {total_out:6d}")
print("=" * 60)

What You Get

Step | Feature | Notes
1 | Create agent | Zero config; UUID-based agent_id
2 | Update agent | Bumps agent_version; session unchanged
3 | Environment | Local subprocess (or Docker/E2B/Modal via compute=)
4 | Session | Auto-created; persists via SessionStoreProtocol
5-6 | Multi-turn | Same session remembers context
7 | Usage | retrieve_session() returns input / output tokens
8 | List sessions | All sessions created by this backend instance
9-10 | Tool gating | Pass a reduced tools=[...] to restrict capabilities
11 | Custom tool | on_custom_tool callback invoked with typed args
12 | Web search | Built-in search_web tool
13 | Provider swap | Same API works with Ollama / Gemini / any litellm model
14 | Interrupt | Best-effort cancellation signal

Anthropic vs Native

The Anthropic version (ManagedAgent / ManagedConfig) runs on Anthropic’s managed cloud with Claude. The Native version (LocalManagedAgent / LocalManagedConfig) runs the same lifecycle on your machine with any LLM via litellm. Same API, different backend — pick whichever fits your constraints.

Categories
PraisonAI

PraisonAI Native Managed Agents: Any LLM with Simplified API and Code Examples

PraisonAI Native Managed Agents (LocalManagedAgent) give you the same auto-managed agent, environment, and session experience as Anthropic Managed Agents — but using any LLM (OpenAI, Gemini, Ollama, local) with no Anthropic dependency. Each section below is a standalone, runnable script.

Architecture

%%{init: {"theme": "base", "themeVariables": {"background": "transparent", "lineColor": "#000000"}}}%%
graph TD
    A[PraisonAI Agent] --> B[LocalManagedAgent Backend]
    B --> C[Auto-Create Agent]
    B --> D[Auto-Create Environment]
    B --> E[Auto-Create Session]
    C --> F[Any LLM via litellm]
    D --> F
    E --> F
    F --> G[Response]
    G --> H[Result]
    classDef agent fill:#8B0000,color:#fff
    classDef tool fill:#189AB4,color:#fff
    classDef decision fill:#444,color:#fff
    class A,H agent
    class B,C,D,E tool
    class F,G decision

Prerequisites

pip install praisonai

Set your LLM API key as an environment variable (OpenAI shown; Gemini or Ollama work equally well):

export OPENAI_API_KEY=your-api-key

basic

The simplest possible native managed agent — a handful of lines. No Anthropic key required.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(provider="openai", config=LocalManagedConfig(model="gpt-4o-mini"))
agent = Agent(name="teacher", backend=managed)
result = agent.start("Say hello from native managed agents in one sentence.")
print(result)

01_create_agent

Zero config — default name="Agent". Access agent_id and agent_version after the first call.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(provider="openai", config=LocalManagedConfig(model="gpt-4o-mini"))
agent = Agent(name="coder", backend=managed)
result = agent.start("Say hello")
print(f"Agent ID: {managed.agent_id}")
print(f"Version:  {managed.agent_version}")

02_create_environment

Environment is created automatically — just set model and system on LocalManagedConfig.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Coding Assistant",
        model="gpt-4o-mini",
        system="You are a helpful coding assistant. Be concise.",
    ),
)
agent = Agent(name="coder", backend=managed)
result = agent.start("Say hello")
print(f"Agent ID:       {managed.agent_id}")
print(f"Environment ID: {managed.environment_id}")

03_create_session

Agent, environment, and session are all created automatically on first use.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Coding Assistant",
        model="gpt-4o-mini",
        system="You are a helpful coding assistant.",
        session_title="Quickstart native session",
    ),
)
agent = Agent(name="coder", backend=managed)
result = agent.start("Say hello")
print(f"Agent ID:       {managed.agent_id}")
print(f"Environment ID: {managed.environment_id}")
print(f"Session ID:     {managed.session_id}")

04_async_execute

LocalManagedAgent satisfies the async ManagedBackendProtocol — await execute() directly from asyncio code.

import asyncio
from praisonai import LocalManagedAgent, LocalManagedConfig


async def main():
    managed = LocalManagedAgent(
        provider="openai",
        config=LocalManagedConfig(model="gpt-4o-mini", system="Be concise."),
    )
    text = await managed.execute("List 3 fun facts about octopuses.")
    print(text)


asyncio.run(main())

05_select_tools

Defaults: execute_command, read_file, write_file, list_files, search_web. Pass a reduced list to restrict capabilities.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Read/Write Only",
        model="gpt-4o-mini",
        system="You can only read and write files.",
        tools=["read_file", "write_file", "list_files"],
    ),
)
agent = Agent(name="files-only", backend=managed)
result = agent.start("List the files in the current directory.")
print(result)

06_disable_tools

Omit tools you do not want. Missing tools are effectively disabled.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="No Web Agent",
        model="gpt-4o-mini",
        system="You are a coding assistant. You cannot access the web or shell.",
        tools=["read_file", "write_file"],
    ),
)
agent = Agent(name="no-web", backend=managed)
result = agent.start("Compute 2**100 and explain it in one sentence.")
print(result)

07_custom_tools

Define a tool schema and PraisonAI calls your on_custom_tool callback when the LLM invokes it.

import json
from praisonai import Agent, LocalManagedAgent, LocalManagedConfig


def handle_weather(tool_name, tool_input):
    print(f"  [Custom tool: {tool_name} | Input: {json.dumps(tool_input)}]")
    return "Tokyo: 22 C, sunny, humidity 55 percent"


managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Weather Agent",
        model="gpt-4o-mini",
        system="You are a weather assistant. Use get_weather to check weather.",
        tools=[
            {
                "type": "custom",
                "name": "get_weather",
                "description": "Get current weather for a location",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"},
                    },
                    "required": ["location"],
                },
            },
        ],
    ),
    on_custom_tool=handle_weather,
)
agent = Agent(name="weather", backend=managed)
result = agent.start("What is the weather in Tokyo?")
print(result)

08_update_agent

Update the agent’s name, system prompt, or model in place. Each update bumps agent_version while keeping the session alive.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(name="v1", model="gpt-4o-mini", system="Be terse."),
)
agent = Agent(name="coder", backend=managed)
agent.start("Hi")
print(f"v{managed.agent_version}")

managed.update_agent(
    name="Senior Dev",
    system="You are a senior Python developer. Write clean, production code.",
)
print(f"v{managed.agent_version}")

result = agent.start("Write a one-liner that sums 1..10.")
print(result)

09_list_sessions

Every call to reset_session() starts a fresh session. list_sessions() returns all sessions this backend instance created.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(model="gpt-4o-mini", session_title="Session A"),
)
agent = Agent(name="coder", backend=managed)
agent.start("Say hi")

managed.reset_session()
agent.start("Say hi again")

sessions = managed.list_sessions()
print(f"Total sessions: {len(sessions)}")
for s in sessions:
    print(f"  {s['id']} | {s['status']} | {s['title']}")

10_web_search

Web search is one of the default tools — enable just search_web for a pure research agent.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(
        name="Search Agent",
        model="gpt-4o-mini",
        system="You are a research assistant. Use search_web and summarize.",
        tools=["search_web"],
    ),
)
agent = Agent(name="searcher", backend=managed)
result = agent.start("Search for 'Python 3.13 new features' and give 3 bullet points.")
print(result)

11_multi_turn

The same session remembers context across turns.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(model="gpt-4o-mini", system="Remember context across turns."),
)
agent = Agent(name="chatbot", backend=managed)

print("Turn 1:", agent.start("My favorite number is 42. Remember it."))
print("Turn 2:", agent.start("What was my favorite number?"))

12_track_usage

retrieve_session() returns a unified dict with id, status, title, and usage (input / output tokens).

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(model="gpt-4o-mini", system="Be concise."),
)
agent = Agent(name="coder", backend=managed)
agent.start("Say hello.")
agent.start("Say goodbye.")

info = managed.retrieve_session()
print(f"Session:       {info['id']}")
print(f"Status:        {info['status']}")
print(f"Input tokens:  {info['usage']['input_tokens']}")
print(f"Output tokens: {info['usage']['output_tokens']}")

13_resume_session

Resume any existing session by ID. Works across process restarts because chat history is written to the default session store.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

# First run — create and save session id
m1 = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(model="gpt-4o-mini", system="Remember context."),
)
a1 = Agent(name="coder", backend=m1)
a1.start("My favorite color is teal. Remember it.")
session_id = m1.session_id
print(f"Saved session: {session_id}")

# Second run — resume the same session (simulates restart)
m2 = LocalManagedAgent(
    provider="openai",
    config=LocalManagedConfig(model="gpt-4o-mini", system="Remember context."),
)
m2.resume_session(session_id)
a2 = Agent(name="coder", backend=m2)
print(a2.start("What color did I mention?"))

14_ollama

Swap provider="openai" for provider="ollama" and you get the same API running fully local — no cloud, no API key.

from praisonai import Agent, LocalManagedAgent, LocalManagedConfig

managed = LocalManagedAgent(
    provider="ollama",
    config=LocalManagedConfig(model="llama3.2", system="Be concise."),
)
agent = Agent(name="local-llama", backend=managed)
print(agent.start("Say hello in one short sentence."))

Anthropic vs Native Managed Agents

Feature | Anthropic Managed Agents | Native (LocalManagedAgent)
Provider | Claude only | OpenAI, Gemini, Ollama, any litellm model
API key | ANTHROPIC_API_KEY | Any LLM provider key (or none for Ollama)
Infrastructure | Anthropic cloud | Runs on your machine
Auto agent / env / session | Yes | Yes
Built-in tools | agent_toolset_20260401 | execute_command, read_file, write_file, list_files, search_web
Custom tools | Yes | Yes (on_custom_tool)
Multi-turn / resume | Yes | Yes (SessionStoreProtocol)
Async streaming | Yes | Yes (stream())
Pip / npm packages | Managed sandbox | Via Docker / E2B / Modal / Fly.io compute adapters

Use Anthropic Managed Agents when you want Claude + fully managed infrastructure. Use the Native backend when you want the same developer experience with any LLM, fully under your control.

Categories
OpenClaw

OpenClaw Slack Setup

Create Slack App

  1. Go to Slack API Console
  2. Click Create New App → From scratch
  3. Enter app name (e.g., “PraisonAI Bot”) and select workspace

Configure OAuth & Permissions

  1. Go to OAuth & Permissions in the sidebar
  2. Scroll to Scopes → Bot Token Scopes
  3. Add these scopes:
Scope | Purpose
chat:write | Send messages
app_mentions:read | Receive @mentions
im:history | Read DM history
im:read | Access DMs
channels:history | Read channel messages
  4. Click Install to Workspace at the top
  5. Copy the Bot User OAuth Token (xoxb-...)

Enable Socket Mode

  1. Go to Socket Mode in the sidebar
  2. Toggle Enable Socket Mode ON
  3. When prompted, create an app-level token:
    • Token Name: socket-mode
    • Add scope: connections:write
  4. Copy the App Token (xapp-...)

Subscribe to Events

This step is critical! Without event subscriptions, the bot won’t receive messages.

  1. Go to Event Subscriptions in the sidebar
  2. Toggle Enable Events ON
  3. Scroll to Subscribe to bot events
  4. Add these events:
Event | Purpose
app_mention | When someone @mentions your bot
message.im | Direct messages to your bot
  5. Click Save Changes
  6. Reinstall the app when prompted (or go to OAuth & Permissions → Reinstall)

Enable Messages Tab

  1. Go to App Home in the sidebar
  2. Scroll to Show Tabs → Messages Tab
  3. Ensure Allow users to send Slash commands and messages from the messages tab is toggled ON

Start the Bot

export SLACK_BOT_TOKEN="xoxb-..."   # Bot User OAuth Token
export SLACK_APP_TOKEN="xapp-..."   # App-Level Token