Categories
Kubernetes

Fixing Kubernetes PHP Pod CrashLoopBackOff: A Complete Guide

The Problem

Symptoms:

  • 50% of PHP pods in CrashLoopBackOff
  • Pods restarting every 2-5 minutes
  • HTTP 499 errors on health checks
  • CPU constantly at limit (1500m)

Impact:

  • Website downtime
  • Poor user experience
  • Wasted cluster resources

Root Cause Analysis

Step 1: Check Pod Status

kubectl get pods -l app=<your-app>

Output:

NAME                   READY   STATUS             RESTARTS
app-xxxxxxxxxx-xxxxx   1/2     CrashLoopBackOff   34 (81s ago)
app-xxxxxxxxxx-xxxxx   1/2     CrashLoopBackOff   43 (45s ago)
app-xxxxxxxxxx-xxxxx   1/2     CrashLoopBackOff   146 (4m46s ago)

Step 2: Check Logs

# Check PHP-FPM logs
kubectl logs <pod-name> -c php-php --tail=100

# Check Nginx logs
kubectl logs <pod-name> -c php-nginx --tail=100

Key Finding:

10.0.9.107 - - [19/Oct/2025:05:11:18 +0000] "GET /test.php HTTP/1.1" 499 0

HTTP 499 = Client closed connection before server responded
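To gauge how often the probe is timing out, you can bucket the 499s by hour straight from the Nginx access log (a sketch; container name and log format assumed to match the sample line above):

# Count 499 responses per hour from recent access-log lines
kubectl logs <pod-name> -c php-nginx --tail=2000 | grep ' 499 ' | awk '{print $4}' | cut -d: -f1-2 | sort | uniq -c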

Step 3: Check Health Probes

kubectl describe pod <pod-name> | grep -A 5 "Liveness\|Readiness"

Output:

Liveness:  http-get http://:80/test.php delay=60s timeout=10s period=60s
Readiness: http-get http://:80/test.php delay=60s timeout=10s period=60s

Problem: the 10-second timeout is too short when every PHP-FPM worker is busy!

Step 4: Check PHP-FPM Configuration

kubectl exec <pod-name> -- cat /usr/local/etc/php-fpm.d/www.conf | grep -E "^pm\.|^pm ="

Output:

pm = dynamic
pm.max_children = 5        ← TOO LOW!
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Step 5: Check Resource Usage

kubectl top pods -l app=<your-app> | head -15

Output:

NAME                   CPU(cores)   MEMORY(bytes)
app-xxxxxxxxxx-xxxxx   1500m        587Mi      ← CPU at limit!
app-xxxxxxxxxx-xxxxx   1m           524Mi      ← Crashed pod
app-xxxxxxxxxx-xxxxx   1m           526Mi      ← Crashed pod

Step 6: Analyze Traffic

kubectl logs <pod-name> --tail=500 | grep -E "GET|POST" | grep -v "test.php" | awk '{print $6, $7, $9}' | sort | uniq -c | sort -rn

Output:

263 "POST /wp/wp-admin/admin-ajax.php" 200
221 "GET /index.php" 200

Problem: 263+ requests hitting admin-ajax.php, but only 5 workers to serve them!


Root Cause Identified

The Problem Chain:

  1. Only 5 PHP-FPM workers available
  2. 263+ concurrent requests (WordPress admin-ajax.php is slow)
  3. All workers busy → new requests queue
  4. Health check arrives → also queues
  5. 10-second timeout expires → HTTP 499
  6. 3 consecutive failures → Kubernetes kills pod
  7. Pod restarts → CrashLoopBackOff
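You can verify this chain from the pod's event stream, which records both the probe failures and the resulting kill (a sketch; event retention is limited, so run it soon after a restart):

# Probe failures appear as "Unhealthy" events, the kill as "Killing"
kubectl get events --field-selector involvedObject.name=<pod-name> | grep -i 'unhealthy\|killing'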

Memory Math:

Memory Limit: 1024Mi
Base PHP + Nginx: ~300Mi
Available for PHP-FPM workers: ~724Mi

Current: 5 workers × 40MB = 200MB ✓
Target: 20 workers × 40MB = 800MB — tight on the conservative 40MB estimate, but measured per-worker usage is ~18-20MB (see the production metrics below), so it fits comfortably
Too much: 100 workers × 40MB = 4000MB ✗ (OOMKill!)

The Solution

Two Critical Changes:

  1. Increase PHP-FPM workers (5 → 20)
  2. Increase health check timeout (10s → 30s)

Implementation

Step 1: Create PHP-FPM ConfigMap

File: templates/php-fpm-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: php-fpm-config
  labels:
    app: php
    tier: backend
data:
  www.conf: |
    [www]
    user = www-data
    group = www-data
    listen = 127.0.0.1:9000
    pm = dynamic
    pm.max_children = 20        ; 5 → 20 (4x capacity)
    pm.start_servers = 5        ; 2 → 5
    pm.min_spare_servers = 3    ; 1 → 3
    pm.max_spare_servers = 10   ; 3 → 10
    pm.max_requests = 500
    request_terminate_timeout = 300
    pm.status_path = /fpm-status
    catch_workers_output = yes
    clear_env = no

Step 2: Update Deployment

File: templates/deployment.yaml

Add ConfigMap volume:

volumes:
  - name: php-fpm-config
    configMap:
      name: php-fpm-config

Mount in PHP container:

containers:
  - name: php-php
    volumeMounts:
      - name: php-fpm-config
        mountPath: /usr/local/etc/php-fpm.d/www.conf
        subPath: www.conf

Update health checks:

- name: php-nginx
  livenessProbe:
    timeoutSeconds: 30      # 10 → 30
    periodSeconds: 30       # 60 → 30
    httpGet:
      path: /test.php
      port: 80
  readinessProbe:
    timeoutSeconds: 30      # 10 → 30
    periodSeconds: 30       # 60 → 30
    httpGet:
      path: /test.php
      port: 80

Step 3: Deploy Changes

# Validate YAML
helm lint ./charts/<your-chart>

# Dry-run to verify
helm template <release-name> ./charts/<your-chart> --debug | grep -A 20 "php-fpm-config"

# Apply ConfigMap first (important!)
kubectl apply -f charts/<your-chart>/templates/php-fpm-configmap.yaml

# Deploy with Helm
helm upgrade <release-name> ./charts/<your-chart> --namespace <namespace>

# Monitor rollout
kubectl rollout status deployment/<deployment-name> --timeout=300s

Step 4: Verify Deployment

# Check pod status
kubectl get pods -l app=<your-app>

# Verify PHP-FPM config loaded
kubectl exec <pod-name> -c php-php -- grep max_children /usr/local/etc/php-fpm.d/www.conf

# Expected output:
# pm.max_children = 20

# Verify health check settings
kubectl describe pod <pod-name> | grep -A 3 "Liveness:"

# Expected output:
# Liveness: http-get http://:80/test.php delay=60s timeout=30s period=30s

# Check for HTTP 499 errors (should be none)
kubectl logs <pod-name> -c php-nginx --tail=50 | grep 499

# Monitor resource usage
kubectl top pods -l app=php

Results

Before vs After

Metric                | Before         | After          | Improvement
----------------------|----------------|----------------|--------------------
Pods Running          | 10/22 (45%)    | 22/22 (100%)   | ✅ +120%
CrashLoopBackOff      | 12 pods        | 0 pods         | ✅ Fixed
HTTP 499 Errors       | Constant       | None           | ✅ Eliminated
CPU Usage             | 1500m (limit)  | 347m-1202m     | ✅ -20-80%
Memory Usage          | 500-600Mi      | 677-787Mi      | ⚠️ +30% (expected)
Restarts              | Every 2-5 min  | 0-3 total      | ✅ Stable
Worker Capacity       | 5 workers      | 20 workers     | ✅ 4x increase
Health Check Timeout  | 10s            | 30s            | ✅ 3x longer

Final Status:

kubectl get pods -l app=<your-app>
NAME                   READY   STATUS    RESTARTS   AGE
app-xxxxxxxxxx-xxxxx   2/2     Running   0          5m
app-xxxxxxxxxx-xxxxx   2/2     Running   0          5m
app-xxxxxxxxxx-xxxxx   2/2     Running   1          7m
app-xxxxxxxxxx-xxxxx   2/2     Running   0          7m
... (all pods healthy)

Why This Works

The Math:

Worker Capacity:

Before: 5 workers × 1 request = 5 concurrent requests
After:  20 workers × 1 request = 20 concurrent requests
Result: 4x capacity ✅

Memory Safety:

Memory Limit: 1024Mi
Base + Nginx: ~300Mi
20 workers × 40MB (worst case): ~800Mi
Worst-case total: ~1100Mi — over the limit on paper, but measured usage is ~18-20MB per worker, putting real usage around 665Mi (65%) ✅

Health Check Success:

Before: 10s timeout → fails when workers busy
After:  30s timeout → enough time to respond
Result: No false-positive failures ✅
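To confirm the fix leaves comfortable margin, you can time the health endpoint from inside the pod (a sketch; assumes curl is available in the Nginx container):

# Total response time for the probe path; should sit well under the 30s timeout
kubectl exec <pod-name> -c php-nginx -- curl -s -o /dev/null -w '%{time_total}s\n' http://localhost/test.php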

Why Not 100 Workers?

Memory Constraint:

100 workers × 40MB = 4000MB
+ Base + Nginx (~300MB) = ~4300MB
Memory Limit: 1024MB

Result: Instant OOMKill! ❌

CPU Constraint:

CPU Limit: 1500m (1.5 cores)
100 workers = 0.015 cores per worker
Result: Context switching overhead > actual work ❌

The Formula:

max_children = (Available RAM - Base Memory) / Memory per Worker

Your calculation:
max_children = (1024Mi - 300Mi) / 40Mi
max_children ≈ 18-20 workers ✅
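The arithmetic is simple enough to script for the next time you retune limits (a sketch using the estimates above; all values in Mi, integer math):

# Hypothetical helper: derive pm.max_children from pod memory numbers
LIMIT=1024; BASE=300; PER_WORKER=40
echo "max_children = $(( (LIMIT - BASE) / PER_WORKER ))"   # → 18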

Troubleshooting

If Pods Still Crash:

1. Check for OOMKills:

kubectl describe pod <pod-name> | grep -i oom

Solution: Reduce workers or increase memory limit
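If the describe output is ambiguous, the container's last termination state names the reason explicitly (a sketch; standard Kubernetes API fields):

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# Prints "OOMKilled" if the container exceeded its memory limit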

2. Check PHP-FPM status:

kubectl exec <pod-name> -c php-nginx -- curl -s http://localhost/fpm-status

(PHP-FPM speaks FastCGI, not HTTP, so the status page has to be fetched through Nginx; see the status endpoint config under Optimization Tips.)

Look for:

  • active processes near max_children → increase workers
  • listen queue > 0 → workers overloaded
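A quick way to watch just those two fields (a sketch; assumes the /fpm-status endpoint is exposed through Nginx as shown under Optimization Tips):

kubectl exec <pod-name> -c php-nginx -- curl -s http://localhost/fpm-status | grep -E '^(active processes|listen queue):'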

3. Check actual memory usage:

kubectl top pods -l app=php | sort -k3 -h

If > 900Mi consistently: Reduce workers to 15

4. Check logs for real errors:

kubectl logs <pod-name> -c php-php --previous

Optimization Tips

1. Monitor PHP-FPM Status

Expose status endpoint:

# In nginx config
location /fpm-status {
    access_log off;
    allow 127.0.0.1;
    deny all;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include fastcgi_params;
}

Check status:

kubectl exec <pod> -c php-nginx -- curl -s http://localhost/fpm-status

2. Tune Based on Traffic

Low traffic (< 50 req/s):

pm.max_children = 15
pm.start_servers = 4

Medium traffic (50-150 req/s):

pm.max_children = 20
pm.start_servers = 5

High traffic (> 150 req/s):

# Increase memory limit first!
memory: 1536Mi
pm.max_children = 30
pm.start_servers = 8

3. Optimize WordPress

Disable Query Monitor in production:

// wp-config.php
define('QM_DISABLED', true);

Cache admin-ajax.php:

# Note: needs a cache zone (fastcgi_cache_path ... keys_zone=WPCACHE:10m) defined at http{} level
location = /wp-admin/admin-ajax.php {
    fastcgi_pass 127.0.0.1:9000;
    include fastcgi_params;
    fastcgi_cache WPCACHE;
    fastcgi_cache_valid 200 60s;
    fastcgi_cache_bypass $http_pragma $http_authorization;
}

Control WordPress Heartbeat:

// Browser-side JavaScript (runs on wp-admin pages): reduce heartbeat frequency
wp.heartbeat.interval(60); // Default is 15s in the post editor

Key Takeaways

✅ Do’s:

  1. Calculate workers based on memory: (RAM - Base) / 40MB
  2. Leave 20% memory headroom for traffic spikes
  3. Increase health check timeouts when workers are busy
  4. Monitor resource usage after changes
  5. Test in staging first if possible

❌ Don’ts:

  1. Don’t set workers arbitrarily (causes OOMKill)
  2. Don’t ignore memory limits (Kubernetes will kill pods)
  3. Don’t set timeout too short (false-positive failures)
  4. Don’t forget to apply ConfigMap before deployment
  5. Don’t skip verification after deployment

Commands Cheat Sheet

# Investigation
kubectl get pods -l app=<your-app>
kubectl describe pod <pod-name>
kubectl logs <pod-name> -c <container-name> --tail=100
kubectl logs <pod-name> -c <nginx-container> --tail=100
kubectl top pods -l app=<your-app>

# Check PHP-FPM config
kubectl exec <pod-name> -c <php-container> -- cat /usr/local/etc/php-fpm.d/www.conf

# Check health probes
kubectl describe pod <pod> | grep -A 5 "Liveness\|Readiness"

# Deployment
helm lint ./charts/<your-chart>
helm template <release-name> ./charts/<your-chart> --debug
kubectl apply -f charts/<your-chart>/templates/php-fpm-configmap.yaml
helm upgrade <release-name> ./charts/<your-chart> --namespace <namespace>
kubectl rollout status deployment/<deployment-name>

# Verification
kubectl get pods -l app=<your-app>
kubectl exec <pod-name> -c <php-container> -- grep max_children /usr/local/etc/php-fpm.d/www.conf
kubectl logs <pod-name> -c <nginx-container> --tail=50 | grep 499

# Monitoring
watch kubectl get pods -l app=<your-app>
kubectl top pods -l app=<your-app>
kubectl exec <pod-name> -c <nginx-container> -- curl -s http://localhost/fpm-status

# Rollback (if needed)
helm rollback <release-name>
kubectl rollout undo deployment/<deployment-name>

Conclusion

Problem: PHP pods crashing due to worker exhaustion and short health check timeouts.

Solution: Increased PHP-FPM workers from 5 to 20 and health check timeout from 10s to 30s.

Result: All 22 pods stable, no crashes, 4x capacity increase.

Key Learning: Always calculate worker count based on available memory, not arbitrary numbers. The formula (RAM - Base) / 40MB ensures you stay within limits while maximizing capacity.



Author’s Note: This solution was implemented on a production Kubernetes cluster running a PHP application. The fix eliminated all CrashLoopBackOff issues and improved stability from 45% to 100% pod availability.

📊 Current Production Metrics Analysis

Actual Memory Usage (Right Now):

Average Memory per Pod: 665Mi
Memory Range: 544Mi - 616Mi
Memory Limit: 1024Mi (1Gi)
Utilization: 65% average (53-60% range)

Current Configuration:

Pods: 22 total
Workers per pod: 20
CPU Limit: 1500m
Average CPU Usage: 935m (62% utilization)
Restarts: 0-4 (minimal, stable)

🔍 Is 20 Workers Optimal? Let’s Calculate:

Method 1: Reverse Engineering from Memory Usage

Current Memory Usage: 665Mi average
Memory Limit: 1024Mi

Memory breakdown:
Total: 665Mi
- Base PHP + opcache: ~200Mi
- Nginx: ~100Mi
- PHP-FPM workers: 665 - 200 - 100 = 365Mi

Per worker: 365Mi / 20 workers ≈ 18.3Mi per worker

Actual memory per worker: ~18-20MB ✅
(Lower than my 40MB estimate - your app is efficient!)

Method 2: Headroom Analysis

Current: 665Mi average
Limit: 1024Mi
Headroom: 358Mi (35%)

Safe headroom: 20-30% recommended
Your headroom: 35% ✅ OPTIMAL

Could we add more workers?
358Mi / 18Mi per worker = ~19 more workers possible
Total possible: 20 + 19 = 39 workers

BUT...

Method 3: CPU Constraint Check

Current CPU: 935m average
CPU Limit: 1500m
Utilization: 62%

With 20 workers: 935m / 20 = 46.75m per worker ✅
With 30 workers: 935m / 30 = 31.2m per worker ⚠️
With 39 workers: 935m / 39 = 24m per worker ❌ (too little)

CPU becomes the bottleneck before memory!
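The per-worker figures above can be reproduced with integer shell arithmetic (a sketch; values in millicores, fractions truncated):

USAGE=935
for W in 20 30 39; do echo "$W workers -> $((USAGE / W))m per worker"; done
# 20 -> 46m, 30 -> 31m, 39 -> 23m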

✅ Verdict: 20 Workers IS OPTIMAL

Why 20 is Perfect:

Metric           | Current (20 workers) | If 30 workers    | If 15 workers
-----------------|----------------------|------------------|------------------
Memory Usage     | 665Mi (65%) ✅       | ~850Mi (83%) ⚠️  | ~550Mi (54%) ✓
Memory Headroom  | 358Mi (35%) ✅       | ~174Mi (17%) ❌  | ~474Mi (46%) ⚠️
CPU per Worker   | 46.75m ✅            | 31.2m ⚠️         | 62.3m ✓
Stability        | 0-4 restarts ✅      | Unknown          | Would crash
Capacity         | 20 concurrent ✅     | 30 concurrent ⚠️ | 15 concurrent ❌

The Sweet Spot Analysis:

✅ 20 Workers:
- Memory: 65% (safe, room for spikes)
- CPU: 62% (efficient, not throttled)
- Headroom: 35% (perfect buffer)
- Stability: Excellent (0-4 restarts)
- Handles current load perfectly

⚠️ 30 Workers (Too Aggressive):
- Memory: 83% (risky, little headroom)
- CPU: Would need more (1500m not enough)
- Risk: Traffic spikes could OOMKill
- Not recommended

❌ 15 Workers (Too Conservative):
- Memory: 54% (underutilized)
- Would crash under current 263+ concurrent requests
- Wasted resources

🎯 Final Recommendation: KEEP 20 WORKERS

Reasons:

  1. Memory Utilization: 65% – Perfect balance (not too high, not too low)
  2. CPU Utilization: 62% – Efficient, room to grow
  3. 35% Headroom – Enough buffer for traffic spikes
  4. Stable: 0-4 restarts – Proves it’s working well
  5. Actual per-worker memory: 18-20MB – More efficient than estimated

Could We Optimize Further?

Option A: Increase to 25 workers (Moderate)

pm.max_children = 25
  • Memory would be: ~750Mi (73%)
  • Headroom: ~274Mi (27%)
  • Risk: Medium (less buffer for spikes)
  • Benefit: 25% more capacity
  • Verdict: Only if you see consistent high load

Option B: Keep 20 workers (Recommended)

pm.max_children = 20  # Current
  • Memory: 665Mi (65%) ✅
  • Headroom: 358Mi (35%) ✅
  • Risk: Low ✅
  • Stability: Proven ✅
  • Verdict: OPTIMAL – Don’t change!

📈 When to Reconsider:

Monitor these metrics and adjust if:

# Check if workers are maxed out (via the Nginx status endpoint)
kubectl exec <pod> -c php-nginx -- curl -s http://localhost/fpm-status

Increase workers if:

  • active processes consistently near 20
  • listen queue > 0 frequently
  • Memory still < 75% consistently

Decrease workers if:

  • Memory > 85% consistently
  • OOMKills occur
  • active processes rarely > 10

✅ Conclusion: 20 Workers is VERIFIED OPTIMAL

Based on actual production data:

  • ✅ Memory: 665Mi / 1024Mi = 65% (perfect)
  • ✅ CPU: 935m / 1500m = 62% (efficient)
  • ✅ Headroom: 358Mi = 35% (safe buffer)
  • ✅ Stability: 0-4 restarts (excellent)
  • ✅ Handles 263+ concurrent requests
Categories
Testing

Gemini Computer Use Beginners Guide

pip install google-genai playwright
playwright install chromium
export GEMINI_API_KEY=xxxxxxx

Get API Key from: https://aistudio.google.com

Basic

from google import genai
from google.genai import types
from google.genai.types import Content, Part
from playwright.sync_api import sync_playwright
import time

# Initialize the Gemini client
client = genai.Client()

# Screen dimensions
SCREEN_WIDTH = 1440
SCREEN_HEIGHT = 900

def denormalize_x(x: int, screen_width: int) -> int:
    """Convert normalized x coordinate (0-1000) to actual pixel coordinate."""
    return int(x / 1000 * screen_width)

def denormalize_y(y: int, screen_height: int) -> int:
    """Convert normalized y coordinate (0-1000) to actual pixel coordinate."""
    return int(y / 1000 * screen_height)

def execute_function_calls(candidate, page, screen_width, screen_height):
    """Execute the actions suggested by the model."""
    results = []
    function_calls = []
    
    for part in candidate.content.parts:
        if part.function_call:
            function_calls.append(part.function_call)

    for function_call in function_calls:
        action_result = {}
        fname = function_call.name
        args = function_call.args
        print(f"  -> Executing: {fname}")

        try:
            if fname == "open_web_browser":
                pass  # Already open
            elif fname == "click_at":
                actual_x = denormalize_x(args["x"], screen_width)
                actual_y = denormalize_y(args["y"], screen_height)
                page.mouse.click(actual_x, actual_y)
            elif fname == "type_text_at":
                actual_x = denormalize_x(args["x"], screen_width)
                actual_y = denormalize_y(args["y"], screen_height)
                text = args["text"]
                press_enter = args.get("press_enter", False)

                page.mouse.click(actual_x, actual_y)
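                # Note: "Meta+A" is select-all on macOS; on Linux/Windows use "Control+A"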
                page.keyboard.press("Meta+A")
                page.keyboard.press("Backspace")
                page.keyboard.type(text)
                if press_enter:
                    page.keyboard.press("Enter")
            
            page.wait_for_load_state(timeout=5000)
            time.sleep(1)

        except Exception as e:
            print(f"Error executing {fname}: {e}")
            action_result = {"error": str(e)}

        results.append((fname, action_result))

    return results

def get_function_responses(page, results):
    """Capture screenshot and URL after actions."""
    screenshot_bytes = page.screenshot(type="png")
    current_url = page.url
    function_responses = []
    
    for name, result in results:
        response_data = {"url": current_url}
        response_data.update(result)
        function_responses.append(
            types.FunctionResponse(
                name=name,
                response=response_data,
                parts=[types.FunctionResponsePart(
                    inline_data=types.FunctionResponseBlob(
                        mime_type="image/png",
                        data=screenshot_bytes))
                ]
            )
        )
    return function_responses

# Main program
print("Initialising browser...")
playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=False)
context = browser.new_context(viewport={"width": SCREEN_WIDTH, "height": SCREEN_HEIGHT})
page = context.new_page()

try:
    # Go to initial page
    page.goto("https://tinyurl.com/pet-care-signup")
    
    # Configure the model with Computer Use tool
    config = types.GenerateContentConfig(
        tools=[types.Tool(computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER
        ))],
    )

    # Take initial screenshot
    initial_screenshot = page.screenshot(type="png")
    USER_PROMPT = """
    From https://tinyurl.com/pet-care-signup, 
    get all details for any pet with a California residency. 
    Output all the information you find in a clear, readable format.
    """
    print(f"Goal: {USER_PROMPT}")

    contents = [
        Content(role="user", parts=[
            Part(text=USER_PROMPT),
            Part.from_bytes(data=initial_screenshot, mime_type='image/png')
        ])
    ]

    # Agent loop - maximum 5 turns
    for i in range(5):
        print(f"\n--- Turn {i+1} ---")
        print("Thinking...")
        
        response = client.models.generate_content(
            model='gemini-2.5-computer-use-preview-10-2025',
            contents=contents,
            config=config,
        )

        candidate = response.candidates[0]
        contents.append(candidate.content)

        # Check if there are function calls to execute
        has_function_calls = any(part.function_call for part in candidate.content.parts)
        if not has_function_calls:
            text_response = " ".join([part.text for part in candidate.content.parts if part.text])
            print("Agent finished:", text_response)
            break

        print("Executing actions...")
        results = execute_function_calls(candidate, page, SCREEN_WIDTH, SCREEN_HEIGHT)

        print("Capturing state...")
        function_responses = get_function_responses(page, results)

        contents.append(
            Content(role="user", parts=[Part(function_response=fr) for fr in function_responses])
        )

finally:
    print("\nClosing browser...")
    browser.close()
    playwright.stop()
    print("Done!")

Drag and Drop

from google import genai
from google.genai import types
from google.genai.types import Content, Part
from playwright.sync_api import sync_playwright
import time

# Initialize the Gemini client
client = genai.Client()

# Screen dimensions
SCREEN_WIDTH = 1440
SCREEN_HEIGHT = 900

def denormalize_x(x: int, screen_width: int) -> int:
    """Convert normalized x coordinate (0-1000) to actual pixel coordinate."""
    return int(x / 1000 * screen_width)

def denormalize_y(y: int, screen_height: int) -> int:
    """Convert normalized y coordinate (0-1000) to actual pixel coordinate."""
    return int(y / 1000 * screen_height)

def execute_function_calls(candidate, page, screen_width, screen_height):
    """Execute the actions suggested by the model."""
    results = []
    function_calls = []
    
    for part in candidate.content.parts:
        if part.function_call:
            function_calls.append(part.function_call)

    for function_call in function_calls:
        action_result = {}
        fname = function_call.name
        args = function_call.args
        print(f"  -> Executing: {fname}")

        try:
            if fname == "open_web_browser":
                pass  # Already open
            elif fname == "click_at":
                actual_x = denormalize_x(args["x"], screen_width)
                actual_y = denormalize_y(args["y"], screen_height)
                page.mouse.click(actual_x, actual_y)
            elif fname == "type_text_at":
                actual_x = denormalize_x(args["x"], screen_width)
                actual_y = denormalize_y(args["y"], screen_height)
                text = args["text"]
                press_enter = args.get("press_enter", False)

                page.mouse.click(actual_x, actual_y)
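                # Note: "Meta+A" is select-all on macOS; on Linux/Windows use "Control+A"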
                page.keyboard.press("Meta+A")
                page.keyboard.press("Backspace")
                page.keyboard.type(text)
                if press_enter:
                    page.keyboard.press("Enter")
            elif fname == "drag_and_drop":
                start_x = denormalize_x(args["x"], screen_width)
                start_y = denormalize_y(args["y"], screen_height)
                dest_x = denormalize_x(args["destination_x"], screen_width)
                dest_y = denormalize_y(args["destination_y"], screen_height)
                
                # Perform drag and drop
                page.mouse.move(start_x, start_y)
                page.mouse.down()
                page.mouse.move(dest_x, dest_y)
                page.mouse.up()
            
            page.wait_for_load_state(timeout=5000)
            time.sleep(1)

        except Exception as e:
            print(f"Error executing {fname}: {e}")
            action_result = {"error": str(e)}

        results.append((fname, action_result))

    return results

def get_function_responses(page, results):
    """Capture screenshot and URL after actions."""
    screenshot_bytes = page.screenshot(type="png")
    current_url = page.url
    function_responses = []
    
    for name, result in results:
        response_data = {"url": current_url}
        response_data.update(result)
        function_responses.append(
            types.FunctionResponse(
                name=name,
                response=response_data,
                parts=[types.FunctionResponsePart(
                    inline_data=types.FunctionResponseBlob(
                        mime_type="image/png",
                        data=screenshot_bytes))
                ]
            )
        )
    return function_responses

# Main program
print("Initialising browser...")
playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=False)
context = browser.new_context(viewport={"width": SCREEN_WIDTH, "height": SCREEN_HEIGHT})
page = context.new_page()

try:
    # Go to initial page
    page.goto("https://sticky-note-jam.web.app")
    
    # Configure the model with Computer Use tool
    config = types.GenerateContentConfig(
        tools=[types.Tool(computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER
        ))],
    )

    # Take initial screenshot
    initial_screenshot = page.screenshot(type="png")
    USER_PROMPT = """
    My art club brainstormed tasks ahead of our fair. 
    The board is chaotic and I need your help organising the tasks into some categories I created. 
    Go to sticky-note-jam.web.app and 
    ensure notes are clearly in the right sections. 
    Drag them there if not. 
    In your output, describe what the initial stage looked like and 
    what the final stage looks like after organisation.
    """
    print(f"Goal: {USER_PROMPT}")

    contents = [
        Content(role="user", parts=[
            Part(text=USER_PROMPT),
            Part.from_bytes(data=initial_screenshot, mime_type='image/png')
        ])
    ]

    # Agent loop - maximum 10 turns (more turns for drag operations)
    for i in range(10):
        print(f"\n--- Turn {i+1} ---")
        print("Thinking...")
        
        response = client.models.generate_content(
            model='gemini-2.5-computer-use-preview-10-2025',
            contents=contents,
            config=config,
        )

        candidate = response.candidates[0]
        contents.append(candidate.content)

        # Check if there are function calls to execute
        has_function_calls = any(part.function_call for part in candidate.content.parts)
        if not has_function_calls:
            text_response = " ".join([part.text for part in candidate.content.parts if part.text])
            print("Agent finished:", text_response)
            break

        print("Executing actions...")
        results = execute_function_calls(candidate, page, SCREEN_WIDTH, SCREEN_HEIGHT)

        print("Capturing state...")
        function_responses = get_function_responses(page, results)

        contents.append(
            Content(role="user", parts=[Part(function_response=fr) for fr in function_responses])
        )

finally:
    print("\nClosing browser...")
    browser.close()
    playwright.stop()
    print("Done!")

Categories
AI Agents

Claude Agent SDK Beginners Tutorial

Installation Commands:

# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Install Claude Agents SDK
pip install claude-agent-sdk

# Set API Key
export ANTHROPIC_API_KEY=your_api_key_here

Basic

import asyncio
from claude_agent_sdk import query

async def main():
    async for message in query(prompt="Hello, how are you?"):
        print(message)
        
asyncio.run(main())

Inbuilt Tools

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions
from rich import print

async def main():
    
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Write"],
        permission_mode="acceptEdits"
    )
    
    async for msg in query(
        prompt="Create a file called greeting.txt with 'Hello Mervin Praison!'",
        options=options
    ):
        print(msg)

asyncio.run(main())

Custom Tools

import asyncio
from typing import Any
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions, tool, create_sdk_mcp_server
from rich import print

@tool("greet", "Greet a user", {"name": str})
async def greet(args: dict[str, Any]) -> dict[str, Any]:
    return {
        "content": [{
            "type": "text",
            "text": f"Hello, {args['name']}!"
        }]
    }

server = create_sdk_mcp_server(
    name="my-tools",
    version="1.0.0",
    tools=[greet]
)

async def main():
    options = ClaudeAgentOptions(
        mcp_servers={"tools": server},
        allowed_tools=["mcp__tools__greet"]
    )

    async with ClaudeSDKClient(options=options) as client:
        await client.query("Greet Mervin Praison")
        async for msg in client.receive_response():
            print(msg)

asyncio.run(main())

Claude Agent Options

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions
from rich import print
async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are an expert Python developer",
        permission_mode='acceptEdits',
        cwd="/Users/praison/cc"
    )

    async for message in query(
        prompt="Create a Python web server in my current directory",
        options=options
    ):
        print(message)

asyncio.run(main())
Categories
Application

Gemini Image Editing Code

export GOOGLE_API_KEY=xxxxxxx
pip install google-genai gradio

App

from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client()

prompt = "Add a Cap to the person's head"

image = Image.open('mervinpraison.jpeg')

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image],
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))   
        image.save("generated_image.png")
        print("Generated image saved as 'generated_image.png'")

UI.py

import gradio as gr
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client()

def edit_image(image, prompt):
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[prompt, image],
    )
    
    if response.candidates[0].finish_reason.name == 'PROHIBITED_CONTENT':
        return None, "Content blocked by safety filters"
    elif response.candidates[0].content is None:
        return None, f"No content generated: {response.candidates[0].finish_reason.name}"
    
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return Image.open(BytesIO(part.inline_data.data)), "Image generated successfully"
    
    return None, "No image found in response"

iface = gr.Interface(
    fn=edit_image,
    inputs=[
        gr.Image(type="pil", label="Upload Image"),
        gr.Textbox(label="Edit Prompt", value="Add a Cap to the person's head")
    ],
    outputs=[
        gr.Image(label="Edited Image"),
        gr.Textbox(label="Status")
    ],
    title="Image Editor"
)

iface.launch()
Categories
Prompt Engineering

Microsoft POML Beginners Tutorial

from poml import poml
import requests, json
from rich import print

# 1) Load and render POML file
messages = poml("financial_analysis.poml", chat=True)

# 2) Combine messages into a single prompt
full_prompt = "\n".join(
    ["\n".join(str(c).strip() for c in m["content"]) if isinstance(m.get("content"), list) else str(m["content"]).strip()
     for m in messages if m.get("content")]
)

print("\n--- Full Prompt ---\n")
print(full_prompt)

# 3) Call Ollama Model 
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5vl:latest", "prompt": full_prompt, "stream": False},
)
data = resp.json()

print("\n--- Model Response ---\n")
print(data.get("response") or json.dumps(data, indent=2))
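The script expects a financial_analysis.poml file next to it. That file isn't shown above, so here is a minimal sketch of what it might contain (hypothetical content; POML's core tags include role, task, and output-format):

<poml>
  <role>You are a financial analyst.</role>
  <task>Analyse the quarterly report and summarise the key risks.</task>
  <output-format>Respond with three concise bullet points.</output-format>
</poml>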
Categories
OpenAI

GPT-OSS Finetuning

from transformers import pipeline

REASONING_LANGUAGE = "Tamil"  # or German, or any other language...
SYSTEM_PROMPT = f"reasoning language: {REASONING_LANGUAGE}"
USER_PROMPT = "What is the national symbol of Canada?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]

generator = pipeline("text-generation", model="mervinpraison/gpt-oss-20b-multilingual-reasoner", device="cuda")
output = generator(messages, max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
Categories
OpenAI

GPT OSS Code Beginners

export GROQ_API_KEY=xxxxxxx
pip install groq gradio

Basic

from groq import Groq

client = Groq()
completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
      {
        "role": "user",
        "content": "Write Essay about AI in 1000 words"
      }
    ],
    temperature=1,
    max_completion_tokens=32000,
    top_p=1,
    reasoning_effort="medium",
    stream=True,
    stop=None
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")

UI

import gradio as gr
from groq import Groq

def generate(prompt, temperature=1, max_tokens=32000):
    """Generate essay using Groq API"""
    client = Groq()
    completion = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature=temperature,
        max_completion_tokens=max_tokens,
        top_p=1,
        reasoning_effort="medium",
        stream=True,
        stop=None
    )
    
    response = ""
    for chunk in completion:
        content = chunk.choices[0].delta.content or ""
        response += content
        yield response

# Gradio interface
with gr.Blocks(title="AI Essay Generator") as demo:
    gr.Markdown("# AI Essay Generator")
    gr.Markdown("Generate essays using Groq's GPT-OSS-120B model")
    
    with gr.Row():
        with gr.Column():
            prompt_input = gr.Textbox(
                label="Prompt",
                placeholder="Write Essay about AI in 1000 words",
                value="Write Essay about AI in 1000 words",
                lines=3
            )
            
            with gr.Row():
                temperature_slider = gr.Slider(
                    minimum=0.1,
                    maximum=2.0,
                    value=1.0,
                    step=0.1,
                    label="Temperature"
                )
                max_tokens_slider = gr.Slider(
                    minimum=1000,
                    maximum=32000,
                    value=32000,
                    step=1000,
                    label="Max Tokens"
                )
            
            generate_btn = gr.Button("Generate", variant="primary")
        
        with gr.Column():
            output_text = gr.Textbox(
                label="Generated Essay",
                lines=20,
                interactive=False
            )
    
    generate_btn.click(
        fn=generate,
        inputs=[prompt_input, temperature_slider, max_tokens_slider],
        outputs=output_text
    )

if __name__ == "__main__":
    demo.launch()
Categories
RAG

Google LangExtract Beginners

pip install langextract
brew install libmagic
export LANGEXTRACT_API_KEY=xxxxxxxxx

Basic

import langextract as lx
import textwrap

# 1. Define the prompt and extraction rules
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

# 2. Provide a high-quality example to guide the model
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"}
            ),
            lx.data.Extraction(
                extraction_class="relationship",
                extraction_text="Juliet is the sun",
                attributes={"type": "metaphor"}
            ),
        ]
    )
]

# The input text to be processed
input_text = "Lady Juliet gazed longingly at the stars, her heart aching for Romeo"

# Run the extraction
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
)

# Save the results to a JSONL file
lx.io.save_annotated_documents([result], output_name="extraction_results.jsonl")

# Generate the visualization from the file
html_content = lx.visualize("test_output/extraction_results.jsonl")
with open("test_output/visualization.html", "w") as f:
    f.write(html_content)

Advanced

import langextract as lx
import textwrap
from collections import Counter, defaultdict

# Define comprehensive prompt and examples for complex literary text
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships from the given text.

    Provide meaningful attributes for every entity to add context and depth.

    Important: Use exact text from the input for extraction_text. Do not paraphrase.
    Extract entities in order of appearance with no overlapping text spans.

    Note: In play scripts, speaker names appear in ALL-CAPS followed by a period.""")

examples = [
    lx.data.ExampleData(
        text=textwrap.dedent("""\
            ROMEO. But soft! What light through yonder window breaks?
            It is the east, and Juliet is the sun.
            JULIET. O Romeo, Romeo! Wherefore art thou Romeo?"""),
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe", "character": "Romeo"}
            ),
            lx.data.Extraction(
                extraction_class="relationship",
                extraction_text="Juliet is the sun",
                attributes={"type": "metaphor", "character_1": "Romeo", "character_2": "Juliet"}
            ),
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="JULIET",
                attributes={"emotional_state": "yearning"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="Wherefore art thou Romeo?",
                attributes={"feeling": "longing question", "character": "Juliet"}
            ),
        ]
    )
]

# Process Romeo & Juliet directly from Project Gutenberg
print("Downloading and processing Romeo and Juliet from Project Gutenberg...")

result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,      # Multiple passes for improved recall
    max_workers=20,           # Parallel processing for speed
    max_char_buffer=1000      # Smaller contexts for better accuracy
)

print(f"Extracted {len(result.extractions)} entities from {len(result.text):,} characters")

# Save and visualize the results
lx.io.save_annotated_documents([result], output_name="romeo_juliet_extractions.jsonl")

# Generate the interactive visualization
html_content = lx.visualize("test_output/romeo_juliet_extractions.jsonl")
with open("test_output/romeo_juliet_visualization.html", "w") as f:
    f.write(html_content)

print("Interactive visualization saved to romeo_juliet_visualization.html")

# Analyze character mentions
characters = {}
for e in result.extractions:
    if e.extraction_class == "character":
        char_name = e.extraction_text
        if char_name not in characters:
            characters[char_name] = {"count": 0, "attributes": set()}
        characters[char_name]["count"] += 1
        if e.attributes:
            for attr_key, attr_val in e.attributes.items():
                characters[char_name]["attributes"].add(f"{attr_key}: {attr_val}")

# Print character summary
print(f"\nCHARACTER SUMMARY ({len(characters)} unique characters)")
print("=" * 60)

sorted_chars = sorted(characters.items(), key=lambda x: x[1]["count"], reverse=True)
for char_name, char_data in sorted_chars[:10]:  # Top 10 characters
    attrs_preview = list(char_data["attributes"])[:3]
    attrs_str = f" ({', '.join(attrs_preview)})" if attrs_preview else ""
    print(f"{char_name}: {char_data['count']} mentions{attrs_str}")

# Entity type breakdown
entity_counts = Counter(e.extraction_class for e in result.extractions)
print(f"\nENTITY TYPE BREAKDOWN")
print("=" * 60)
for entity_type, count in entity_counts.most_common():
    percentage = (count / len(result.extractions)) * 100
    print(f"{entity_type}: {count} ({percentage:.1f}%)")
Categories
Praison AI

Context Agent

pip install "praisonai[mongodb]"
export OPENAI_API_KEY=
export GITHUB_TOKEN=xxxxxxxx   # optional: only needed when pulling data from a GitHub repo

from praisonaiagents import ContextAgent

agent = ContextAgent(llm="gpt-4o-mini", auto_analyze=False)

agent.start("https://github.com/MervinPraison/PraisonAI/ Need to add Authentication")

Knowledge

import os
from praisonaiagents import Agent, Task, PraisonAIAgents

# Ensure OpenAI API key is set
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("Please set the OPENAI_API_KEY environment variable")

def main():
    # MongoDB knowledge configuration
    mongodb_knowledge_config = {
        "vector_store": {
            "provider": "mongodb",
            "config": {
                "connection_string": "mongodb+srv://Username:Password@cluster2.bofm7.mywebsite.net/?retryWrites=true&w=majority&appName=Cluster2",  # Replace with your MongoDB connection string
                "database": "praisonai_knowledge",
                "collection": "knowledge_base",
                "use_vector_search": True  # Enable Atlas Vector Search
            }
        },
        "embedder": {
            "provider": "openai",
            "config": {
                "model": "text-embedding-3-small",
                "api_key": os.getenv("OPENAI_API_KEY")
            }
        }
    }
    
    # Create a knowledge agent with MongoDB knowledge store
    knowledge_agent = Agent(
        name="MongoDB Knowledge Agent",
        role="Knowledge Specialist",
        goal="Provide accurate information from MongoDB knowledge base",
        backstory="""You are an expert knowledge specialist who can access and 
        retrieve information from a comprehensive MongoDB knowledge base. You excel 
        at finding relevant information, synthesizing knowledge from multiple sources, 
        and providing accurate, context-aware responses.""",
        knowledge_config=mongodb_knowledge_config,
        knowledge=[os.path.join(os.path.dirname(__file__), "llms.md")],
        memory=True,
        verbose=True,
        llm="gpt-4o-mini"
    )
    
    # Create a research assistant agent
    research_agent = Agent(
        name="Research Assistant",
        role="Research Assistant",
        goal="Gather information and store it in the knowledge base",
        backstory="""You are a research assistant who specializes in gathering 
        information from various sources and organizing it for storage in the 
        knowledge base. You ensure information is accurate, well-structured, 
        and properly categorized.""",
        memory=True,
        verbose=True,
        llm="gpt-4o-mini"
    )
    
    # Create tasks for knowledge management
    knowledge_tasks = [
        Task(
            description="""Research and store information about MongoDB Atlas Vector Search:
            1. Gather comprehensive information about MongoDB Atlas Vector Search
            2. Include technical specifications, use cases, and best practices
            3. Store the information in the MongoDB knowledge base
            4. Organize information by categories (features, performance, integration)
            """,
            expected_output="MongoDB Atlas Vector Search information stored in knowledge base",
            agent=research_agent
        ),
        Task(
            description="""Research and store information about AI agent frameworks:
            1. Research popular AI agent frameworks (LangChain, AutoGen, etc.)
            2. Compare their features, capabilities, and use cases
            3. Store comparative analysis in the knowledge base
            4. Include code examples and best practices
            """,
            expected_output="AI agent framework comparison stored in knowledge base",
            agent=research_agent
        ),
        Task(
            description="""Query the knowledge base for MongoDB information:
            1. Search for information about MongoDB Atlas Vector Search
            2. Extract key features and capabilities
            3. Provide a comprehensive summary
            4. Include technical recommendations
            """,
            expected_output="Comprehensive MongoDB Atlas Vector Search summary from knowledge base",
            agent=knowledge_agent
        ),
        Task(
            description="""Query the knowledge base for AI agent framework information:
            1. Search for information about AI agent frameworks
            2. Compare different frameworks based on stored knowledge
            3. Provide recommendations for different use cases
            4. Include best practices and examples
            """,
            expected_output="AI agent framework comparison and recommendations from knowledge base",
            agent=knowledge_agent
        )
    ]
    
    # Initialize the multi-agent system with MongoDB knowledge
    print("🚀 Starting MongoDB Knowledge Management System...")
    print("=" * 60)
    
    knowledge_system = PraisonAIAgents(
        agents=[research_agent, knowledge_agent],
        tasks=knowledge_tasks,
        memory=True,
        verbose=True
    )
    
    # Execute the knowledge management pipeline
    results = knowledge_system.start()
    
if __name__ == "__main__":
    main()

Memory

from praisonaiagents import Agent, Task, PraisonAIAgents
from praisonaiagents.memory import Memory
from praisonaiagents.agent import ContextAgent
import pymongo

context_agent = ContextAgent(llm="gpt-4o-mini", auto_analyze=False)

context_output = context_agent.start("https://github.com/MervinPraison/PraisonAI/ Need to add Authentication")

mongodb_memory_config = {
    "provider": "mongodb",
    "config": {
        "connection_string": "mongodb+srv://Username:Password@cluster2.bofm7.mywebsite.net/?retryWrites=true&w=majority&appName=Cluster2",
        "database": "praisonai_memory",
        "use_vector_search": True,
        "max_pool_size": 50,
        "min_pool_size": 10,
        "server_selection_timeout": 5000
    }
}

implementation_agent = Agent(
    name="Implementation Agent",
    role="Authentication Implementation Specialist",
    goal="Implement authentication features based on project requirements",
    backstory="Expert software implementer specializing in authentication systems, security features, and seamless integration with existing codebases",
    memory=True,
    llm="gpt-4o-mini",
)

implementation_task = Task(
    description="Implement authentication features based on the project requirements from context analysis",
    expected_output="Authentication implementation with code, configuration, and integration details",
    agent=implementation_agent,
    context=context_output,
)

implementation_system = PraisonAIAgents(
    agents=[implementation_agent],
    tasks=[implementation_task],
    memory=True,
    memory_config=mongodb_memory_config
)

results = implementation_system.start()
print(f"Results: {results}")
Categories
Langchain Llama Index RAG

Langtrace

export LANGTRACE_API_KEY=xxxxx
export OPENAI_API_KEY=xxxxxx

Langtrace Llama Index

pip install langtrace-python-sdk llama-index openai langchain_community langchain langchain-chroma langchainhub langchain_openai

from langtrace_python_sdk import langtrace # Must precede any LLM module imports
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import os

langtrace.init(
    api_key=os.getenv('LANGTRACE_API_KEY'),
    api_host="http://localhost:3000/api/trace"
)

documents = SimpleDirectoryReader(input_files=["soccer_rules.pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What is a throw in?").response)

Langtrace Langchain

from langtrace_python_sdk import langtrace # Must precede any llm module imports
import os
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI

langtrace.init(
    api_key=os.getenv('LANGTRACE_API_KEY'),
    api_host="http://localhost:3000/api/trace"
)

llm = ChatOpenAI(model="gpt-4o-mini")
loader = TextLoader("soccer_rules.txt")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the document
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is Offside??"))

Langtrace CrewAI

https://github.com/Scale3-Labs/langtrace-recipes/blob/main/integrations/llm-framework/crewai/starter.ipynb