Categories: Finetuning

Metrics in Fine Tuning a Large Language Model

  • Identify overfitting:
    • train/loss keeps decreasing while the evaluation loss (or train/train_loss) stops decreasing or starts increasing.
  • Signs of a good fine-tune of a large language model:
    • train/loss and train/train_loss both decrease.
    • train/grad_norm stays stable or decreases.
    • train/learning_rate follows an appropriate schedule.
| Metric | Description | Identification | Calculation |
| --- | --- | --- | --- |
| train/epoch | Number of full passes over the training dataset. | Higher values indicate more training. | Total dataset passes. |
| train/global_step | Total number of training steps completed. | Tracks progress. A high value indicates more training steps. | Incremented with each batch processed. |
| train/grad_norm | Norm of the gradients. | Indicates if gradients are exploding/vanishing. Stable or decreasing values are good. | Norm calculation of gradients. |
| train/learning_rate | Learning rate at the current step. | An optimal learning rate improves learning efficiency. | Set by the learning rate scheduler. |
| train/loss | Average loss over the current training step. | Decreasing loss indicates model learning. | Mean of loss function values. |
| train/total_flos | Total floating-point operations performed. | A high value indicates more computation. | Sum of all FLOPs in training. |
| train/train_loss | Loss on the training dataset. | A decreasing value indicates effective learning. | Calculated similarly to loss. |
| train/train_runtime | Total runtime of the training process. | Efficiency measure. Shorter is better, but depends on dataset size and model complexity. | Sum of all training times. |
| train/train_samples_per_second | Training throughput, samples processed per second. | Higher values indicate better performance. | Total samples / total training time. |
| train/train_steps_per_second | Number of training steps processed per second. | Higher values indicate better performance. | Total steps / total training time. |
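As a rough illustration, the heuristics above can be checked against exported metric histories. The sketch below is minimal and assumes the values are available as plain Python lists; the helper names and thresholds are illustrative, not part of any training library.

# Minimal sketch: apply the heuristics above to exported metric histories.
# Assumes train/loss, an evaluation loss and train/grad_norm are available as
# plain Python lists; the helper names and thresholds are illustrative only.

def slope(values):
    """Average step-to-step change; negative means the metric is decreasing."""
    if len(values) < 2:
        return 0.0
    return (values[-1] - values[0]) / (len(values) - 1)

def check_run(train_loss, eval_loss, grad_norm):
    if slope(train_loss) < 0 and slope(eval_loss) >= 0:
        print("Possible overfitting: train/loss falls while the eval loss does not.")
    elif slope(train_loss) < 0 and slope(eval_loss) < 0 and slope(grad_norm) <= 0:
        print("Healthy fine-tune: losses decreasing, grad norm stable or falling.")
    if max(grad_norm) > 3 * (sum(grad_norm) / len(grad_norm)):
        print("Gradient spikes: consider a lower learning_rate or max_grad_norm.")

check_run(
    train_loss=[2.1, 1.7, 1.4, 1.2],
    eval_loss=[2.0, 1.9, 1.9, 2.0],
    grad_norm=[1.3, 1.1, 1.0, 0.9],
)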
| Metric | Uses | Impact on Training |
| --- | --- | --- |
| Disk I/O Utilization (MB) | Measures the amount of data read/written to disk, indicating I/O performance. | High I/O can slow down data loading, affecting training speed. |
| Disk Utilization (%) | Shows how much of the disk capacity is in use; can indicate if storage is running out. | Insufficient disk space can interrupt training or limit dataset size. |
| Disk Utilization (GB) | Absolute disk space used, useful for monitoring available storage. | Running out of space can halt training processes. |
| GPU Memory Allocated (%) | Percentage of GPU memory in use; helps in understanding memory usage and limits. | High memory usage can lead to out-of-memory errors, interrupting training. |
| GPU Power Usage (%) | Indicates how much of the GPU's power capacity is being used; can signal efficiency or overload. | Excessive power usage can indicate inefficient training, potentially leading to thermal throttling. |
| GPU Power Usage (W) | Measures the actual power consumption of the GPU in watts, important for energy management. | Directly affects energy cost and can indicate inefficiency in training algorithms. |
| GPU Temp (℃) | GPU temperature, critical for preventing overheating and managing cooling solutions. | Overheating can lead to thermal throttling, reducing training performance. |
| GPU Time Spent Accessing Memory (%) | Shows the percentage of time the GPU spends accessing memory, indicating potential memory bottlenecks. | High values may indicate memory access inefficiencies, slowing down training. |
| GPU Utilization (%) | Overall GPU usage, useful for monitoring workload and performance optimization. | Low utilization can indicate inefficiencies in training, while high usage can signal good resource use. |
| Network Traffic (bytes) | Amount of incoming/outgoing network data, crucial for bandwidth management and performance analysis. | High network traffic can slow down training, especially in distributed setups. |
| Process CPU Threads In Use | Number of CPU threads a process is using; indicates process concurrency and CPU usage. | More threads can improve training speed, but excessive use may lead to contention. |
| Process CPU Utilization (%) | How much CPU a process is using; helps identify resource-intensive processes. | High CPU usage by non-training processes can detract from training resources. |
| Process GPU Memory Allocated (%) | GPU memory usage by a specific process, useful for identifying memory-intensive processes. | Excessive memory use by one process can limit resources for training other models. |
| Process GPU Power Usage (%) | Percentage of GPU power used by a process; helps in identifying power-intensive applications. | High power usage by training can increase costs and risk overheating. |
| Process GPU Power Usage (W) | Actual GPU power consumption by a process, important for detailed energy management. | Directly impacts energy efficiency and costs of training. |
| Process GPU Temp (℃) | Temperature of the GPU for a specific process, useful for monitoring thermal performance per process. | High temperatures during training can lead to hardware throttling, slowing down processes. |
| Process GPU Time Spent Accessing Memory (%) | Shows how much time a process's GPU operations spend accessing memory, indicating efficiency. | Inefficient memory access can slow down training, especially for large models. |
| Process GPU Utilization (%) | Percentage of the GPU used by a process; helps in workload distribution and optimization. | Optimizing GPU utilization can enhance training efficiency and speed. |
| Process Memory Available (non-swap) (MB) | Memory available to a process, important for resource allocation and avoiding out-of-memory errors. | Insufficient memory can interrupt or slow down training processes. |
| Process Memory In Use (non-swap) (%) | Percentage of memory used by a process; indicates if a process is memory-intensive. | High memory usage can limit the ability to run multiple training jobs simultaneously. |
| Process Memory In Use (non-swap) (MB) | Absolute memory used by a process, useful for memory management and optimization. | Managing this can help prevent out-of-memory errors during training. |
| System CPU Utilization (per core) (%) | Usage of the CPU per core, important for identifying imbalanced workloads and performance bottlenecks. | Imbalanced utilization can indicate that training is not efficiently using available CPU resources. |
| System Memory Utilization (%) | Overall system memory usage, critical for understanding system load and preventing memory exhaustion. | High system memory use can slow down training or lead to swapping, significantly impacting performance. |
| system/gpu.0.memoryAllocatedBytes | GPU memory allocated, typically used for detailed monitoring of GPU resource usage. | Can indicate if training is using GPU resources efficiently, affecting model complexity and batch size. |
| system/gpu.process.0.memoryAllocatedBytes | Identifies processes by how much GPU memory they have allocated, crucial for resource management and optimization. | Helps in optimizing memory use among multiple training jobs, ensuring efficient use of GPU memory. |
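Tools such as Weights & Biases collect these system metrics automatically during a run, but the GPU rows can also be sampled directly. Below is a minimal sketch using the NVIDIA Management Library bindings (pynvml); it assumes an NVIDIA GPU and that the nvidia-ml-py/pynvml package is installed.

# Minimal sketch: read a few of the GPU metrics listed above via pynvml.
# Assumes an NVIDIA GPU and `pip install nvidia-ml-py` (imported as pynvml).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # .gpu and .memory are percentages
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # .used / .total in bytes
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # reported in milliwatts

print(f"GPU Utilization (%): {util.gpu}")
print(f"GPU Time Spent Accessing Memory (%): {util.memory}")
print(f"GPU Memory Allocated (%): {100 * mem.used / mem.total:.1f}")
print(f"GPU Temp (C): {temp}")
print(f"GPU Power Usage (W): {power_w:.1f}")

pynvml.nvmlShutdown()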

| Configuration | Optimal Number/Range | Impact on Training | Explanation | Optimal Number/Range Explanation |
| --- | --- | --- | --- | --- |
| output_dir | N/A | No direct impact | Determines where to save model checkpoints and predictions. | User-specific, based on storage preferences. |
| num_train_epochs | 1-5 | Medium | Defines how many times the entire dataset is passed through the model, affecting overall training duration. | Depends on dataset size and complexity; more epochs can improve accuracy up to a point. |
| fp16/bf16 | True (if supported) | High | Enables mixed-precision training, significantly improving speed and reducing memory usage on compatible hardware. | Use if hardware supports it for efficiency; A100 for bf16, most modern GPUs for fp16. |
| per_device_train_batch_size | 8-32 | High | Determines how much data is processed per GPU, impacting memory usage and potentially training speed. | Adjust based on GPU memory; larger batches can improve efficiency but require more memory. |
| per_device_eval_batch_size | 8-32 | Medium | Affects evaluation speed and memory usage but does not directly impact training effectiveness. | Similar to training batch size; adjust based on memory and evaluation speed requirements. |
| gradient_accumulation_steps | 1-32 | High | Allows for larger effective batch sizes without increasing GPU memory requirements, affecting convergence and stability. | Increases effective batch size, improving training stability; optimal value depends on memory constraints. |
| gradient_checkpointing | True | Medium | Reduces memory usage at the cost of computational overhead, enabling training of larger models. | Useful for training large models on hardware with limited memory. |
| max_grad_norm | 0.1-10 | Medium | Prevents exploding gradients by clipping, crucial for model stability. | Prevents instability in training; optimal range depends on model and data. |
| learning_rate | 1e-5 to 5e-4 | High | Directly influences the speed and quality of convergence. | Depends on model size and dataset; smaller models and datasets might require lower rates. |
| weight_decay | 0-0.1 | Medium | Regularization parameter; helps prevent overfitting by penalizing large weights. | Helps in controlling overfitting; exact value depends on model complexity and dataset. |
| optim | "adamw", "sgd", etc. | Medium | Choice of optimizer affects convergence speed and stability. | adamw is widely used for its balance between performance and stability. |
| lr_scheduler_type | "linear", "cosine", etc. | Medium | Influences how the learning rate changes over time, impacting convergence and training effectiveness. | "cosine" for natural decay, "linear" for a steady decrease; choice depends on training length and preference. |
| max_steps | -1 or positive integer | Medium | If set, defines the total number of training steps, overriding num_train_epochs, affecting training length. | -1 for epoch-based training, a positive integer for fine-grained control over training duration. |
| warmup_ratio | 0-0.1 | Medium | Gradually increases the learning rate at the start of training; can improve model stability and performance. | 0.06 is a common starting point; adjust based on total training steps. |
| group_by_length | True | Medium | Increases training efficiency by reducing padding needs; can speed up training and reduce memory usage. | Recommended for efficiency, especially with variable-length sequences. |
| save_steps | 100-1000 | Low | Frequency of saving model checkpoints; impacts disk usage but not training performance directly. | Adjust based on training length and checkpoint management preferences. |
| logging_steps | 10-100 | Low | Frequency of logging metrics; useful for monitoring but does not impact training effectiveness. | Adjust for a balance between detailed monitoring and logging overhead. |
| packing | True | High | Increases data efficiency for short sequences, significantly speeding up training and reducing memory usage. | Recommended for datasets with many short sequences for efficiency gains. |
| dataset_text_field | N/A | No direct impact | Specifies which dataset field to use for text, crucial for correctly loading data. | Depends on dataset structure. |
| dataset_num_proc | 1-4 | Low | Number of processes for dataset loading and processing; can speed up data preparation. | Depends on available CPU resources; higher values can speed up preprocessing. |
| device_map | "auto" or specific mapping | Low to Medium | Automates device allocation for model and data, potentially optimizing GPU utilization. | "auto" for convenience, specific mapping for manual control over multi-GPU training. |
| report_to | "wandb", "none", etc. | Low | Determines where to send training logs and metrics; useful for monitoring but does not directly affect training performance. | "wandb" for comprehensive monitoring; "none" if logging is not required. |
| max_seq_length | 128-512 for most tasks, up to 2048 for specific needs | High | Determines the maximum sequence length for model inputs, affecting memory usage and processing speed. | Depends on task and hardware limitations; longer lengths may require more memory. |
| dtype | "float16", "bfloat16", or None for auto | High | Specifies the data type for training, affecting memory usage and computational efficiency. | "float16" for efficiency on compatible GPUs; "bfloat16" for newer architectures. |
| load_in_4bit | True or False | Medium | Enables 4-bit quantization, potentially reducing memory usage further. | Experimental; use if supported and memory constraints are very tight. |
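Most of these settings map directly onto Hugging Face TrainingArguments (dataset_text_field, packing and max_seq_length belong to TRL's SFTTrainer). The sketch below picks values from the suggested ranges above; the exact numbers are illustrative, not prescriptive, and should be tuned per model, dataset and hardware.

# Minimal sketch: the table's hyperparameters expressed as Hugging Face
# TrainingArguments. Values are taken from the suggested ranges above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=1.0,
    learning_rate=2e-4,
    weight_decay=0.01,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.06,
    group_by_length=True,
    fp16=True,          # or bf16=True on A100-class GPUs
    max_steps=-1,       # -1 = train for num_train_epochs
    save_steps=500,
    logging_steps=50,
    report_to="wandb",
)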
Categories: Finetuning

Axolotl Fine Tuning

In Terminal

git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
sudo usermod -aG docker $USER # Add the current user to the docker group
newgrp docker
docker run --gpus '"all"' --rm -it winglian/axolotl:main-latest
accelerate launch -m axolotl.cli.train examples/openllama-3b/qlora.yml

Editing the qlora.yml training file from the terminal using nano

nano examples/openllama-3b/qlora.yml
(Ctrl+o) = Save
(Ctrl+x) = Exit

OpenLlama 3B QLoRA

base_model: openlm-research/open_llama_3b_v2
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
push_dataset_to_hub:
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: true
lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
output_dir: ./qlora-out
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
gptq_groupsize:
gptq_model_v1:
warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

Output

(PeftModelForCausalLM(   (base_model): LoraModel(     (model): LlamaForCausalLM(       (model): LlamaModel(         (embed_tokens): Embedding(32000, 3200, padding_idx=0)         (layers): ModuleList(           (0-25): 26 x LlamaDecoderLayer(             (self_attn): LlamaAttention(               (q_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=3200, out_features=3200, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=3200, out_features=8, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=8, out_features=3200, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (k_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=3200, out_features=3200, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=3200, out_features=8, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=8, out_features=3200, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (v_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=3200, out_features=3200, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=3200, out_features=8, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=8, out_features=3200, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (o_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=3200, out_features=3200, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=3200, out_features=8, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=8, out_features=3200, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (rotary_emb): LlamaRotaryEmbedding()             )             (mlp): LlamaMLP(               (gate_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=3200, out_features=8640, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=3200, out_features=8, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=8, out_features=8640, bias=False)                 )                 (lora_embedding_A): 
ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (up_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=3200, out_features=8640, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=3200, out_features=8, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=8, out_features=8640, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (down_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=8640, out_features=3200, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=8640, out_features=8, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=8, out_features=3200, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (act_fn): SiLU()             )             (input_layernorm): LlamaRMSNorm()             (post_attention_layernorm): LlamaRMSNorm()           )         )         (norm): LlamaRMSNorm()       )       (lm_head): Linear(in_features=3200, out_features=32000, bias=False)     )   ) ), LlamaTokenizer(name_or_path='openlm-research/open_llama_3b_v2', vocab_size=32000, model_max_length=2048, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={       0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),      1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),        2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), })

Upload to Hugging Face

huggingface-cli upload USERNAME/MY-MODELNAME qlora-out
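The same upload can be done from Python with the huggingface_hub library. This is a minimal sketch assuming you have already run huggingface-cli login; USERNAME/MY-MODELNAME is the same placeholder as above.

# Minimal sketch: upload the qlora-out folder from Python instead of the CLI.
# Assumes `pip install huggingface_hub` and a prior `huggingface-cli login`.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(repo_id="USERNAME/MY-MODELNAME", exist_ok=True)
api.upload_folder(
    folder_path="qlora-out",
    repo_id="USERNAME/MY-MODELNAME",
    repo_type="model",
)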
Categories: Embedding

Visualise OpenAI Embeddings

from openai import OpenAI
import pandas as pd
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np
client = OpenAI()

# Function to fetch embeddings from OpenAI API
def get_embeddings(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

# Load the text content from your file and simulate a dataframe similar to the loaded CSV
with open('content.txt', 'r', encoding='utf-8') as file:
    text_content = file.read()
    texts = text_content.split('.')  # Split the file's content at each period

# Assuming non-empty text segments are needed
texts = [text.strip() for text in texts if text.strip() != '']

# Build a dataframe and fetch an embedding for each text segment
df = pd.DataFrame({'text': texts})
df['embedding'] = df['text'].apply(lambda x: get_embeddings(x.strip()))

# Convert embeddings into a format suitable for TSNE
matrix = np.array(df['embedding'].tolist())
print(matrix.shape)

tsne = TSNE(n_components=2, perplexity=5, random_state=42, init='random', learning_rate=200)
vis_dims = tsne.fit_transform(matrix)
print(vis_dims)
print(vis_dims.shape)

# Plotting
plt.figure(figsize=(10, 10))
x = vis_dims[:, 0]
y = vis_dims[:, 1]
plt.scatter(x, y, alpha=0.5)

plt.title("Text Embeddings Visualized with t-SNE")
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.show()

content.txt

Text splitting in LangChain is a critical feature that facilitates the division of large texts into smaller, manageable segments. 
This capability is vital for improving comprehension and processing efficiency, especially in tasks that require detailed analysis or extraction of specific contexts.

ChatGPT, developed by OpenAI, represents a leap forward in natural language processing technologies.
It's a conversational AI model capable of understanding and generating human-like text, allowing for dynamic interactions and providing responses that are remarkably coherent and contextually relevant. ChatGPT has been integrated into a multitude of applications, revolutionizing the way we interact with machines and access information.

By leveraging LangChain for text splitting, users can efficiently navigate and analyze vast amounts of text data, facilitating a deeper understanding and more insightful conclusions.
Categories: Tools

Anthropic Tools

from tool_use_package.tools.base_tool import BaseTool
from tool_use_package.tool_user import ToolUser

# 1. Define the Tool
class AdditionTool(BaseTool):
    """Adds together two numbers, a + b."""

    def use_tool(self, a, b):
        print(f"Adding {a} and {b}")
        return a+b

# 2. Tool Description
addition_tool_name = "perform_addition"
addition_tool_description = """Add one number (a) to another (b), returning a+b.
Use this tool WHENEVER you need to perform any addition calculation, as it will ensure your answer is precise."""
addition_tool_parameters = [
    {"name": "a", "type": "float", "description": "The first number in your addition equation."},
    {"name": "b", "type": "float", "description": "The second number in your addition equation."}
]

addition_tool = AdditionTool(addition_tool_name, addition_tool_description, addition_tool_parameters)

# 3. Assign Tool and Ask Claude
tool_user = ToolUser([addition_tool])
messages = [
    {
        "role":"user", 
        "content":"""
            John has three apples. 
            Maggie has nine apples. 
            Tim has 4 bananas. 
            If John gives Maggie 1 apple and 
            Tim gives John 2 bananas, 
            how many total fruits does each person have?"""
    }
]
print(tool_user.use_tools(messages, execution_mode="automatic"))
Categories: Tools

Claude 3 Function Calling

pip install anthropic yfinance rich
export ANTHROPIC_API_KEY=xxxxxxxxxxxx
from anthropic import Anthropic
from rich import print
import re
import yfinance as yf

client = Anthropic()
MODEL_NAME = "claude-3-opus-20240229"

stock_message = {
    "role": "user", 
    "content": "Find the current price of Apple stock"
}

message = client.messages.create(
    model=MODEL_NAME,
    max_tokens=1024,
    messages=[stock_message]
).content[0].text
print("##### Before Function Calling ####\n\n" + message)

# 1. Define the stock price finding function
def get_stock_price(ticker_symbol):
    stock = yf.Ticker(ticker_symbol)
    hist = stock.history(period="1d")
    current_price = hist['Close'].iloc[0]
    return current_price

# 2. Construct Tool description
tool_description = """
<tool_description>
    <tool_name>get_stock_price</tool_name>
    <description>
        Function for finding the current price of a stock using its ticker symbol.
    </description>
    <parameters>
        <parameter>
            <name>ticker_symbol</name>
            <type>str</type>
            <description>Ticker symbol of the stock</description>
        </parameter>
    </parameters>
</tool_description>
"""

# 3. Ask Claude
system_prompt = f"""
In this environment you have access to a set of tools you can use to answer the 
user's question.

You may call them like this:
<function_calls>
    <invoke>
        <tool_name>$TOOL_NAME</tool_name>
        <parameters>
            <$PARAMETER_NAME>$PARAMETER_VALUE</$PARAMETER_NAME>
            ...
        </parameters>
    </invoke>
</function_calls>

Here are the tools available:
<tools>{tool_description}</tools>
"""

function_calling_message = client.messages.create(
    model=MODEL_NAME,
    max_tokens=1024,
    messages=[stock_message],
    system=system_prompt
).content[0].text

# print(function_calling_message)

# 4. Extract parameters from response & call the function
def extract_between_tags(tag, string, strip=False):
    ext_list = re.findall(f"<{tag}>(.+?)</{tag}>", string, re.DOTALL)
    return [e.strip() for e in ext_list] if strip else ext_list

function_params = {"ticker_symbol": extract_between_tags("ticker_symbol", function_calling_message)[0]}
function_name = extract_between_tags("tool_name", function_calling_message)[0]
names_to_functions = {
    'get_stock_price': get_stock_price,
}
price = names_to_functions[function_name](**function_params)

# Construct function results
function_results = f"""
<function_results>
  <result>
    <tool_name>get_stock_price</tool_name>
    <stdout>{price}</stdout>
  </result>
</function_results>"""

# 5. Send all messages back to Claude
partial_assistant_message = function_calling_message + function_results

final_message = client.messages.create(
    model=MODEL_NAME,
    max_tokens=1024,
    messages=[
        stock_message,
        {
            "role": "assistant",
            "content": partial_assistant_message
        }
    ],
    system=system_prompt
).content[0].text
print("\n\n##### After Function Calling #####"+ final_message)

Output

❯ python app.py
##### Before Function Calling ####

To find the current price of Apple stock, I'll use the stock ticker symbol "AAPL" and check a 
financial website or stock market data provider. As of my knowledge cutoff date of September 
2021, the stock price will likely have changed since then. For the most up-to-date price, I 
recommend checking a financial website or app directly.


##### After Function Calling #####

The current price of Apple (AAPL) stock is $175.10.
Categories: AI Agents

CrewAI Meeting Prep

❯ python main.py
## Welcome to the Meeting Prep Crew
-------------------------------
What are the emails for the participants (other than you) in the meeting?
jeff@amazon.com
What is the context of the meeting?
About AI
What is your objective for this meeting?
Launching a new AI Product
Categories: AI Agents

CrewAI Job Posting

git clone https://github.com/joaomdmoura/crewAI-examples
cd crewAI-examples/job-posting
pip install -U 'crewai[tools]'
pip install python-dotenv
export OPENAI_API_KEY=xxxxxxxxxxxxx
export SERPER_API_KEY=xxxxxxxxxxxxx
What is the company description?
Praison AI is a platform specialising in the use of artificial intelligence to automate tasks. Their services likely help users streamline their workflow, generate creative ideas, and potentially automate various processes.
What is the company domain?
https://mer.vin
What are the hiring needs?
AI Engineer
What are specific_benefits you offer?
Remote Work
> Finished chain.
Job Posting Creation Process Completed.
Final Job Posting:
## AI Engineer - Join Our Innovative Team at Praison AI

Praison AI, a pioneering force in AI automation, invites an adept and seasoned AI Engineer to join our dynamic team. We are committed to harnessing the power of artificial intelligence to revolutionize operations and create innovative solutions. If you are ignited by the prospects of AI and seeking a challenging role within a vibrant and progressive environment, your search ends here.

### Role Description:
As an AI Engineer at Praison AI, you will have the privilege to design and implement state-of-the-art AI models, conduct machine learning experiments, and deploy AI-driven applications that simplify tasks. You will be working shoulder-to-shoulder with our data scientists, data engineers, and other stakeholders to infuse AI solutions into our products and services.

### Responsibilities:
- Develop, validate, and deploy cutting-edge AI models.
- Collaborate with the team to design and refine AI prototypes.
- Conduct machine learning experiments and tests, optimizing for efficacy and efficiency.
- Implement the most suitable AI algorithms to meet business needs.
- Oversee the maintenance and management of AI systems and infrastructure.

### Required Skills and Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven track record as an AI Engineer or similar role.
- Hands-on experience with machine learning, deep learning, and neural networks.
- Proficiency in AI-focused programming languages such as Python or Java.
- Deep understanding of AI frameworks like TensorFlow or PyTorch.
- Exceptional problem-solving abilities and analytical skills.

### Company Culture and Values:
At Praison AI, we champion innovation, collaboration, and the relentless pursuit of excellence. We have cultivated an environment that motivates continuous learning and growth. We believe that diversity of thought is the recipe for success and are committed to building an inclusive and engaging workplace for all our employees.

### Unique Benefits:
Praison AI is more than just a workplace—it's a community of like-minded individuals pushing the boundaries of what's possible with AI. Along with competitive salaries and comprehensive healthcare coverage, we offer the flexibility of work arrangements. We are dedicated to providing a continuous learning environment that encourages professional growth and development.

As we strive for innovation and excellence, we offer opportunities to work on exciting, cutting-edge projects in the AI field. Plus, the flexibility to work from anywhere as we believe in promoting a healthy work-life balance.

To apply, email your resume, cover letter, and any relevant portfolio links to careers@praisoai.com with the subject 'AI Engineer Application'. We can't wait to hear from you!

Praison AI is a proud equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

job_posting.md

## AI Engineer - Join Our Innovative Team at Praison AI

Praison AI, a pioneering force in AI automation, invites an adept and seasoned AI Engineer to join our dynamic team. We are committed to harnessing the power of artificial intelligence to revolutionize operations and create innovative solutions. If you are ignited by the prospects of AI and seeking a challenging role within a vibrant and progressive environment, your search ends here.

### Role Description:
As an AI Engineer at Praison AI, you will have the privilege to design and implement state-of-the-art AI models, conduct machine learning experiments, and deploy AI-driven applications that simplify tasks. You will be working shoulder-to-shoulder with our data scientists, data engineers, and other stakeholders to infuse AI solutions into our products and services.

### Responsibilities:
- Develop, validate, and deploy cutting-edge AI models.
- Collaborate with the team to design and refine AI prototypes.
- Conduct machine learning experiments and tests, optimizing for efficacy and efficiency.
- Implement the most suitable AI algorithms to meet business needs.
- Oversee the maintenance and management of AI systems and infrastructure.

### Required Skills and Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven track record as an AI Engineer or similar role.
- Hands-on experience with machine learning, deep learning, and neural networks.
- Proficiency in AI-focused programming languages such as Python or Java.
- Deep understanding of AI frameworks like TensorFlow or PyTorch.
- Exceptional problem-solving abilities and analytical skills.

### Company Culture and Values:
At Praison AI, we champion innovation, collaboration, and the relentless pursuit of excellence. We have cultivated an environment that motivates continuous learning and growth. We believe that diversity of thought is the recipe for success and are committed to building an inclusive and engaging workplace for all our employees.

### Unique Benefits:
Praison AI is more than just a workplace—it's a community of like-minded individuals pushing the boundaries of what's possible with AI. Along with competitive salaries and comprehensive healthcare coverage, we offer the flexibility of work arrangements. We are dedicated to providing a continuous learning environment that encourages professional growth and development.

As we strive for innovation and excellence, we offer opportunities to work on exciting, cutting-edge projects in the AI field. Plus, the flexibility to work from anywhere as we believe in promoting a healthy work-life balance.

To apply, email your resume, cover letter, and any relevant portfolio links to careers@praisoai.com with the subject 'AI Engineer Application'. We can't wait to hear from you!

Praison AI is a proud equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
To run the same crew against Groq's OpenAI-compatible API instead of OpenAI, point the client at Groq and pick one of the supported models:

export OPENAI_API_BASE=https://api.groq.com/openai/v1
export OPENAI_API_KEY=gsk_xxxxxxxx
export OPENAI_MODEL_NAME=mixtral-8x7b-32768
export OPENAI_MODEL_NAME=llama2-70b-4096
Categories: Embedding

Advanced Chunking Strategies

pip install -U chromadb langchain llama-index langchain_experimental langchain_openai
from rich import print
from langchain.docstore.document import Document
from langchain_community.chat_models import ChatOllama
from langchain_community.vectorstores import Chroma
from langchain_community import embeddings
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

local_llm = ChatOllama(model="mistral")

# RAG
def rag(chunks, collection_name):
    vectorstore = Chroma.from_documents(
        documents=chunks,
        collection_name=collection_name,
        embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'),
    )
    retriever = vectorstore.as_retriever()

    prompt_template = """Answer the question based only on the following context:
    {context}
    Question: {question}
    """
    prompt = ChatPromptTemplate.from_template(prompt_template)

    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | local_llm
        | StrOutputParser()
    )
    result = chain.invoke("What is the use of Text Splitting?")
    print(result)


# 1. Character Text Splitting
print("#### Character Text Splitting ####")

text = "Text splitting in LangChain is a critical feature that facilitates the division of large texts into smaller, manageable segments. "

# Manual Splitting
chunks = []
chunk_size = 35 # Characters
for i in range(0, len(text), chunk_size):
    chunk = text[i:i + chunk_size]
    chunks.append(chunk)
documents = [Document(page_content=chunk, metadata={"source": "local"}) for chunk in chunks]
print(documents)

# Automatic Text Splitting
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size = 35, chunk_overlap=0, separator='', strip_whitespace=False)
documents = text_splitter.create_documents([text])
print(documents)

# 2. Recursive Character Text Splitting
print("#### Recursive Character Text Splitting ####")

from langchain.text_splitter import RecursiveCharacterTextSplitter
with open('content.txt', 'r', encoding='utf-8') as file:
    text = file.read()

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 65, chunk_overlap=0) # ["\n\n", "\n", " ", ""] 65,450
print(text_splitter.create_documents([text])) 

# 3. Document Specific Splitting
print("#### Document Specific Splitting ####")

# Document Specific Splitting - Markdown
from langchain.text_splitter import MarkdownTextSplitter
splitter = MarkdownTextSplitter(chunk_size = 40, chunk_overlap=0)
markdown_text = """
# Fun in California

## Driving

Try driving on the 1 down to San Diego

### Food

Make sure to eat a burrito while you're there

## Hiking

Go to Yosemite
"""
print(splitter.create_documents([markdown_text]))

# Document Specific Splitting - Python
from langchain.text_splitter import PythonCodeTextSplitter
python_text = """
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

p1 = Person("John", 36)

for i in range(10):
    print (i)
"""
python_splitter = PythonCodeTextSplitter(chunk_size=100, chunk_overlap=0)
print(python_splitter.create_documents([python_text]))

# Document Specific Splitting - Javascript
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
javascript_text = """
// Function is called, the return value will end up in x
let x = myFunction(4, 3);

function myFunction(a, b) {
// Function returns the product of a and b
  return a * b;
}
"""
js_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.JS, chunk_size=65, chunk_overlap=0
)
print(js_splitter.create_documents([javascript_text]))

# 4. Semantic Chunking
print("#### Semantic Chunking ####")

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

# Percentile - all differences between sentences are calculated, and then any difference greater than the X percentile is split
text_splitter = SemanticChunker(OpenAIEmbeddings())
text_splitter = SemanticChunker(
    OpenAIEmbeddings(), breakpoint_threshold_type="percentile" # "standard_deviation", "interquartile"
)
documents = text_splitter.create_documents([text])
print(documents)

# 5. Agentic Chunking
print("#### Proposition-Based Chunking ####")

# https://arxiv.org/pdf/2312.06648.pdf

from langchain.output_parsers.openai_tools import JsonOutputToolsParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain.chains import create_extraction_chain
from typing import Optional, List
from langchain.chains import create_extraction_chain_pydantic
from langchain_core.pydantic_v1 import BaseModel
from langchain import hub

obj = hub.pull("wfh/proposal-indexing")
llm = ChatOpenAI(model='gpt-3.5-turbo')
runnable = obj | llm

class Sentences(BaseModel):
    sentences: List[str]
    
# Extraction
extraction_chain = create_extraction_chain_pydantic(pydantic_schema=Sentences, llm=llm)
def get_propositions(text):
    runnable_output = runnable.invoke({
        "input": text
    }).content
    propositions = extraction_chain.invoke(runnable_output)["text"][0].sentences
    return propositions
    
paragraphs = text.split("\n\n")
text_propositions = []
for i, para in enumerate(paragraphs[:5]):
    propositions = get_propositions(para)
    text_propositions.extend(propositions)
    print (f"Done with {i}")

print (f"You have {len(text_propositions)} propositions")
print(text_propositions[:10])

print("#### Agentic Chunking ####")

from agentic_chunker import AgenticChunker
ac = AgenticChunker()
ac.add_propositions(text_propositions)
print(ac.pretty_print_chunks())
chunks = ac.get_chunks(get_type='list_of_strings')
print(chunks)
documents = [Document(page_content=chunk, metadata={"source": "local"}) for chunk in chunks]
rag(documents, "agentic-chunks")

agentic_chunker.py

from langchain_core.prompts import ChatPromptTemplate
import uuid
from langchain_openai import ChatOpenAI
import os
from typing import Optional
from langchain_core.pydantic_v1 import BaseModel
from langchain.chains import create_extraction_chain_pydantic
from dotenv import load_dotenv
from rich import print

load_dotenv()

class AgenticChunker:
    def __init__(self, openai_api_key=None):
        self.chunks = {}
        self.id_truncate_limit = 5

        # Whether or not to update/refine summaries and titles as you get new information
        self.generate_new_metadata_ind = True
        self.print_logging = True

        if openai_api_key is None:
            openai_api_key = os.getenv("OPENAI_API_KEY")

        if openai_api_key is None:
            raise ValueError("API key is not provided and not found in environment variables")

        self.llm = ChatOpenAI(model='gpt-3.5-turbo', openai_api_key=openai_api_key, temperature=0)

    def add_propositions(self, propositions):
        for proposition in propositions:
            self.add_proposition(proposition)
    
    def add_proposition(self, proposition):
        if self.print_logging:
            print (f"\nAdding: '{proposition}'")

        # If it's your first chunk, just make a new chunk and don't check for others
        if len(self.chunks) == 0:
            if self.print_logging:
                print ("No chunks, creating a new one")
            self._create_new_chunk(proposition)
            return

        chunk_id = self._find_relevant_chunk(proposition)

        # If a chunk was found then add the proposition to it
        if chunk_id:
            if self.print_logging:
                print (f"Chunk Found ({self.chunks[chunk_id]['chunk_id']}), adding to: {self.chunks[chunk_id]['title']}")
            self.add_proposition_to_chunk(chunk_id, proposition)
            return
        else:
            if self.print_logging:
                print ("No chunks found")
            # If a chunk wasn't found, then create a new one
            self._create_new_chunk(proposition)
        

    def add_proposition_to_chunk(self, chunk_id, proposition):
        # Add the proposition to the chunk
        self.chunks[chunk_id]['propositions'].append(proposition)

        # Then grab a new summary
        if self.generate_new_metadata_ind:
            self.chunks[chunk_id]['summary'] = self._update_chunk_summary(self.chunks[chunk_id])
            self.chunks[chunk_id]['title'] = self._update_chunk_title(self.chunks[chunk_id])

    def _update_chunk_summary(self, chunk):
        """
        If you add a new proposition to a chunk, you may want to update the summary or else they could get stale
        """
        PROMPT = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    """
                    You are the steward of a group of chunks which represent groups of sentences that talk about a similar topic
                    A new proposition was just added to one of your chunks, you should generate a very brief 1-sentence summary which will inform viewers what a chunk group is about.

                    A good summary will say what the chunk is about, and give any clarifying instructions on what to add to the chunk.

                    You will be given a group of propositions which are in the chunk and the chunks current summary.

                    Your summaries should anticipate generalization. If you get a proposition about apples, generalize it to food.
                    Or month, generalize it to "date and times".

                    Example:
                    Input: Proposition: Greg likes to eat pizza
                    Output: This chunk contains information about the types of food Greg likes to eat.

                    Only respond with the new chunk summary, nothing else.
                    """,
                ),
                ("user", "Chunk's propositions:\n{proposition}\n\nCurrent chunk summary:\n{current_summary}"),
            ]
        )

        runnable = PROMPT | self.llm

        new_chunk_summary = runnable.invoke({
            "proposition": "\n".join(chunk['propositions']),
            "current_summary" : chunk['summary']
        }).content

        return new_chunk_summary
    
    def _update_chunk_title(self, chunk):
        """
        If you add a new proposition to a chunk, you may want to update the title or else it can get stale
        """
        PROMPT = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    """
                    You are the steward of a group of chunks which represent groups of sentences that talk about a similar topic
                    A new proposition was just added to one of your chunks, you should generate a very brief updated chunk title which will inform viewers what a chunk group is about.

                    A good title will say what the chunk is about.

                    You will be given a group of propositions which are in the chunk, chunk summary and the chunk title.

                    Your title should anticipate generalization. If you get a proposition about apples, generalize it to food.
                    Or month, generalize it to "date and times".

                    Example:
                    Input: Summary: This chunk is about dates and times that the author talks about
                    Output: Date & Times

                    Only respond with the new chunk title, nothing else.
                    """,
                ),
                ("user", "Chunk's propositions:\n{proposition}\n\nChunk summary:\n{current_summary}\n\nCurrent chunk title:\n{current_title}"),
            ]
        )

        runnable = PROMPT | self.llm

        updated_chunk_title = runnable.invoke({
            "proposition": "\n".join(chunk['propositions']),
            "current_summary" : chunk['summary'],
            "current_title" : chunk['title']
        }).content

        return updated_chunk_title

    def _get_new_chunk_summary(self, proposition):
        PROMPT = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    """
                    You are the steward of a group of chunks which represent groups of sentences that talk about a similar topic
                    You should generate a very brief 1-sentence summary which will inform viewers what a chunk group is about.

                    A good summary will say what the chunk is about, and give any clarifying instructions on what to add to the chunk.

                    You will be given a proposition which will go into a new chunk. This new chunk needs a summary.

                    Your summaries should anticipate generalization. If you get a proposition about apples, generalize it to food.
                    Or month, generalize it to "date and times".

                    Example:
                    Input: Proposition: Greg likes to eat pizza
                    Output: This chunk contains information about the types of food Greg likes to eat.

                    Only respond with the new chunk summary, nothing else.
                    """,
                ),
                ("user", "Determine the summary of the new chunk that this proposition will go into:\n{proposition}"),
            ]
        )

        runnable = PROMPT | self.llm

        new_chunk_summary = runnable.invoke({
            "proposition": proposition
        }).content

        return new_chunk_summary
    
    def _get_new_chunk_title(self, summary):
        PROMPT = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    """
                    You are the steward of a group of chunks which represent groups of sentences that talk about a similar topic
                    You should generate a very brief few word chunk title which will inform viewers what a chunk group is about.

                    A good chunk title is brief but encompasses what the chunk is about

                    You will be given a summary of a chunk which needs a title

                    Your titles should anticipate generalization. If you get a proposition about apples, generalize it to food.
                    Or month, generalize it to "date and times".

                    Example:
                    Input: Summary: This chunk is about dates and times that the author talks about
                    Output: Date & Times

                    Only respond with the new chunk title, nothing else.
                    """,
                ),
                ("user", "Determine the title of the chunk that this summary belongs to:\n{summary}"),
            ]
        )

        runnable = PROMPT | self.llm

        new_chunk_title = runnable.invoke({
            "summary": summary
        }).content

        return new_chunk_title


    def _create_new_chunk(self, proposition):
        new_chunk_id = str(uuid.uuid4())[:self.id_truncate_limit] # I don't want long ids
        new_chunk_summary = self._get_new_chunk_summary(proposition)
        new_chunk_title = self._get_new_chunk_title(new_chunk_summary)

        self.chunks[new_chunk_id] = {
            'chunk_id' : new_chunk_id,
            'propositions': [proposition],
            'title' : new_chunk_title,
            'summary': new_chunk_summary,
            'chunk_index' : len(self.chunks)
        }
        if self.print_logging:
            print (f"Created new chunk ({new_chunk_id}): {new_chunk_title}")
    
    def get_chunk_outline(self):
        """
        Get a string which represents the chunks you currently have.
        This will be empty when you first start off
        """
        chunk_outline = ""

        for chunk_id, chunk in self.chunks.items():
            single_chunk_string = f"""Chunk ({chunk['chunk_id']}): {chunk['title']}\nSummary: {chunk['summary']}\n\n"""
        
            chunk_outline += single_chunk_string
        
        return chunk_outline

    def _find_relevant_chunk(self, proposition):
        current_chunk_outline = self.get_chunk_outline()

        PROMPT = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    """
                    Determine whether or not the "Proposition" should belong to any of the existing chunks.

                    A proposition should belong to a chunk if its meaning, direction, or intention is similar.
                    The goal is to group similar propositions and chunks.

                    If you think a proposition should be joined with a chunk, return the chunk id.
                    If you do not think an item should be joined with an existing chunk, just return "No chunks"

                    Example:
                    Input:
                        - Proposition: "Greg really likes hamburgers"
                        - Current Chunks:
                            - Chunk ID: 2n4l3d
                            - Chunk Name: Places in San Francisco
                            - Chunk Summary: Overview of the things to do with San Francisco Places

                            - Chunk ID: 93833k
                            - Chunk Name: Food Greg likes
                            - Chunk Summary: Lists of the food and dishes that Greg likes
                    Output: 93833k
                    """,
                ),
                ("user", "Current Chunks:\n--Start of current chunks--\n{current_chunk_outline}\n--End of current chunks--"),
                ("user", "Determine if the following statement should belong to one of the chunks outlined:\n{proposition}"),
            ]
        )

        runnable = PROMPT | self.llm

        chunk_found = runnable.invoke({
            "proposition": proposition,
            "current_chunk_outline": current_chunk_outline
        }).content

        # Pydantic data class
        class ChunkID(BaseModel):
            """Extracting the chunk id"""
            chunk_id: Optional[str]
            
        # Extraction to catch-all LLM responses. This is a bandaid
        extraction_chain = create_extraction_chain_pydantic(pydantic_schema=ChunkID, llm=self.llm)
        extraction_found = extraction_chain.invoke(chunk_found)["text"]
        if extraction_found:
            chunk_found = extraction_found[0].chunk_id

        # If the response doesn't look like a chunk id of the expected length,
        # it's probably a bad response or no matching chunk was found, so return nothing
        if not chunk_found or len(chunk_found) != self.id_truncate_limit:
            return None

        return chunk_found
    
    def get_chunks(self, get_type='dict'):
        """
        This function returns the chunks in the format specified by the 'get_type' parameter.
        If 'get_type' is 'dict', it returns the chunks as a dictionary.
        If 'get_type' is 'list_of_strings', it returns the chunks as a list of strings, where each string is a proposition in the chunk.
        """
        if get_type == 'dict':
            return self.chunks
        if get_type == 'list_of_strings':
            chunks = []
            for chunk_id, chunk in self.chunks.items():
                chunks.append(" ".join([x for x in chunk['propositions']]))
            return chunks
    
    def pretty_print_chunks(self):
        print (f"\nYou have {len(self.chunks)} chunks\n")
        for chunk_id, chunk in self.chunks.items():
            print(f"Chunk #{chunk['chunk_index']}")
            print(f"Chunk ID: {chunk_id}")
            print(f"Summary: {chunk['summary']}")
            print(f"Propositions:")
            for prop in chunk['propositions']:
                print(f"    -{prop}")
            print("\n\n")

    def pretty_print_chunk_outline(self):
        print ("Chunk Outline\n")
        print(self.get_chunk_outline())

if __name__ == "__main__":
    ac = AgenticChunker()

    ## Comment and uncomment the propositions to your hearts content
    propositions = [
        'The month is October.',
        'The year is 2023.',
        "One of the most important things that I didn't understand about the world as a child was the degree to which the returns for performance are superlinear.",
        'Teachers and coaches implicitly told us that the returns were linear.',
        "I heard a thousand times that 'You get out what you put in.'",
        # 'Teachers and coaches meant well.',
        # "The statement that 'You get out what you put in' is rarely true.",
        # "If your product is only half as good as your competitor's product, you do not get half as many customers.",
        # "You get no customers if your product is only half as good as your competitor's product.",
        # 'You go out of business if you get no customers.',
        # 'The returns for performance are superlinear in business.',
        # 'Some people think the superlinear returns for performance are a flaw of capitalism.',
        # 'Some people think that changing the rules of capitalism would stop the superlinear returns for performance from being true.',
        # 'Superlinear returns for performance are a feature of the world.',
        # 'Superlinear returns for performance are not an artifact of rules that humans have invented.',
        # 'The same pattern of superlinear returns is observed in fame.',
        # 'The same pattern of superlinear returns is observed in power.',
        # 'The same pattern of superlinear returns is observed in military victories.',
        # 'The same pattern of superlinear returns is observed in knowledge.',
        # 'The same pattern of superlinear returns is observed in benefit to humanity.',
        # 'In fame, power, military victories, knowledge, and benefit to humanity, the rich get richer.'
    ]
    
    ac.add_propositions(propositions)
    ac.pretty_print_chunks()
    ac.pretty_print_chunk_outline()
    print (ac.get_chunks(get_type='list_of_strings'))

content.txt

Text splitting in LangChain is a critical feature that facilitates the division of large texts into smaller, manageable segments. 
This capability is vital for improving comprehension and processing efficiency, especially in tasks that require detailed analysis or extraction of specific contexts.

ChatGPT, developed by OpenAI, represents a leap forward in natural language processing technologies.
It's a conversational AI model capable of understanding and generating human-like text, allowing for dynamic interactions and providing responses that are remarkably coherent and contextually relevant. ChatGPT has been integrated into a multitude of applications, revolutionizing the way we interact with machines and access information.

By leveraging LangChain for text splitting, users can efficiently navigate and analyze vast amounts of text data, facilitating a deeper understanding and more insightful conclusions.
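
The output below comes from running the full script against this file. For quick reference, the first two sections of that output (character and recursive character splitting) can be reproduced in isolation with a sketch along these lines; the chunk sizes and the strip_whitespace flag are guesses inferred from the printed chunks, and the character splitter is fed only the opening sentence to match the short output shown:

from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

with open("content.txt") as f:
    text = f.read()

# The character-splitting output below covers only the first sentence of content.txt
first_line = text.split("\n")[0]

char_splitter = CharacterTextSplitter(separator="", chunk_size=35, chunk_overlap=0, strip_whitespace=False)
print(char_splitter.create_documents([first_line], metadatas=[{"source": "local"}]))
print(char_splitter.create_documents([first_line]))

recursive_splitter = RecursiveCharacterTextSplitter(chunk_size=65, chunk_overlap=0)
print(recursive_splitter.create_documents([text]))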

Output

❯ python app.py
#### Character Text Splitting ####
[
    Document(page_content='Text splitting in LangChain is a cr', metadata={'source': 'local'}),
    Document(page_content='itical feature that facilitates the', metadata={'source': 'local'}),
    Document(page_content=' division of large texts into small', metadata={'source': 'local'}),
    Document(page_content='er, manageable segments. ', metadata={'source': 'local'})
]
[
    Document(page_content='Text splitting in LangChain is a cr'),
    Document(page_content='itical feature that facilitates the'),
    Document(page_content=' division of large texts into small'),
    Document(page_content='er, manageable segments. ')
]
#### Recursive Character Text Splitting ####
[
    Document(page_content='Text splitting in LangChain is a critical feature that'),
    Document(page_content='facilitates the division of large texts into smaller, manageable'),
    Document(page_content='segments.'),
    Document(page_content='This capability is vital for improving comprehension and'),
    Document(page_content='processing efficiency, especially in tasks that require detailed'),
    Document(page_content='analysis or extraction of specific contexts.'),
    Document(page_content='ChatGPT, developed by OpenAI, represents a leap forward in'),
    Document(page_content='natural language processing technologies.'),
    Document(page_content="It's a conversational AI model capable of understanding and"),
    Document(page_content='generating human-like text, allowing for dynamic interactions'),
    Document(page_content='and providing responses that are remarkably coherent and'),
    Document(page_content='contextually relevant. ChatGPT has been integrated into a'),
    Document(page_content='multitude of applications, revolutionizing the way we interact'),
    Document(page_content='with machines and access information.'),
    Document(page_content='By leveraging LangChain for text splitting, users can'),
    Document(page_content='efficiently navigate and analyze vast amounts of text data,'),
    Document(page_content='facilitating a deeper understanding and more insightful'),
    Document(page_content='conclusions.')
]
#### Document Specific Splitting ####
[
    Document(page_content='# Fun in California\n\n## Driving'),
    Document(page_content='Try driving on the 1 down to San Diego'),
    Document(page_content='### Food'),
    Document(page_content="Make sure to eat a burrito while you're"),
    Document(page_content='there'),
    Document(page_content='## Hiking\n\nGo to Yosemite')
]
[
    Document(page_content='class Person:\n  def __init__(self, name, age):\n    self.name = name\n    self.age = age'),
    Document(page_content='p1 = Person("John", 36)\n\nfor i in range(10):\n    print (i)')
]
[
    Document(page_content='// Function is called, the return value will end up in x'),
    Document(page_content='let x = myFunction(4, 3);'),
    Document(page_content='function myFunction(a, b) {'),
    Document(page_content='// Function returns the product of a and b\n  return a * b;\n}')
]
#### Semantic Chunking ####
[
    Document(
        page_content='Text splitting in LangChain is a critical feature that facilitates the division of large texts into 
smaller, manageable segments. This capability is vital for improving comprehension and processing efficiency, especially in tasks
that require detailed analysis or extraction of specific contexts.'
    ),
    Document(
        page_content="ChatGPT, developed by OpenAI, represents a leap forward in natural language processing technologies. It's a
conversational AI model capable of understanding and generating human-like text, allowing for dynamic interactions and providing 
responses that are remarkably coherent and contextually relevant. ChatGPT has been integrated into a multitude of applications, 
revolutionizing the way we interact with machines and access information. By leveraging LangChain for text splitting, users can 
efficiently navigate and analyze vast amounts of text data, facilitating a deeper understanding and more insightful conclusions."
    )
]
#### Proposition-Based Chunking ####
Done with 0
Done with 1
Done with 2
You have 17 propositions
[
    'Text splitting in LangChain is a critical feature.',
    'Text splitting facilitates the division of large texts into smaller, manageable segments.',
    'This capability is vital for improving comprehension and processing efficiency.',
    'It is especially important in tasks that require detailed analysis or extraction of specific contexts.',
    'ChatGPT was developed by OpenAI.',
    'OpenAI developed ChatGPT.',
    'ChatGPT represents a leap forward in natural language processing technologies.',
    'ChatGPT is a conversational AI model.',
    'ChatGPT is capable of understanding and generating human-like text.',
    'ChatGPT allows for dynamic interactions.'
]
#### Agentic Chunking ####

Adding: 'Text splitting in LangChain is a critical feature.'
No chunks, creating a new one
Created new chunk (0e05f): LangChain Features

Adding: 'Text splitting facilitates the division of large texts into smaller, manageable segments.'
No chunks found
Created new chunk (471d6): Text Segmentation Techniques

Adding: 'This capability is vital for improving comprehension and processing efficiency.'
No chunks found
Created new chunk (9ba91): Capabilities Importance & Benefits

Adding: 'It is especially important in tasks that require detailed analysis or extraction of specific contexts.'
No chunks found
Created new chunk (3af8b): Analytical Processes

Adding: 'ChatGPT was developed by OpenAI.'
No chunks found
Created new chunk (e2947): ChatGPT Development & Features

Adding: 'OpenAI developed ChatGPT.'
Chunk Found (e2947), adding to: ChatGPT Development & Features

Adding: 'ChatGPT represents a leap forward in natural language processing technologies.'
Chunk Found (e2947), adding to: ChatGPT Development

Adding: 'ChatGPT is a conversational AI model.'
Chunk Found (e2947), adding to: Advancements in Natural Language Processing

Adding: 'ChatGPT is capable of understanding and generating human-like text.'
Chunk Found (e2947), adding to: ChatGPT: Development and Capabilities

Adding: 'ChatGPT allows for dynamic interactions.'
Chunk Found (e2947), adding to: ChatGPT: Development, Capabilities, and Significance

Adding: 'ChatGPT provides responses that are remarkably coherent and contextually relevant.'
Chunk Found (e2947), adding to: ChatGPT: Overview and Innovations

Adding: 'ChatGPT has been integrated into a multitude of applications.'
Chunk Found (e2947), adding to: ChatGPT: Development, Capabilities & Impact

Adding: 'ChatGPT revolutionized the way we interact with machines.'
Chunk Found (e2947), adding to: ChatGPT: Overview & Applications

Adding: 'ChatGPT revolutionized the way we access information.'
Chunk Found (e2947), adding to: ChatGPT: Development, Capabilities & Impact

Adding: 'Users can leverage LangChain for text splitting.'
Chunk Found (0e05f), adding to: LangChain Features

Adding: 'LangChain allows users to efficiently navigate and analyze vast amounts of text data.'
Chunk Found (0e05f), adding to: Using LangChain for Text Splitting

Adding: 'Text splitting with LangChain facilitates a deeper understanding and more insightful conclusions.'
Chunk Found (0e05f), adding to: LangChain Text Splitting and Analysis

You have 5 chunks

Chunk #0
Chunk ID: 0e05f
Summary: This chunk contains information about using LangChain for text splitting, including its advantages for navigating, 
analyzing, and understanding large text datasets.
Propositions:
    -Text splitting in LangChain is a critical feature.
    -Users can leverage LangChain for text splitting.
    -LangChain allows users to efficiently navigate and analyze vast amounts of text data.
    -Text splitting with LangChain facilitates a deeper understanding and more insightful conclusions.



Chunk #1
Chunk ID: 471d6
Summary: This chunk contains information about techniques and methods for dividing texts into smaller segments.
Propositions:
    -Text splitting facilitates the division of large texts into smaller, manageable segments.



Chunk #2
Chunk ID: 9ba91
Summary: This chunk contains information about the importance and benefits of certain capabilities.
Propositions:
    -This capability is vital for improving comprehension and processing efficiency.



Chunk #3
Chunk ID: 3af8b
Summary: This chunk contains information about the importance of certain processes in tasks requiring detailed analysis or 
context extraction.
Propositions:
    -It is especially important in tasks that require detailed analysis or extraction of specific contexts.



Chunk #4
Chunk ID: e2947
Summary: This chunk contains information about the development, capabilities, significance, functionalities, and applications of 
ChatGPT, a conversational AI model by OpenAI.
Propositions:
    -ChatGPT was developed by OpenAI.
    -OpenAI developed ChatGPT.
    -ChatGPT represents a leap forward in natural language processing technologies.
    -ChatGPT is a conversational AI model.
    -ChatGPT is capable of understanding and generating human-like text.
    -ChatGPT allows for dynamic interactions.
    -ChatGPT provides responses that are remarkably coherent and contextually relevant.
    -ChatGPT has been integrated into a multitude of applications.
    -ChatGPT revolutionized the way we interact with machines.
    -ChatGPT revolutionized the way we access information.



None
[
    'Text splitting in LangChain is a critical feature. Users can leverage LangChain for text splitting. LangChain allows users 
to efficiently navigate and analyze vast amounts of text data. Text splitting with LangChain facilitates a deeper understanding 
and more insightful conclusions.',
    'Text splitting facilitates the division of large texts into smaller, manageable segments.',
    'This capability is vital for improving comprehension and processing efficiency.',
    'It is especially important in tasks that require detailed analysis or extraction of specific contexts.',
    'ChatGPT was developed by OpenAI. OpenAI developed ChatGPT. ChatGPT represents a leap forward in natural language processing 
technologies. ChatGPT is a conversational AI model. ChatGPT is capable of understanding and generating human-like text. ChatGPT 
allows for dynamic interactions. ChatGPT provides responses that are remarkably coherent and contextually relevant. ChatGPT has 
been integrated into a multitude of applications. ChatGPT revolutionized the way we interact with machines. ChatGPT 
revolutionized the way we access information.'
]
 Text splitting is a feature used to divide large texts into smaller, manageable segments. This facilitates improved 
comprehension and processing efficiency, making it especially important in tasks that require detailed analysis or extraction of 
specific contexts. It enables users to more efficiently navigate and analyze vast amounts of text data, leading to deeper 
understanding and more insightful conclusions.
Categories
AutoGen

AutoGen Graph

pip install -U "pyautogen[graph]>=0.2.11" matplotlib networkx
import random
import matplotlib.pyplot as plt
import networkx as nx
import autogen
from autogen.agentchat.conversable_agent import ConversableAgent
from autogen.agentchat.assistant_agent import AssistantAgent
from autogen.agentchat.groupchat import GroupChat
from autogen.graph_utils import visualize_speaker_transitions_dict

# Shared LLM configuration for all agents and the group chat manager
config_list_gpt4 = {
    "timeout": 600,
    "cache_seed": 44,
    "temperature": 0,
    "config_list": [{"model": "gpt-4-turbo-preview"}],
}

# Helper to look up an agent in the list by name
def get_agent_of_name(agents, name) -> ConversableAgent:
    for agent in agents:
        if agent.name == name:
            return agent

agents = []
speaker_transitions_dict = {}
secret_values = {}

# Create three teams (A, B, C) of three agents each; every agent holds a random secret
# chocolate count and a system message encoding the team-based speaking constraints
for prefix in ["A", "B", "C"]:
    for i in range(3):
        node_id = f"{prefix}{i}"
        secret_value = random.randint(1, 5)
        secret_values[node_id] = secret_value
        agents.append(
            AssistantAgent(
                name=node_id,
                system_message=f"""Your name is {node_id}.
Do not respond as the speaker named in the NEXT tag if your name is not in the NEXT tag. Instead, suggest a relevant team leader to handle the mis-tag, with the NEXT: tag.

You have {secret_value} chocolates.

The list of players are [A0, A1, A2, B0, B1, B2, C0, C1, C2].

Your first character of your name is your team, and your second character denotes that you are a team leader if it is 0.
CONSTRAINTS: Team members can only talk within the team, whilst team leader can talk to team leaders of other teams but not team members of other teams.

You can use NEXT: to suggest the next speaker. You have to respect the CONSTRAINTS, and can only suggest one player from the list of players, i.e., do not suggest A3 because A3 is not from the list of players.
Team leaders must make sure that they know the sum of the individual chocolate count of all three players in their own team, i.e., A0 is responsible for team A only.

Keep track of the player's tally using a JSON format so that others can check the total tally. Use
A0:?, A1:?, A2:?,
B0:?, B1:?, B2:?,
C0:?, C1:?, C2:?

If you are the team leader, you should aggregate your team's total chocolate count to cooperate.
Once the team leader know their team's tally, they can suggest another team leader for them to find their team tally, because we need all three team tallys to succeed.
Use NEXT: to suggest the next speaker, e.g., NEXT: A0.

Once we have the total tally from all nine players, sum up all three teams' tally, then terminate the discussion using TERMINATE.
""",
                llm_config=config_list_gpt4,
            )
        )
        speaker_transitions_dict[agents[-1]] = []

# Within-team transitions: each agent may hand off to the other two members of its own team
for prefix in ["A", "B", "C"]:
    for i in range(3):
        source_id = f"{prefix}{i}"
        for j in range(3):
            target_id = f"{prefix}{j}"
            if i != j:
                speaker_transitions_dict[get_agent_of_name(agents, source_id)].append(get_agent_of_name(agents, target_id))

speaker_transitions_dict[get_agent_of_name(agents, "A0")].append(get_agent_of_name(agents, "B0"))
speaker_transitions_dict[get_agent_of_name(agents, "A0")].append(get_agent_of_name(agents, "C0"))
speaker_transitions_dict[get_agent_of_name(agents, "B0")].append(get_agent_of_name(agents, "A0"))
speaker_transitions_dict[get_agent_of_name(agents, "B0")].append(get_agent_of_name(agents, "C0"))
speaker_transitions_dict[get_agent_of_name(agents, "C0")].append(get_agent_of_name(agents, "A0"))
speaker_transitions_dict[get_agent_of_name(agents, "C0")].append(get_agent_of_name(agents, "B0"))

# Build a directed graph of the allowed transitions and plot it, labelling each node with its secret chocolate count
graph = nx.DiGraph()
graph.add_nodes_from([agent.name for agent in agents])
for key, value in speaker_transitions_dict.items():
    for agent in value:
        graph.add_edge(key.name, agent.name)

plt.figure(figsize=(12, 10))
pos = nx.spring_layout(graph)
nx.draw(graph, pos, with_labels=True, font_weight="bold")
for node, (x, y) in pos.items():
    secret_value = secret_values[node]
    plt.text(x, y + 0.1, s=f"Secret: {secret_value}", horizontalalignment="center")
plt.show()
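# Note: the visualize_speaker_transitions_dict helper imported above can draw an
# equivalent transition graph in one call (a sketch, assuming it takes the dict and the agent list):
# visualize_speaker_transitions_dict(speaker_transitions_dict, agents)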

def is_termination_msg(content) -> bool:
    have_content = content.get("content", None) is not None
    if have_content and "TERMINATE" in content["content"]:
        return True
    return False

# Admin agent with no human input; the chat ends once a message contains TERMINATE (see is_termination_msg)
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="Terminator admin.",
    code_execution_config=False,
    is_termination_msg=is_termination_msg,
    human_input_mode="NEVER",
)

agents.append(user_proxy)

# Restrict speaker selection to the allowed transitions defined above
group_chat = GroupChat(
    agents=agents,
    messages=[],
    max_round=20,
    allowed_or_disallowed_speaker_transitions=speaker_transitions_dict,
    speaker_transitions_type="allowed",
)

manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config=config_list_gpt4,
    code_execution_config=False,
    is_termination_msg=is_termination_msg,
)

# A0 (the first agent created) kicks off the conversation through the group chat manager
agents[0].initiate_chat(
    manager,
    message="""
There are 9 players in this game, split equally into Teams A, B, C. Therefore each team has 3 players, including the team leader.
The task is to find out the sum of chocolate count from all nine players. I will now start with my team.
NEXT: A1""",
)

Output

❯ python graph.py
WARNING:root:Warning: There are isolated agent nodes, there are not incoming nor outgoing edges. Isolated agents: ['User_proxy']
WARNING:root:Warning: The set of agents in allowed_speaker_transitions do not match agents. Offending agents: ['User_proxy']
A0 (to chat_manager):


There are 9 players in this game, split equally into Teams A, B, C. Therefore each team has 3 players, including the team leader.
The task is to find out the sum of chocolate count from all nine players. I will now start with my team.
NEXT: A1

--------------------------------------------------------------------------------
A1 (to chat_manager):

A0:?, A1:4, A2:?,
B0:?, B1:?, B2:?,
C0:?, C1:?, C2:?

I have 4 chocolates. Let's find out how many chocolates A2 has.
NEXT: A2

--------------------------------------------------------------------------------
A2 (to chat_manager):

A0:?, A1:4, A2:5,
B0:?, B1:?, B2:?,
C0:?, C1:?, C2:?

I have 5 chocolates. Now, we need to report back to our team leader, A0, to aggregate our team's total chocolate count.
NEXT: A0

--------------------------------------------------------------------------------
A0 (to chat_manager):

A0:5, A1:4, A2:5,
B0:?, B1:?, B2:?,
C0:?, C1:?, C2:?

As the team leader of Team A, I have aggregated our team's total chocolate count. Team A has a total of 14 chocolates. It's time to find out Team B's total chocolate count.
NEXT: B0

--------------------------------------------------------------------------------
B0 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:?, B2:?,
C0:?, C1:?, C2:?

As the team leader of Team B, I will start by sharing that I have 3 chocolates. Now, let's find out how many chocolates B1 has.
NEXT: B1

--------------------------------------------------------------------------------
B1 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:1, B2:?,
C0:?, C1:?, C2:?

I have 1 chocolate. Let's find out how many chocolates B2 has.
NEXT: B2

--------------------------------------------------------------------------------
B2 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:1, B2:2,
C0:?, C1:?, C2:?

I have 2 chocolates. Now, we need to report back to our team leader, B0, to aggregate our team's total chocolate count.
NEXT: B0

--------------------------------------------------------------------------------
B0 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:1, B2:2,
C0:?, C1:?, C2:?

As the team leader of Team B, I have aggregated our team's total chocolate count. Team B has a total of 6 chocolates. It's time to find out Team C's total chocolate count.
NEXT: C0

--------------------------------------------------------------------------------
C0 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:1, B2:2,
C0:1, C1:?, C2:?

As the team leader of Team C, I have 1 chocolate. Now, let's find out how many chocolates C1 has.
NEXT: C1

--------------------------------------------------------------------------------
C1 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:1, B2:2,
C0:1, C1:4, C2:?

I have 4 chocolates. Let's find out how many chocolates C2 has.
NEXT: C2

--------------------------------------------------------------------------------
C2 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:1, B2:2,
C0:1, C1:4, C2:5

I have 5 chocolates. Now, we need to report back to our team leader, C0, to aggregate our team's total chocolate count.
NEXT: C0

--------------------------------------------------------------------------------
C0 (to chat_manager):

A0:5, A1:4, A2:5,
B0:3, B1:1, B2:2,
C0:1, C1:4, C2:5

As the team leader of Team C, I have aggregated our team's total chocolate count. Team C has a total of 10 chocolates. Now that we have the total tally from all three teams, we can sum up all three teams' tally.

Team A: 14 chocolates
Team B: 6 chocolates
Team C: 10 chocolates

Total: 30 chocolates

TERMINATE

--------------------------------------------------------------------------------
Categories
API

Mistral Function Calling

pip install -U mistralai pandas yfinance rich
export MISTRAL_API_KEY=xxxxxxxxxxxx
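
Both examples below construct MistralClient() with no arguments, so the client reads the key from the MISTRAL_API_KEY environment variable set above. To pass the key explicitly instead, a minimal sketch (the api_key keyword of this 0.x client is the only assumption):

from mistralai.client import MistralClient
client = MistralClient(api_key="xxxxxxxxxxxx")  # explicit key instead of the environment variable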

Example 1: Stock Price

import pandas as pd
import functools, json
from rich import print as rich_print
import yfinance as yf
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

model = "mistral-large-latest"
client = MistralClient()

# Fetch the current stock price via yfinance
def get_stock_price(symbol: str) -> str:
    stock = yf.Ticker(symbol)
    hist = stock.history(period="1d")
    current_price = hist['Close'].iloc[0]  # closing price of the most recent session
    print(f"Current price for {symbol}: {current_price}")
    return json.dumps({'price': current_price})

# Integrating the stock price function into tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price of a company",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "The stock symbol of the company.",
                    }
                },
                "required": ["symbol"],
            },
        },
    }
]

names_to_functions = {
    'get_stock_price': get_stock_price,
}

# Example user query
user_query = "What's the stock price of AAPL?"
print(f"User query: {user_query}")
messages = [
    ChatMessage(role="user", content=user_query)
]

response = client.chat(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
messages.append(response.choices[0].message)
rich_print(messages)

# Execute function
tool_call = response.choices[0].message.tool_calls[0]
function_name = tool_call.function.name
function_params = json.loads(tool_call.function.arguments)
print(f"\nExecuting function: {function_name} with parameters: {function_params}")

function_result = names_to_functions[function_name](**function_params)
print(f"Function result: {function_result}")

messages.append(ChatMessage(role="tool", name=function_name, content=function_result))

# Final model answer
response = client.chat(
    model=model,
    messages=messages
)
print(f"Final response: {response.choices[0].message.content}")
print("End of process.")
rich_print(response)

Output

❯ python app.py
User query: What's the stock price of AAPL?
[
    ChatMessage(role='user', content="What's the stock price of AAPL?", name=None, tool_calls=None),
    ChatMessage(
        role='assistant',
        content='',
        name=None,
        tool_calls=[
            ToolCall(
                id='null',
                type=<ToolType.function: 'function'>,
                function=FunctionCall(name='get_stock_price', arguments='{"symbol": "AAPL"}')
            )
        ]
    )
]

Executing function: get_stock_price with parameters: {'symbol': 'AAPL'}
Current price for AAPL: 181.4199981689453
Function result: {"price": 181.4199981689453}
Final response: The stock price of AAPL is $181.42.
End of process.
ChatCompletionResponse(
    id='5f77a1aac50747b4b74d3a0e96907c1f',
    object='chat.completion',
    created=1709185946,
    model='mistral-large-latest',
    choices=[
        ChatCompletionResponseChoice(
            index=0,
            message=ChatMessage(role='assistant', content='The stock price of AAPL is $181.42.', name=None, tool_calls=[]),
            finish_reason=<FinishReason.stop: 'stop'>
        )
    ],
    usage=UsageInfo(prompt_tokens=79, total_tokens=95, completion_tokens=16)
)

Example 2

import pandas as pd
import functools, json
from rich import print as rich_print
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

model = "mistral-large-latest"
client = MistralClient()

# Database
data = {
    'transaction_id': ['T1001', 'T1002', 'T1003', 'T1004', 'T1005'],
    'customer_id': ['C001', 'C002', 'C003', 'C002', 'C001'],
    'payment_amount': [125.50, 89.99, 120.00, 54.30, 210.20],
    'payment_date': ['2021-10-05', '2021-10-06', '2021-10-07', '2021-10-05', '2021-10-08'],
    'payment_status': ['Paid', 'Unpaid', 'Paid', 'Paid', 'Pending']
}
df = pd.DataFrame(data)
print("Database loaded successfully.")

# Retrieve payment status
def retrieve_payment_status(df: pd.DataFrame, transaction_id: str) -> str:
    if transaction_id in df.transaction_id.values:
        status = df[df.transaction_id == transaction_id].payment_status.item()
        print(f"Payment status for {transaction_id}: {status}")
        return json.dumps({'status': status})
    print(f"Error: Transaction ID {transaction_id} not found.")
    return json.dumps({'error': 'Transaction ID not found.'})

# Retrieve payment date
def retrieve_payment_date(df: pd.DataFrame, transaction_id: str) -> str:
    if transaction_id in df.transaction_id.values:
        date = df[df.transaction_id == transaction_id].payment_date.item()
        print(f"Payment date for {transaction_id}: {date}")
        return json.dumps({'date': date})
    print(f"Error: Transaction ID {transaction_id} not found.")
    return json.dumps({'error': 'Transaction ID not found.'})

tools = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_status",
            "description": "Get payment status of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_date",
            "description": "Get payment date of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    }
]

names_to_functions = {
  'retrieve_payment_status': functools.partial(retrieve_payment_status, df=df),
  'retrieve_payment_date': functools.partial(retrieve_payment_date, df=df)
}

# User query
user_query = "What's the status of my transaction?"
print(f"User query: {user_query}")
messages = [
    ChatMessage(role="user", content=user_query)
]

# Model response
response = client.chat(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
print(f"Model response: {response.choices[0].message.content}")
messages.append(ChatMessage(role="assistant", content=response.choices[0].message.content))
transaction_details = "My transaction ID is T1001."
print(f"Transaction details provided: {transaction_details}")
messages.append(ChatMessage(role="user", content=transaction_details))

response = client.chat(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
messages.append(response.choices[0].message)

# Execute function
tool_call = response.choices[0].message.tool_calls[0]
function_name = tool_call.function.name
function_params = json.loads(tool_call.function.arguments)
print(f"\nExecuting function: {function_name} with parameters: {function_params}")

function_result = names_to_functions[function_name](**function_params)
print(f"Function result: {function_result}")

messages.append(ChatMessage(role="tool", name=function_name, content=function_result))

# Final model answer
response = client.chat(
    model=model,
    messages=messages
)
rich_print(messages)
print(f"Final response: {response.choices[0].message.content}")