AI Assistant with Ollama

InstaCRUD ships with a built-in AI Assistant that works out of the box with cloud and local AI providers. Ollama is the local option — it runs large language models on your own hardware with no API keys, no cloud calls, and no data leaving your machine, giving you a fully private AI assistant that works entirely offline.


What Ollama is

Ollama is a free, open-source tool that downloads and serves LLMs through a local REST API, including an OpenAI-compatible endpoint. You install it once, pull the models you want, and any app that speaks the OpenAI API can talk to it — including InstaCRUD.


Setting up Ollama

1. Install Ollama

Download and install Ollama from ollama.com. It supports macOS, Linux, and Windows.

After installation, Ollama runs as a background service and exposes its API at http://localhost:11434.

2. Configure InstaCRUD

Add (or verify) the following in your .env file:

OLLAMA_BASE_URL=http://localhost:11434/v1

That's it. InstaCRUD's AI framework connects to Ollama through the same client it uses for every other provider.
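Because the endpoint is OpenAI-compatible, you can sanity-check the connection with a few lines of standard-library Python before involving InstaCRUD at all. This is a sketch, not InstaCRUD code: the `build_chat_request` helper is hypothetical, and the `mistral` model name assumes you have already pulled it.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # same value as in .env


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """POST the payload to Ollama's OpenAI-compatible chat endpoint."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("mistral", "Say hello in one word.")  # requires a running Ollama server
```

If the call succeeds, any OpenAI-style client (including InstaCRUD's) will work against the same base URL.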

3. Seed the model catalogue

Run the init script to populate the database with the pre-configured Ollama models:

cd backend
poetry run python init/init_ai_models.py

Pre-configured models

All chat models are under 15 B parameters — chosen to run realistically on a typical consumer GPU (6–12 GB of VRAM). The embedding models are lightweight and CPU-friendly.
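As a rough rule of thumb (an approximation, not an official sizing formula), a model quantized to about 4 bits per weight needs roughly params × 0.5 bytes for its weights, plus extra for the KV cache and runtime:

```python
def approx_weight_memory_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough weight-only memory estimate: parameters x bits per weight, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Mistral 7B at ~4-bit quantization: about 3.5 GB of weights,
# which fits in a 6-8 GB GPU once cache overhead is added.
print(approx_weight_memory_gb(7))   # 3.5
print(approx_weight_memory_gb(14))  # 7.0
```

This is why the 14 B reasoning model sits at the top of the comfortable range for a 12 GB card, while the 8×7 B MoE models need noticeably more memory than their active-parameter count suggests.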

Chat & completion

| Model | Pull tag | Size | Notes |
|---|---|---|---|
| Mistral 7B | mistral:latest | 7 B | Fast general-purpose chat |
| Mixtral 8x7B | mixtral:latest | 8×7 B MoE | Higher quality, needs more RAM |
| Dolphin Mixtral 8x7B | dolphin-mixtral:latest | 8×7 B MoE | Uncensored Mixtral fine-tune |
| DeepSeek R1 Distill 14B | deepseek-r1:14b | 14 B | Reasoning / chain-of-thought |
| Llama 3.2 Vision 11B | llama3.2-vision:11b | 11 B | Accepts image input |
| Qwen 3.5 9B | qwen3.5:9b | 9 B | Reasoning + vision |
| Qwen3 VL 8B | qwen3-vl:8b | 8 B | Vision + reasoning |

Embeddings

| Model | Pull tag | Notes |
|---|---|---|
| Nomic Embed v1.5 | nomic-embed-text:latest | General-purpose embeddings |
| BGE Large English v1.5 | znbang/bge:large-en-v1.5-q4_k_m | High-quality English embeddings |
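Embedding models turn text into vectors that can be compared by similarity. A minimal sketch of the comparison step — the vectors below are toy stand-ins, not real output from nomic-embed-text:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy vectors standing in for real embeddings:
doc = [0.1, 0.9, 0.2]
query = [0.2, 0.8, 0.1]
print(cosine_similarity(doc, query))
```

Real embeddings from these models have hundreds of dimensions, but the comparison works the same way.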

Pulling all models

Run these commands to download every pre-configured model. Each pull may take a few minutes depending on your connection:

# Chat models
ollama pull mistral
ollama pull mixtral
ollama pull dolphin-mixtral
ollama pull deepseek-r1:14b
ollama pull llama3.2-vision:11b
ollama pull qwen3.5:9b
ollama pull qwen3-vl:8b

# Embedding models
ollama pull nomic-embed-text
ollama pull znbang/bge:large-en-v1.5-q4_k_m

You don't need to pull all of them — pull only the ones you plan to enable. Start with mistral if you want the smallest footprint and good quality for the size.


What you get

Once configured, the AI Assistant works exactly as it does with cloud providers, with two differences:

  • Fully private — all inference runs on your machine; nothing is sent to any external service.
  • No usage costs — Ollama models have no per-token charges. The credit system in InstaCRUD still tracks usage internally, but no real money is spent.

Switch between Ollama and cloud models at any time from the model dropdown in the assistant.


Tips

  • Not enough VRAM? Ollama automatically offloads layers to the CPU, at the cost of speed. Mistral 7B can run entirely on the CPU, just slowly.
  • Adding more models? Any model available on ollama.com/library can be added to init_ai_models.py as a new entry with service: AiServiceProvider.OLLAMA. See Using the AI Framework for details.