AI Assistant with Ollama

InstaCRUD ships with a built-in AI Assistant that works out of the box with cloud and local AI providers. Ollama is the local option — it runs large language models on your own hardware with no API keys, no cloud calls, and no data leaving your machine, giving you a fully private AI assistant that works entirely offline.


What Ollama is

Ollama is a free, open-source tool that downloads and serves LLMs through a local REST API, including an OpenAI-compatible endpoint. You install it once, pull the models you want, and any app that speaks the OpenAI API can talk to it — including InstaCRUD.


Setting up Ollama

1. Install Ollama

Download and install Ollama from ollama.com. It supports macOS, Linux, and Windows.

After installation, Ollama runs as a background service and exposes its API at http://localhost:11434.

2. Configure InstaCRUD

Add (or verify) the following in your .env file:

OLLAMA_BASE_URL=http://localhost:11434/v1

That's it. InstaCRUD's AI framework connects to Ollama through the same client it uses for every other provider.
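Because the endpoint is OpenAI-compatible, you can sanity-check the connection with a few lines of standard-library Python before involving InstaCRUD at all. This is a sketch, not InstaCRUD code: the `build_chat_request` helper is hypothetical, and the `mistral` model name assumes you have already pulled it.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # same value as in .env


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """POST the payload to Ollama's OpenAI-compatible chat endpoint."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("mistral", "Say hello in one word.")  # requires a running Ollama server
```

If the call succeeds, any OpenAI-style client (including InstaCRUD's) will work against the same base URL.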

3. Seed the model catalogue

Run the init script to populate the database with the pre-configured Ollama models:

cd backend
poetry run python init/init_ai_models.py

Pre-configured models

All chat models are under 15 B parameters — chosen to run realistically on a typical consumer GPU (6–12 GB of VRAM). The embedding models are lightweight and CPU-friendly.
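As a rough rule of thumb (an approximation, not an official sizing formula), a model quantized to about 4 bits per weight needs roughly params × 0.5 bytes for its weights, plus extra for the KV cache and runtime:

```python
def approx_weight_memory_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough weight-only memory estimate: parameters x bits per weight, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Mistral 7B at ~4-bit quantization: about 3.5 GB of weights,
# which fits in a 6-8 GB GPU once cache overhead is added.
print(approx_weight_memory_gb(7))   # 3.5
print(approx_weight_memory_gb(14))  # 7.0
```

This is why the 14 B reasoning model sits at the top of the comfortable range for a 12 GB card, while the 8×7 B MoE models need noticeably more memory than their active-parameter count suggests.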

Chat & completion

| Model | Pull tag | Size | Notes |
|---|---|---|---|
| Mistral 7B | mistral:latest | 7 B | Fast general-purpose chat |
| Mixtral 8x7B | mixtral:latest | 8×7 B MoE | Higher quality, needs more RAM |
| Dolphin Mixtral 8x7B | dolphin-mixtral:latest | 8×7 B MoE | Uncensored Mixtral fine-tune |
| DeepSeek R1 Distill 14B | deepseek-r1:14b | 14 B | Reasoning / chain-of-thought |
| Llama 3.2 Vision 11B | llama3.2-vision:11b | 11 B | Accepts image input |
| Qwen 3.5 9B | qwen3.5:9b | 9 B | Reasoning + vision |
| Qwen3 VL 8B | qwen3-vl:8b | 8 B | Vision + reasoning |

Embeddings

| Model | Pull tag | Notes |
|---|---|---|
| Nomic Embed v1.5 | nomic-embed-text:latest | General-purpose embeddings |
| BGE Large English v1.5 | znbang/bge:large-en-v1.5-q4_k_m | High-quality English embeddings |
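Embedding models turn text into vectors that can be compared by similarity. A minimal sketch of the comparison step — the vectors below are toy stand-ins, not real output from nomic-embed-text:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy vectors standing in for real embeddings:
doc = [0.1, 0.9, 0.2]
query = [0.2, 0.8, 0.1]
print(cosine_similarity(doc, query))
```

Real embeddings from these models have hundreds of dimensions, but the comparison works the same way.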

Pulling all models

Run these commands to download every pre-configured model. Each pull may take a few minutes depending on your connection:

# Chat models
ollama pull mistral
ollama pull mixtral
ollama pull dolphin-mixtral
ollama pull deepseek-r1:14b
ollama pull llama3.2-vision:11b
ollama pull qwen3.5:9b
ollama pull qwen3-vl:8b

# Embedding models
ollama pull nomic-embed-text
ollama pull znbang/bge:large-en-v1.5-q4_k_m

You don't need to pull all of them — pull only the ones you plan to enable. Start with mistral if you want the smallest footprint and good quality for the size.


What you get

Once configured, the AI Assistant works exactly as it does with cloud providers, with two differences:

  • Fully private — all inference runs on your machine; nothing is sent to any external service.
  • No usage costs — Ollama models have no per-token charges. The credit system in InstaCRUD still tracks usage internally, but no real money is spent.

Switch between Ollama and cloud models at any time from the model dropdown in the assistant.


Tips

  • Not enough VRAM? Ollama automatically offloads layers to the CPU, at the cost of speed. Mistral 7B can run entirely on the CPU, just slowly.
  • Adding more models? Any model available on ollama.com/library can be added to init_ai_models.py as a new entry with service: AiServiceProvider.OLLAMA. See Using the AI Framework for details.