The Complete Guide to Ollama + Local AI
Run AI models locally. No API keys, no fees, full control.
What is Ollama
Ollama is to local LLMs roughly what Docker is to containers: it downloads, configures, and runs AI models on your own machine, exposed through a simple CLI and a REST API.
Installation
macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
Run a model
ollama run llama3.2
Download a model
ollama pull nomic-embed-text
Available Models
- llama3.2 (3B) — fast, good for chat
- llama3.1 (8B/70B) — more powerful
- mistral (7B) — good performance/speed ratio
- codellama (7B/34B) — for code
- nomic-embed-text — embeddings
- qwen2.5vl — vision model
REST API
Generate
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}'
Chat
curl http://localhost:11434/api/chat -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hi"}]}'
Embeddings
curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"Hello world"}'
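The vectors returned by /api/embeddings are typically compared with cosine similarity. A minimal sketch of that comparison in plain Python; the two short vectors here are made-up stand-ins (real nomic-embed-text output is much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional stand-ins for real embedding vectors.
vec_hello = [0.1, 0.3, -0.2, 0.7]
vec_greeting = [0.2, 0.25, -0.1, 0.6]

print(cosine_similarity(vec_hello, vec_greeting))  # close to 1.0 for similar texts
```

Scores near 1.0 mean the texts are semantically similar; near 0 means unrelated.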
Python Integration
import ollama
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain Docker in one sentence."}],
)
print(response["message"]["content"])
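The ollama package is a thin wrapper over the REST API, so the same chat call can be made with only the standard library. A sketch using urllib; the function is defined but not invoked here because the HTTP call needs a running server (`ollama serve`):

```python
import json
import urllib.request

def chat(messages, model="llama3.2", host="http://localhost:11434"):
    """POST a chat request to a local Ollama server and return the reply text."""
    # stream=False asks the server for a single JSON object instead of a stream.
    payload = json.dumps(
        {"model": model, "messages": messages, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Building the message list works without a server; calling chat() does not.
messages = [{"role": "user", "content": "Explain Docker in one sentence."}]
```

This avoids a dependency when all you need is a one-off request.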
Modelfile — Custom Model
FROM llama3.2
SYSTEM "You are a helpful coding assistant. Respond in English."
PARAMETER temperature 0.7
Hardware Requirements
- 3B model: 4 GB RAM
- 7B model: 8 GB RAM
- 13B model: 16 GB RAM
- 70B model: 48+ GB RAM
- Apple Silicon: unified memory shared by CPU and GPU makes it well suited to local AI
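The list above can be turned into a quick lookup for picking the largest model size a machine can hold. A small sketch; the thresholds are copied from the list, and the helper name is made up for illustration:

```python
# Minimum RAM (GB) per model size, copied from the list above. Rough guide only:
# quantization level and context length shift the real numbers.
MIN_RAM_GB = {"3B": 4, "7B": 8, "13B": 16, "70B": 48}

def largest_fitting_model(ram_gb):
    """Return the largest model size whose minimum RAM fits, or None."""
    fitting = [size for size, need in MIN_RAM_GB.items() if need <= ram_gb]
    # The dict is ordered smallest to largest, so the last fitting entry wins.
    return fitting[-1] if fitting else None

print(largest_fitting_model(16))  # → 13B
```

A 16 GB machine lands on 13B; anything under 4 GB gets None, i.e. no listed model fits comfortably.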
Use Cases
- Coding assistant (offline)
- RAG (Retrieval Augmented Generation)
- Document analysis
- Embeddings for search
- Experiments without API costs
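The RAG use case boils down to three steps: embed your documents, retrieve the one most similar to the question, and prepend it to the chat prompt. A toy end-to-end sketch; the embed() function here is a bag-of-words stand-in so the example runs offline (a real version would call /api/embeddings with nomic-embed-text):

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a real /api/embeddings call: a toy word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    # Cosine similarity over the sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "Ollama runs large language models locally.",
    "Docker packages applications into containers.",
]

def retrieve(question, docs):
    # Pick the document most similar to the question.
    q = embed(question)
    return max(docs, key=lambda d: similarity(q, embed(d)))

question = "How do containers work?"
context = retrieve(question, docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(context)  # → Docker packages applications into containers.
```

The assembled prompt would then go to /api/chat; swapping the toy embed() for real embeddings is the only change needed to make this semantic rather than lexical.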
Why Local AI
No API fees. No network latency. Full control over your data. And on Apple Silicon it is surprisingly fast.