The Complete Guide to Ollama + Local AI
Run AI models locally. No API keys, no fees, full control.
What is Ollama
Ollama is to local LLMs roughly what Docker is to containers: it downloads, configures, and runs AI models on your own machine, exposed through a simple CLI and a REST API.
Installation
macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
Run a model
ollama run llama3.2
Download a model
ollama pull nomic-embed-text
Available Models
- llama3.2 (3B) — fast, good for chat
- llama3.1 (8B/70B) — more powerful
- mistral (7B) — good performance/speed ratio
- codellama (7B/34B) — for code
- nomic-embed-text — embeddings
- qwen2.5vl — vision model
REST API
Generate
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}'
Chat
curl http://localhost:11434/api/chat -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hi"}]}'
Embeddings
curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"Hello world"}'
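The vectors returned by /api/embeddings are typically compared with cosine similarity. A minimal sketch of that comparison in plain Python; the two short vectors here are made-up stand-ins (real nomic-embed-text output is much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional stand-ins for real embedding vectors.
vec_hello = [0.1, 0.3, -0.2, 0.7]
vec_greeting = [0.2, 0.25, -0.1, 0.6]

print(cosine_similarity(vec_hello, vec_greeting))  # close to 1.0 for similar texts
```

Scores near 1.0 mean the texts are semantically similar; near 0 means unrelated.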
Python Integration
import ollama
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain Docker in one sentence."}],
)
print(response["message"]["content"])
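The ollama package is a thin wrapper over the REST API, so the same chat call can be made with only the standard library. A sketch using urllib; the function is defined but not invoked here because the HTTP call needs a running server (`ollama serve`):

```python
import json
import urllib.request

def chat(messages, model="llama3.2", host="http://localhost:11434"):
    """POST a chat request to a local Ollama server and return the reply text."""
    # stream=False asks the server for a single JSON object instead of a stream.
    payload = json.dumps(
        {"model": model, "messages": messages, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Building the message list works without a server; calling chat() does not.
messages = [{"role": "user", "content": "Explain Docker in one sentence."}]
```

This avoids a dependency when all you need is a one-off request.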
Modelfile — Custom Model
FROM llama3.2
SYSTEM "You are a helpful coding assistant. Respond in English."
PARAMETER temperature 0.7
Hardware Requirements
- 3B model: 4 GB RAM
- 7B model: 8 GB RAM
- 13B model: 16 GB RAM
- 70B model: 48+ GB RAM
- Apple Silicon: unified memory shared by CPU and GPU makes it well suited to local AI
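The list above can be turned into a quick lookup for picking the largest model size a machine can hold. A small sketch; the thresholds are copied from the list, and the helper name is made up for illustration:

```python
# Minimum RAM (GB) per model size, copied from the list above. Rough guide only:
# quantization level and context length shift the real numbers.
MIN_RAM_GB = {"3B": 4, "7B": 8, "13B": 16, "70B": 48}

def largest_fitting_model(ram_gb):
    """Return the largest model size whose minimum RAM fits, or None."""
    fitting = [size for size, need in MIN_RAM_GB.items() if need <= ram_gb]
    # The dict is ordered smallest to largest, so the last fitting entry wins.
    return fitting[-1] if fitting else None

print(largest_fitting_model(16))  # → 13B
```

A 16 GB machine lands on 13B; anything under 4 GB gets None, i.e. no listed model fits comfortably.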
Use Cases
- Coding assistant (offline)
- RAG (Retrieval Augmented Generation)
- Document analysis
- Embeddings for search
- Experiments without API costs
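The RAG use case boils down to three steps: embed your documents, retrieve the one most similar to the question, and prepend it to the chat prompt. A toy end-to-end sketch; the embed() function here is a bag-of-words stand-in so the example runs offline (a real version would call /api/embeddings with nomic-embed-text):

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a real /api/embeddings call: a toy word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    # Cosine similarity over the sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "Ollama runs large language models locally.",
    "Docker packages applications into containers.",
]

def retrieve(question, docs):
    # Pick the document most similar to the question.
    q = embed(question)
    return max(docs, key=lambda d: similarity(q, embed(d)))

question = "How do containers work?"
context = retrieve(question, docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(context)  # → Docker packages applications into containers.
```

The assembled prompt would then go to /api/chat; swapping the toy embed() for real embeddings is the only change needed to make this semantic rather than lexical.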
Why Local AI
No API fees. No network latency. Full control over your data. And on Apple Silicon it is surprisingly fast.