Phase 14: Local LLMs – Start Here

Run powerful language models entirely on your own machine: no API keys, no usage costs, full data privacy.

Why Run LLMs Locally?

  • Privacy: Data never leaves your machine

  • Cost: Zero per-token charges at inference time

  • Offline: Works without internet

  • Customization: Fine-tune for your specific use case

  • Control: No rate limits, no terms of service restrictions

Notebooks in This Phase

| Notebook | Topic |
| --- | --- |
| 01_ollama_quickstart.ipynb | Ollama – run Llama 3, Mistral, Gemma locally |
| 02_open_source_models_overview.ipynb | Model landscape: Llama 3, Mistral, Phi-3, Gemma |
| 03_local_rag_with_ollama.ipynb | Build a fully local RAG system |
| 04_llm_server_and_api.ipynb | Serve models via an OpenAI-compatible API |

Tools You'll Use

| Tool | Purpose |
| --- | --- |
| Ollama | Easy local model management and serving |
| llama.cpp | Low-level inference engine; GGUF model format |
| LM Studio | GUI for local model management |
| vLLM | High-throughput serving for production |
| Transformers | HuggingFace inference for any model |

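Most of these tools converge on the same interface: Ollama, vLLM, and LM Studio can all expose an OpenAI-compatible HTTP endpoint, so one client works against any of them. A minimal sketch using only the standard library (the base URL, port, and model name below are assumptions; Ollama's default is port 11434, vLLM's is 8000):

```python
import json
import urllib.request

# Assumed: a local OpenAI-compatible server, e.g. Ollama at its default port.
BASE_URL = "http://localhost:11434/v1"

def chat_payload(model: str, user_msg: str) -> dict:
    """Build an OpenAI-style chat.completions payload -- the same schema
    the hosted OpenAI API uses, which is why one client fits all servers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }

def chat(model: str, user_msg: str) -> str:
    """POST the payload to the local server and return the reply text."""
    data = json.dumps(chat_payload(model, user_msg)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires a running server):
#   chat("llama3.3", "Summarize GGUF in one sentence.")
```

Because the request/response schema matches the hosted API, swapping between local backends is usually just a change of `BASE_URL`.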
Top Local Models (2026)

| Model | Size | Strength |
| --- | --- | --- |
| Llama 3.3 70B | 70B | Best overall open source |
| Mistral 7B | 7B | Fast, great for most tasks |
| Phi-4 | 14B | Microsoft, strong reasoning |
| Gemma 2 9B | 9B | Google, efficient |
| DeepSeek-R1 | 7B-70B | Strong reasoning, open weights |
| Qwen 2.5 | 7B-72B | Multilingual, code |

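A practical question when choosing from this table is whether a model fits in your RAM or VRAM. A common rule of thumb: memory ≈ parameter count × bits-per-weight ÷ 8, plus some headroom for the KV cache and runtime. The 20% overhead factor below is an assumption, not a measured constant:

```python
def quantized_size_gb(n_params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough memory estimate for a quantized model:
    params * (bits / 8) bytes, scaled by ~20% for KV cache and runtime overhead."""
    return n_params_billion * bits / 8 * overhead

# Mistral 7B at 4-bit quantization: roughly 4 GB -- comfortable on most laptops.
# Llama 3.3 70B at 4-bit: roughly 42 GB -- needs a high-memory machine.
for name, params in [("Mistral 7B", 7), ("Llama 3.3 70B", 70)]:
    print(f"{name} @ 4-bit: ~{quantized_size_gb(params, 4):.1f} GB")
```

This is only a sizing heuristic; actual usage varies with quantization scheme (e.g. GGUF Q4_K_M) and context length.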
Prerequisites

  • RAG Systems (Phase 08)

  • Install Ollama from https://ollama.ai, then run `ollama pull llama3.3`
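
Once Ollama is installed and serving, you can talk to its native REST API before opening the first notebook. A minimal standard-library sketch (the endpoint is Ollama's documented default; the prompt and model tag are illustrative):

```python
import json
import urllib.request

# Ollama's default native endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """JSON payload for Ollama's /api/generate; stream=False returns one response."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a generation request to a locally running Ollama server."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    try:
        print(generate("llama3.3", "Explain GGUF in one sentence."))
    except OSError:
        # Connection refused means the server isn't up yet.
        print("Is the Ollama server running? Try: ollama serve")
```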

Learning Path

01_ollama_quickstart.ipynb       ← Install Ollama first
02_open_source_models_overview.ipynb
03_local_rag_with_ollama.ipynb
04_llm_server_and_api.ipynb