# Phase 14: Local LLMs - Start Here

Run powerful language models completely locally: no API keys, no usage costs, full data privacy.
## Why Run LLMs Locally?

- **Privacy:** Data never leaves your machine
- **Cost:** Zero per-token charges at inference time
- **Offline:** Works without an internet connection
- **Customization:** Fine-tune models for your specific use case
- **Control:** No rate limits, no terms-of-service restrictions
## Notebooks in This Phase

| Notebook | Topic |
|---|---|
| 01_ollama_quickstart.ipynb | Ollama: run Llama 3, Mistral, Gemma locally |
| 02_open_source_models_overview.ipynb | Model landscape: Llama 3, Mistral, Phi-3, Gemma |
| 03_local_rag_with_ollama.ipynb | Build a fully local RAG system |
| 04_llm_server_and_api.ipynb | Serve models via OpenAI-compatible API |
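The last notebook's topic, serving through an OpenAI-compatible API, can be sketched with nothing but the standard library. The sketch below assumes a local server (such as Ollama, which exposes `/v1/chat/completions` on port 11434 by default) is already running; the model name is just an example.

```python
import json
import urllib.request


def chat_completion_payload(model: str, user_message: str) -> dict:
    """Build a chat request in the OpenAI-compatible format that local
    servers such as Ollama and vLLM accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(model: str, user_message: str,
         base_url: str = "http://localhost:11434/v1") -> str:
    """POST to a local OpenAI-compatible endpoint and return the reply text."""
    body = json.dumps(chat_completion_payload(model, user_message)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the request/response shape matches OpenAI's, the same code works unchanged against any of the servers in the tools table below; only `base_url` changes.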
## Tools You'll Use

| Tool | Purpose |
|---|---|
| Ollama | Easy local model management and serving |
| llama.cpp | Low-level inference, GGUF format |
| LM Studio | GUI for local model management |
| vLLM | High-throughput serving for production |
| Transformers | HuggingFace inference for any model |
## Top Local Models (2026)

| Model | Size | Strength |
|---|---|---|
| Llama 3.3 70B | 70B | Best overall open source |
| Mistral 7B | 7B | Fast, great for most tasks |
| Phi-4 | 14B | Microsoft, strong reasoning |
| Gemma 2 9B | 9B | Google, efficient |
| DeepSeek-R1 | 7B-70B | Strong reasoning, open weights |
| Qwen 2.5 | 7B-72B | Multilingual, code |
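When picking a model size from the table above, the main constraint is memory. A rough back-of-envelope estimate, assuming the common case of 4-bit quantization plus roughly 20% overhead for the KV cache and runtime buffers (both figures are ballpark assumptions, not exact):

```python
def vram_estimate_gb(params_billions: float,
                     bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rough memory needed to run a model locally.

    weight bytes = params * bits_per_weight / 8, then scaled by an
    assumed ~20% overhead for KV cache and buffers.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)


# Mistral 7B at 4-bit fits comfortably on an 8 GB GPU;
# Llama 3.3 70B at 4-bit needs workstation- or multi-GPU-class memory.
print(vram_estimate_gb(7))    # ~4.2 GB
print(vram_estimate_gb(70))   # ~42.0 GB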
## Prerequisites

- RAG Systems (Phase 08)
- Install Ollama from https://ollama.ai, then pull a model: `ollama pull llama3.3`
## Learning Path

1. 01_ollama_quickstart.ipynb (install Ollama first)
2. 02_open_source_models_overview.ipynb
3. 03_local_rag_with_ollama.ipynb
4. 04_llm_server_and_api.ipynb
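The core of the local RAG notebook is retrieval: embed documents, embed the query, return the closest matches, and pass them to the local model as context. A dependency-free toy sketch of that retrieval step, using bag-of-words counts as a stand-in for real embeddings (a real local stack would use an embedding model served by Ollama instead):

```python
import re
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Stand-in for a real
    embedding model; only for illustrating the retrieval step."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Ollama serves models over a local HTTP API.",
    "Mistral 7B is a fast general-purpose model.",
]
print(retrieve("which model is fast?", docs))  # the Mistral sentence ranks first
```

Swapping `embed` for calls to a local embedding model (and feeding the retrieved text into a local LLM's prompt) turns this sketch into the fully local RAG system the notebook builds.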