Phase 14: Local LLMs – Start Here

Run powerful language models entirely on your own machine: no API keys, no usage costs, full data privacy.

Why Run LLMs Locally?

  • Privacy: Data never leaves your machine

  • Cost: Zero per-token charges at inference time

  • Offline: Works without internet

  • Customization: Fine-tune for your specific use case

  • Control: No rate limits, no terms of service restrictions

Notebooks in This Phase

| Notebook | Topic |
| --- | --- |
| 01_ollama_quickstart.ipynb | Ollama – run Llama 3, Mistral, Gemma locally |
| 02_open_source_models_overview.ipynb | Model landscape: Llama 3, Mistral, Phi-3, Gemma |
| 03_local_rag_with_ollama.ipynb | Build a fully local RAG system |
| 04_llm_server_and_api.ipynb | Serve models via an OpenAI-compatible API |

Tools You'll Use

| Tool | Purpose |
| --- | --- |
| Ollama | Easy local model management and serving |
| llama.cpp | Low-level inference engine; GGUF model format |
| LM Studio | GUI for local model management |
| vLLM | High-throughput serving for production |
| Transformers | HuggingFace inference for any model |

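Most of these tools converge on the same interface: Ollama, vLLM, and LM Studio can all expose an OpenAI-compatible HTTP endpoint, so one client works against any of them. A minimal sketch using only the standard library (the base URL, port, and model name below are assumptions; Ollama's default is port 11434, vLLM's is 8000):

```python
import json
import urllib.request

# Assumed: a local OpenAI-compatible server, e.g. Ollama at its default port.
BASE_URL = "http://localhost:11434/v1"

def chat_payload(model: str, user_msg: str) -> dict:
    """Build an OpenAI-style chat.completions payload -- the same schema
    the hosted OpenAI API uses, which is why one client fits all servers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }

def chat(model: str, user_msg: str) -> str:
    """POST the payload to the local server and return the reply text."""
    data = json.dumps(chat_payload(model, user_msg)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires a running server):
#   chat("llama3.3", "Summarize GGUF in one sentence.")
```

Because the request/response schema matches the hosted API, swapping between local backends is usually just a change of `BASE_URL`.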
Top Local Models (2026)

| Model | Size | Strength |
| --- | --- | --- |
| Llama 3.3 70B | 70B | Best overall open source |
| Mistral 7B | 7B | Fast, great for most tasks |
| Phi-4 | 14B | Microsoft, strong reasoning |
| Gemma 2 9B | 9B | Google, efficient |
| DeepSeek-R1 | 7B-70B | Strong reasoning, open weights |
| Qwen 2.5 | 7B-72B | Multilingual, code |

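A practical question when choosing from this table is whether a model fits in your RAM or VRAM. A common rule of thumb: memory ≈ parameter count × bits-per-weight ÷ 8, plus some headroom for the KV cache and runtime. The 20% overhead factor below is an assumption, not a measured constant:

```python
def quantized_size_gb(n_params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough memory estimate for a quantized model:
    params * (bits / 8) bytes, scaled by ~20% for KV cache and runtime overhead."""
    return n_params_billion * bits / 8 * overhead

# Mistral 7B at 4-bit quantization: roughly 4 GB -- comfortable on most laptops.
# Llama 3.3 70B at 4-bit: roughly 42 GB -- needs a high-memory machine.
for name, params in [("Mistral 7B", 7), ("Llama 3.3 70B", 70)]:
    print(f"{name} @ 4-bit: ~{quantized_size_gb(params, 4):.1f} GB")
```

This is only a sizing heuristic; actual usage varies with quantization scheme (e.g. GGUF Q4_K_M) and context length.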
Prerequisites

  • RAG Systems (Phase 08)

  • Install Ollama from https://ollama.ai, then run `ollama pull llama3.3`
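
Once Ollama is installed and serving, you can talk to its native REST API before opening the first notebook. A minimal standard-library sketch (the endpoint is Ollama's documented default; the prompt and model tag are illustrative):

```python
import json
import urllib.request

# Ollama's default native endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """JSON payload for Ollama's /api/generate; stream=False returns one response."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a generation request to a locally running Ollama server."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    try:
        print(generate("llama3.3", "Explain GGUF in one sentence."))
    except OSError:
        # Connection refused means the server isn't up yet.
        print("Is the Ollama server running? Try: ollama serve")
```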

Learning Path

01_ollama_quickstart.ipynb       ← Install Ollama first
02_open_source_models_overview.ipynb
03_local_rag_with_ollama.ipynb
04_llm_server_and_api.ipynb