Phase 14: Local LLMs

This module should help you answer a practical question: when does running models locally make sense, and what trade-offs do you accept in exchange for privacy, cost control, and deployment flexibility?

Module Contents

  1. 00_START_HERE.ipynb

  2. 01_ollama_quickstart.ipynb

  3. 02_open_source_models_overview.ipynb

  4. 03_local_rag_with_ollama.ipynb

  5. 04_llm_server_and_api.ipynb

  6. 05_speculative_decoding.ipynb

What To Learn Here

  • The difference between hosted APIs and local inference

  • How quantization and model size affect usability

  • What Ollama is good at and where it is limiting

  • How to expose a local model behind an API

  • Why latency and throughput tuning matter once a prototype works
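To make the quantization point concrete, here is a back-of-envelope sketch of how bits-per-weight drives memory footprint. The numbers cover weights only and are rough by design: real runtimes add overhead for the KV cache, activations, and format metadata, and the quantization labels (fp16, q8_0, q4_0) are just illustrative precision levels.

```python
# Back-of-envelope estimate of weight storage at different
# quantization levels. Weights only -- real memory use is higher.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9  # a 7B-parameter model
for label, bits in [("fp16", 16), ("q8_0", 8), ("q4_0", 4)]:
    print(f"{label}: ~{weight_memory_gb(n, bits):.1f} GB")
# fp16: ~14.0 GB, q8_0: ~7.0 GB, q4_0: ~3.5 GB
```

This is why a 7B model that will not fit in fp16 on a consumer GPU often runs comfortably at 4-bit quantization, at some cost in output quality.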

Study Advice

  • Keep the first pass practical: install one tool, run one model, ship one API.

  • Do not optimize before measuring.

  • Compare local quality against your hosted baseline before committing to an on-device stack.
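The "run one model, ship one API" loop can be sketched with a minimal client. This is one possible shape, assuming a local Ollama server exposing its OpenAI-compatible endpoint at the default `http://localhost:11434/v1`; the model name `llama3` is a placeholder for whatever model you have actually pulled.

```python
# Minimal sketch of calling a locally served model through an
# OpenAI-compatible chat endpoint (assumed: Ollama on localhost:11434,
# model name "llama3" is a placeholder).
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str, model: str = "llama3",
         base_url: str = "http://localhost:11434/v1") -> str:
    """POST the payload and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request shape matches the hosted OpenAI API, the same harness can point at either backend by swapping `base_url`, which makes the local-vs-hosted quality comparison above a one-line change.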

Good Follow-On Projects

  • A private document assistant

  • A local coding helper with retrieval

  • A lightweight OpenAI-compatible local serving layer