Phase 14: Local LLMs
This module should help you answer a practical question: when does running models locally make sense, and what trade-offs do you accept in exchange for privacy, cost control, and deployment flexibility?
Module Contents
Recommended Order
Start with Ollama and the model overview
Then build a local RAG workflow
Then study serving and API patterns
Finish with speculative decoding and performance considerations
What To Learn Here
The difference between hosted APIs and local inference
How quantization and model size affect usability
What Ollama is good at and where it falls short
How to expose a local model behind an API
Why latency and throughput tuning matter once a prototype works
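The quantization point above becomes concrete with simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only; real runtimes add KV cache and activation overhead on top of these numbers.

```python
# Back-of-the-envelope weight memory at common precisions. Treat these as
# lower bounds: real runtimes add KV cache and activation overhead.

BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit floats
    "q8": 1.0,    # 8-bit quantization
    "q4": 0.5,    # 4-bit quantization
}

def weight_gib(params_billions, precision):
    """Approximate weight memory in GiB for a given parameter count."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / (1024 ** 3)

for precision in ("fp16", "q8", "q4"):
    print(f"7B at {precision}: ~{weight_gib(7, precision):.1f} GiB")
```

This is why a 7B model that will not fit on a laptop at fp16 often runs comfortably at 4-bit: the weights shrink by 4x, at some cost in quality.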
Study Advice
Keep the first pass practical: install one tool, run one model, ship one API.
Do not optimize before measuring.
Compare local quality against your hosted baseline before committing to an on-device stack.
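When you get to the "ship one API" step, the usual move is to mimic the OpenAI chat-completions response shape so existing clients work against your local server unchanged. The sketch below only shapes the response body; `run_local_model` is a placeholder for whatever backend you wire in (Ollama, llama.cpp, or similar), not a real call.

```python
import time
import uuid

def run_local_model(messages):
    # Placeholder for a real backend call (e.g. Ollama's HTTP API).
    # It echoes the last user message so the example runs standalone.
    return "echo: " + messages[-1]["content"]

def chat_completion(messages, model="local-model"):
    """Shape a local model's reply like an OpenAI chat-completions
    response, so off-the-shelf clients can point at the local server."""
    reply = run_local_model(messages)
    return {
        "id": "chatcmpl-" + uuid.uuid4().hex[:12],
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    }

resp = chat_completion([{"role": "user", "content": "hello"}])
print(resp["choices"][0]["message"]["content"])
```

Wrap this in any HTTP server and point a client's base URL at it; the response fields shown are the ones most clients actually read.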
Good Follow-On Projects
A private document assistant
A local coding helper with retrieval
A lightweight OpenAI-compatible local serving layer
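For the private document assistant, the retrieval half can start far simpler than a vector database: word-overlap scoring is enough to prototype the loop before swapping in real embeddings. Everything below is a toy sketch; the corpus and scoring function are placeholders.

```python
# Minimal retrieval step for a local RAG prototype: rank documents by
# word overlap with the query. Swap in embeddings once the loop works.

def tokenize(text):
    return set(text.lower().split())

def retrieve(query, docs, top_k=1):
    """Return the top_k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "Ollama runs models locally on your machine",
    "Quantization trades accuracy for memory",
    "Speculative decoding speeds up generation",
]
best = retrieve("how do I run a model locally", docs)
print(best[0])
```

A real assistant would then paste the retrieved text into the local model's prompt as context; the retrieval interface stays the same when you upgrade the scoring.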