# Phase 5: Embeddings – Start Here

Convert text, images, and code into dense vectors – the engine behind semantic search, RAG, and recommendation systems.

## What Are Embeddings?
Embeddings map high-dimensional objects (words, sentences, images) to fixed-size vectors where similar things are close together. They power semantic search, clustering, recommendation, and retrieval-augmented generation (RAG).
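The "close together" in that definition is usually measured with cosine similarity. A minimal sketch with hand-made toy vectors (a real model such as SBERT would produce 384+ dimensions; the 4-dim vectors here are illustrative only):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" for three concepts.
cat    = np.array([0.9, 0.8, 0.1, 0.0])
kitten = np.array([0.85, 0.75, 0.2, 0.1])
car    = np.array([0.1, 0.0, 0.9, 0.8])

print(cosine_similarity(cat, kitten))  # high: related meanings
print(cosine_similarity(cat, car))     # low: unrelated
```

Every downstream task in this phase (search, clustering, paraphrase mining) is ultimately built on comparisons like these.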
## Notebooks in This Phase
| Notebook | Topic |
|---|---|
| `embeddings_intro.ipynb` | What embeddings are and why they matter |
| `sentence_transformer_intro.ipynb` | Sentence Transformers (SBERT) – the workhorse |
| `openai_embeddings.ipynb` | OpenAI text-embedding-3 API |
| `huggingface_embeddings.ipynb` | Local HuggingFace embedding models |
| `semantic_similarity.ipynb` | Cosine similarity, dot product, Euclidean distance |
| `semantic_search_intro.ipynb` | Build a semantic search engine |
| `semantic_textual_similarity_intro.ipynb` | STS benchmarks and evaluation |
| `paraphrase_mining_intro.ipynb` | Find duplicate/similar sentences at scale |
| `sparse_encoder_intro.ipynb` | BM25, SPLADE – sparse vs. dense embeddings |
| `vector_database_demo.ipynb` | Store and query embeddings in a vector DB |
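The sparse-vs-dense distinction mentioned above is worth seeing concretely. A toy sketch of a sparse bag-of-words vector (the vocabulary and sentences here are made up for illustration): exact-term matching fails on paraphrases, which is the gap that dense embeddings – and learned sparse models like SPLADE – are designed to close.

```python
import numpy as np

# Tiny fixed vocabulary; real sparse vectors span the whole vocabulary.
vocab = ["cheap", "flights", "paris", "affordable", "airfare", "france"]

def sparse_vector(text: str) -> np.ndarray:
    """Bag-of-words counts: one dimension per vocabulary term, mostly zeros."""
    tokens = text.lower().split()
    return np.array([tokens.count(term) for term in vocab], dtype=float)

query = sparse_vector("cheap flights to paris")
doc   = sparse_vector("affordable airfare to france")

# Same intent, zero shared vocabulary terms -> zero lexical-overlap score.
print(np.dot(query, doc))  # → 0.0
```

BM25 refines this raw count with term-frequency saturation and document-length normalization, but it shares the same failure mode on synonyms.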
## Prerequisites

- Python basics (Phase 01)
- Tokenization (Phase 04)
## Key Concepts You'll Learn

- Dense vs. sparse embeddings
- Sentence Transformers (all-MiniLM, BGE, E5, GTE)
- OpenAI `text-embedding-3-small` and `text-embedding-3-large`
- Cosine similarity and nearest-neighbor search
- Building semantic search engines
- Vector databases (ChromaDB, FAISS, Pinecone, Qdrant)
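Under the hood, every vector database answers the same question: given a query embedding, which stored vectors are nearest to it? A brute-force numpy sketch of that core operation, using random unit vectors as a stand-in corpus (FAISS, ChromaDB, and friends add indexing structures to make this fast at millions of vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend corpus: 1,000 "documents" embedded as 64-dim unit vectors.
corpus = rng.normal(size=(1000, 64))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def search(query: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k most similar corpus vectors. Cosine similarity
    reduces to a dot product because everything is unit-normalized."""
    scores = corpus @ (query / np.linalg.norm(query))
    top_k = np.argpartition(scores, -k)[-k:]       # unordered top-k
    return top_k[np.argsort(scores[top_k])[::-1]]  # sorted, best first

query = corpus[42] + 0.05 * rng.normal(size=64)  # noisy copy of doc 42
print(search(query))  # doc 42 should rank first
```

This exhaustive scan is exact but O(corpus size) per query; approximate-nearest-neighbor indexes (HNSW, IVF) trade a little recall for orders-of-magnitude speedups.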
## Learning Path

1. `embeddings_intro.ipynb` – start here
2. `sentence_transformer_intro.ipynb`
3. `openai_embeddings.ipynb`
4. `semantic_similarity.ipynb`
5. `semantic_search_intro.ipynb`
6. `huggingface_embeddings.ipynb`
7. `sparse_encoder_intro.ipynb`
8. `vector_database_demo.ipynb`
9. `paraphrase_mining_intro.ipynb`
10. `semantic_textual_similarity_intro.ipynb`
## Next Phase
After embeddings, move to Phase 07: Vector Databases for production-scale storage and retrieval.