Phase 05: Embeddings — Start Here¶

Convert text, images, and code into dense vectors — the engine behind semantic search, RAG, and recommendation systems.

What Are Embeddings?¶

Embeddings map discrete, complex objects (words, sentences, images) to fixed-size dense vectors in which similar items lie close together. They power semantic search, clustering, recommendation, and retrieval-augmented generation (RAG).
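
The "similar things are close together" idea can be seen with a toy example. The vectors below are hand-made for illustration (real models produce hundreds of dimensions), but the geometry works the same way:

```python
import numpy as np

# Toy 4-dimensional "embeddings" (hand-made for illustration; a real model
# such as an SBERT encoder would produce these vectors for you).
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.9, 0.2, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["dog"]))  # high: related concepts
print(cosine(embeddings["cat"], embeddings["car"]))  # low: unrelated concepts
```

Semantic search is just this comparison at scale: embed a query, then return the stored vectors with the highest similarity.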

Notebooks in This Phase¶

Notebook                                   Topic
embeddings_intro.ipynb                     What embeddings are and why they matter
sentence_transformer_intro.ipynb           Sentence Transformers (SBERT) — the workhorse
openai_embeddings.ipynb                    OpenAI text-embedding-3 API
huggingface_embeddings.ipynb               Local HuggingFace embedding models
semantic_similarity.ipynb                  Cosine similarity, dot product, Euclidean
semantic_search_intro.ipynb                Build a semantic search engine
semantic_textual_similarity_intro.ipynb    STS benchmarks and evaluation
paraphrase_mining_intro.ipynb              Find duplicate/similar sentences at scale
sparse_encoder_intro.ipynb                 BM25, SPLADE — sparse vs. dense embeddings
vector_database_demo.ipynb                 Store and query embeddings in a vector DB
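
The sparse-vs-dense contrast in sparse_encoder_intro.ipynb is easy to preview. Sparse retrieval like BM25 scores documents by exact term overlap; the sketch below uses a made-up toy corpus and the textbook defaults k1 = 1.5, b = 0.75, not the tuned implementations covered in the notebook:

```python
import math
from collections import Counter

# Minimal BM25 scorer over a toy corpus (illustration only).
corpus = [
    "the cat sat on the mat".split(),
    "dogs and cats are pets".split(),
    "the car drove down the road".split(),
]
k1, b = 1.5, 0.75                                    # common BM25 defaults
N = len(corpus)
avgdl = sum(len(d) for d in corpus) / N              # average document length
df = Counter(t for doc in corpus for t in set(doc))  # document frequency

def bm25(query, doc):
    tf = Counter(doc)  # term frequency within this document
    score = 0.0
    for term in query:
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

scores = [bm25("cat".split(), doc) for doc in corpus]
print(scores)  # only the document containing the exact token "cat" scores > 0
```

Note that the "cats" document scores zero for the query "cat": sparse methods match exact tokens, which is precisely the limitation dense embeddings address.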

Prerequisites¶

  • Python basics (Phase 01)

  • Tokenization (Phase 04)

Key Concepts You'll Learn¶

  • Dense vs. sparse embeddings

  • Sentence Transformers (all-MiniLM, BGE, E5, GTE)

  • OpenAI text-embedding-3-small and text-embedding-3-large

  • Cosine similarity and nearest-neighbor search

  • Building semantic search engines

  • Vector databases (ChromaDB, FAISS, Pinecone, Qdrant)
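
The three similarity measures above are tightly related once vectors are L2-normalized: cosine similarity equals the dot product, and squared Euclidean distance equals 2 − 2 · cosine. A quick numerical check (the 384 dimensions mirror all-MiniLM-L6-v2's output size; the vectors themselves are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)    # stand-ins for embeddings
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # L2-normalize

cos = float(np.dot(a, b))           # == cosine similarity on unit vectors
euc = float(np.linalg.norm(a - b))  # Euclidean distance

# For unit vectors: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b = 2 - 2 cos
print(np.isclose(euc**2, 2 - 2 * cos))  # True
```

This is why many vector databases let you pick any of the three metrics: on normalized embeddings they produce the same nearest-neighbor ranking.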

Learning Path¶

embeddings_intro.ipynb              ← Start here
sentence_transformer_intro.ipynb
openai_embeddings.ipynb
semantic_similarity.ipynb
semantic_search_intro.ipynb
huggingface_embeddings.ipynb
sparse_encoder_intro.ipynb
vector_database_demo.ipynb
paraphrase_mining_intro.ipynb
semantic_textual_similarity_intro.ipynb

Next Phase¶

After embeddings, move to Phase 07: Vector Databases for production-scale storage and retrieval.