# Basic requirements
# !pip install numpy sentence-transformers matplotlib scikit-learn
# Vector databases (install as needed)
# !pip install chromadb qdrant-client weaviate-client pymilvus
2. Test your setup:
Run the cell below to verify that NumPy and the sentence-transformers library are installed correctly. The SentenceTransformer model (all-MiniLM-L6-v2) converts any text string into a 384-dimensional dense vector that captures its semantic meaning. This embedding step is the bridge between raw text and the numerical world of vector databases: every document you store and every query you issue will first pass through an encoder like this one.
import numpy as np
from sentence_transformers import SentenceTransformer
# Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Test embedding
text = "Vector databases are awesome!"
embedding = model.encode(text)
print("Setup successful!")
print(f"Text: {text}")
print(f"Embedding shape: {embedding.shape}")
print(f"First 5 dimensions: {embedding[:5]}")
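Because an embedding is a plain NumPy array, comparing two texts comes down to vector arithmetic. The sketch below illustrates cosine similarity, one common way to score how close two embeddings are. It assumes toy 4-dimensional vectors in place of real 384-dimensional embeddings; the helper `cosine_similarity` and the values for `cat`, `kitten`, and `invoice` are hypothetical and exist only for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings (hypothetical values,
# chosen only so the arithmetic is easy to follow).
cat = np.array([1.0, 0.0, 1.0, 0.0])
kitten = np.array([1.0, 0.0, 0.8, 0.1])
invoice = np.array([0.0, 1.0, 0.0, 1.0])

sim_close = cosine_similarity(cat, kitten)
sim_far = cosine_similarity(cat, invoice)

print(f"cat vs kitten:  {sim_close:.3f}")
print(f"cat vs invoice: {sim_far:.3f}")
```

To try this with real embeddings, pass two outputs of `model.encode(...)` from the cell above to the same function.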
Learning Path
Beginner Track (3.5 hours)
Start with 01_vector_db_basics.ipynb to understand concepts
Try 02_chroma_guide.ipynb for hands-on practice
Build a simple semantic search app
Intermediate Track (7 hours)
Complete Beginner Track
Learn 03_qdrant_guide.ipynb for production patterns
Explore 04_weaviate_guide.ipynb for hybrid search
Study 05_milvus_guide.ipynb for scale
Production Track (Full course)
Complete all notebooks
Review README.md for all 11 database options
Build a RAG (Retrieval-Augmented Generation) system
Deploy to production
Prerequisites
Before starting, you should understand:
Phase 1 - Tokenization: How to convert text to tokens
Phase 2 - Embeddings: How to convert text to vectors
Basic Python and numpy
API concepts (REST, requests)
Database Comparison
| Database | Best For | Difficulty | Cost |
|---|---|---|---|
| Chroma | Learning, prototyping | Easy | Free |
| Qdrant | Production apps | Medium | Free/Paid |
| Weaviate | Enterprise, GraphQL | Medium | Free/Paid |
| Milvus | Massive scale (billions) | Hard | Free/Paid |
| FAISS | Research, benchmarks | Easy | Free |
| Pinecone | Managed cloud | Easy | Paid |
| pgvector | Existing PostgreSQL | Medium | Free |
See README.md for all 11+ options with detailed examples!
Common Use Cases
Semantic Search: Find similar documents based on meaning
RAG Systems: Retrieve relevant context for LLMs
Recommendations: Find similar products/content
Question Answering: Match questions to answers
Duplicate Detection: Find near-duplicate content
Image Search: Find similar images by visual features
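Most of the use cases above reduce to the same operation: rank stored vectors by their similarity to a query vector. The brute-force sketch below shows that ranking in NumPy. It assumes toy 3-dimensional vectors in place of real embeddings; `top_k`, `docs`, `doc_vectors`, and `query_vector` are hypothetical names and data introduced for illustration, not part of any database's API.

```python
import numpy as np

def top_k(query: np.ndarray, vectors: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k rows most similar to query."""
    # Normalize both sides so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = m @ q
    # argsort is ascending, so reverse it and keep the first k indices.
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in best]

# Hypothetical documents with hand-made vectors (not real embeddings).
docs = ["how to train a puppy", "dog obedience tips", "quarterly tax filing"]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.0],
    [0.0, 0.1, 0.9],
])
query_vector = np.array([1.0, 0.2, 0.0])

results = top_k(query_vector, doc_vectors, k=2)
for idx, score in results:
    print(f"{score:.3f}  {docs[idx]}")
```

This scan compares the query against every stored vector, which is fine for small collections. Vector databases typically replace it with an index that trades a little accuracy for much faster lookups at scale.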
Additional Resources
README.md: Complete guide with 11 databases
Phase 1: Tokenization notebooks (7 notebooks)
Phase 2: Embeddings notebooks (10 notebooks)
Ready to Start?
Open 01_vector_db_basics.ipynb and begin your journey!
Each notebook is designed to be run step-by-step. Execute cells in order and experiment with the code.
Happy learning!
Your Complete Learning Journey
Phase 1: Tokenization (7 notebooks)
↓
Phase 2: Embeddings (10 notebooks)
↓
Phase 3: Vector Databases (5 notebooks) ← You are here!
↓
Phase 4: LLM Applications (Coming soon)
Total progress: 22 notebooks completed across 3 phases!