Vision-Language Models

CLIP, vision-language models, and multimodal RAG pipelines.