
Weaviate – Enterprise Vector Database

Schema-Driven Search with GraphQL

Weaviate is an open-source vector database that combines vector search with a structured, schema-based data model, queryable through a GraphQL API. Unlike simpler vector stores, it lets you define collections (called classes in older versions) with typed properties (text, number, date, cross-references), attach built-in vectorization modules, and run hybrid search that blends BM25 keyword scoring with dense vector similarity. This schema-first approach suits enterprise applications where data governance, complex queries, and multi-tenant isolation matter.

Installation

The weaviate-client package (v4+) provides the Python API. Weaviate itself runs as a separate service: in Docker, on Weaviate Cloud, or in embedded mode. The client communicates over both REST and gRPC. For local experiments, weaviate.connect_to_local() connects to a local instance (for example, one started with Docker) on the default ports.

# !pip install weaviate-client

import weaviate
from weaviate.classes.init import Auth

print('✅ Imports successful')

1. Connect to Weaviate

connect_to_local() establishes a connection to a Weaviate instance running on localhost:8080 (REST) and localhost:50051 (gRPC). The client object is your entry point for creating schemas, inserting data, and running queries; call client.close() when you are finished to release the connection. In production, you would use connect_to_weaviate_cloud() or connect_to_custom() with authentication credentials.

client = weaviate.connect_to_local()

print("✅ Connected to Weaviate")
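The choice between a local and a cloud connection can be driven by configuration rather than hard-coded. The following is a minimal sketch, assuming hypothetical environment variable names (WEAVIATE_URL, WEAVIATE_API_KEY, WEAVIATE_HOST, WEAVIATE_PORT, WEAVIATE_GRPC_PORT) and hypothetical helper names; only connect_to_local(), connect_to_weaviate_cloud(), and Auth.api_key() come from the client library.

```python
import os


def resolve_connection_settings(env=None):
    """Decide how to connect, based on environment-style settings.

    The variable names used here are assumptions for illustration only.
    """
    env = os.environ if env is None else env
    url = env.get("WEAVIATE_URL")
    api_key = env.get("WEAVIATE_API_KEY")
    if url and api_key:
        return {"mode": "cloud", "cluster_url": url, "api_key": api_key}
    return {
        "mode": "local",
        "host": env.get("WEAVIATE_HOST", "localhost"),
        "port": int(env.get("WEAVIATE_PORT", "8080")),
        "grpc_port": int(env.get("WEAVIATE_GRPC_PORT", "50051")),
    }


def connect(settings):
    """Open a client connection that matches the resolved settings."""
    # Imported lazily so the settings logic works without the client installed
    import weaviate
    from weaviate.classes.init import Auth

    if settings["mode"] == "cloud":
        return weaviate.connect_to_weaviate_cloud(
            cluster_url=settings["cluster_url"],
            auth_credentials=Auth.api_key(settings["api_key"]),
        )
    return weaviate.connect_to_local(
        host=settings["host"],
        port=settings["port"],
        grpc_port=settings["grpc_port"],
    )
```

With no variables set, connect(resolve_connection_settings()) should behave like the plain connect_to_local() call above. Keeping credentials in the environment avoids committing API keys to the notebook.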

2. Create Schema

This notebook defines an explicit schema: a collection name and its properties with their data types. Setting vectorizer_config=Configure.Vectorizer.none() means we supply our own pre-computed vectors (bring-your-own-embeddings). Alternatively, you can configure a built-in vectorizer module (e.g., text2vec-openai or text2vec-transformers) to have Weaviate embed text automatically on insert and query. The schema underpins Weaviate's type checking and its query capabilities.

from weaviate.classes.config import Configure, Property, DataType

try:
    # Bring-your-own-vectors collection with two text properties
    collection = client.collections.create(
        name="Document",
        vectorizer_config=Configure.Vectorizer.none(),
        properties=[
            Property(name="text", data_type=DataType.TEXT),
            Property(name="category", data_type=DataType.TEXT),
        ],
    )
    print("✅ Schema created")
except Exception as e:
    # Typically raised when the collection already exists from a previous run
    print(f"Schema exists or error: {e}")
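Catching every exception hides real failures such as a lost connection. One alternative is to check for the collection first. This is a minimal sketch, assuming a hypothetical helper name get_or_create_collection; it relies only on the client's collections.exists(), collections.get(), and collections.create() calls.

```python
def get_or_create_collection(client, name, **create_kwargs):
    """Return the existing collection handle, or create the collection if missing.

    `create_kwargs` are forwarded unchanged to client.collections.create(),
    for example vectorizer_config=... and properties=[...].
    """
    if client.collections.exists(name):
        # Reuse the collection left over from a previous run
        return client.collections.get(name)
    return client.collections.create(name=name, **create_kwargs)
```

Called as get_or_create_collection(client, "Document", vectorizer_config=Configure.Vectorizer.none(), properties=[...]), this makes the cell safe to re-run. Note that it does not verify that an existing collection has the same properties as the ones requested.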

3. Add Data

Data objects are inserted through the collection handle returned by client.collections.get(). Each object is a dictionary of property values matching the schema, passed as properties; the pre-computed embedding is passed separately through the vector argument rather than as a key inside the dictionary. Weaviate validates property types at insert time and indexes both the vector (for similarity search) and the properties (for filtering and keyword queries). Batch inserts are available for high-throughput ingestion.

import numpy as np

documents = client.collections.get("Document")

# Properties and the pre-computed vector are separate arguments
documents.data.insert(
    properties={
        "text": "Machine learning example",
        "category": "ML",
    },
    vector=np.random.random(384).tolist(),  # random placeholder, not a real embedding
)

print("✅ Data inserted")
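For more than a handful of objects, the batch interface mentioned above is usually a better fit than repeated single inserts. The sketch below assumes hypothetical helper names (make_rows, ingest) and uses random placeholder vectors from the standard library; the batch.dynamic() context manager, add_object(), and failed_objects belong to the v4 collection handle.

```python
import random


def make_rows(texts_with_categories, dim=384, seed=0):
    """Build insertable rows with random placeholder vectors (not real embeddings)."""
    rng = random.Random(seed)
    rows = []
    for text, category in texts_with_categories:
        rows.append({
            "properties": {"text": text, "category": category},
            "vector": [rng.random() for _ in range(dim)],
        })
    return rows


def ingest(collection, rows):
    """Send rows through the collection's dynamic batch and return any failures."""
    with collection.batch.dynamic() as batch:
        for row in rows:
            batch.add_object(properties=row["properties"], vector=row["vector"])
    # Objects that could not be imported are collected here after the batch closes
    return collection.batch.failed_objects
```

A call such as ingest(documents, make_rows([("Vector search basics", "DB"), ("Gradient descent", "ML")])) would add two objects. Checking the returned list matters because batch errors are reported per object and do not necessarily raise an exception.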

4. Search

near_vector performs approximate nearest-neighbor search using the supplied query vector. Results are returned as objects with their properties and, when requested through return_metadata, metadata such as distance or certainty; the stored vector can be returned with include_vector=True. Weaviate also supports near_text (if a vectorizer module is configured), near_object (find objects similar to an existing one), bm25 (keyword search), and hybrid (combines BM25 and vector scores). The limit parameter controls how many results to return.

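from weaviate.classes.query import MetadataQuery

# Random placeholder query vector; use a real embedding in practice
query_vector = np.random.random(384).tolist()

response = documents.query.near_vector(
    near_vector=query_vector,
    limit=3,
    return_metadata=MetadataQuery(distance=True),  # include each hit's distance
)

for obj in response.objects:
    print(f"Text: {obj.properties['text']} (distance: {obj.metadata.distance:.4f})")

# Release the connection when finished
client.close()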
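To make the idea behind hybrid search concrete, here is a simplified sketch of score fusion: each score set is min-max normalized and the two are blended with a weight alpha, where 1 means vector-only and 0 means keyword-only. The function names and sample scores are hypothetical, and this illustrates the general technique under those assumptions rather than Weaviate's internal implementation.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(bm25_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores; higher input scores mean better matches.

    alpha=1.0 uses only the vector scores, alpha=0.0 only the keyword scores.
    Documents missing from one result set contribute 0 for that component.
    """
    kw = min_max(bm25_scores) if bm25_scores else {}
    vec = min_max(vector_scores) if vector_scores else {}
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Best match first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three documents
bm25 = {"a": 2.0, "b": 1.0, "c": 0.0}
similarity = {"a": 0.1, "b": 0.9, "c": 0.5}
ranking = [doc_id for doc_id, _ in fuse(bm25, similarity, alpha=0.5)]
```

In the client, the corresponding call is documents.query.hybrid(query=..., vector=..., alpha=..., limit=...). Because this collection has no vectorizer configured, the query vector has to be supplied explicitly.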