Weaviate – Enterprise Vector Database

Schema-Driven Search with GraphQL

Weaviate is an open-source vector database that combines vector search with a structured, schema-based data model, queryable through a GraphQL API (alongside REST and gRPC). Unlike simpler vector stores, Weaviate lets you define classes with typed properties (text, number, date, cross-references), attach built-in vectorization modules, and run hybrid search that blends BM25 keyword scoring with dense vector similarity. This schema-first approach makes Weaviate a strong fit for enterprise applications where data governance, complex queries, and multi-tenant isolation matter.
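To make the hybrid idea concrete, here is a simplified, pure-Python sketch of relative-score style fusion. Each result set's scores are min-max rescaled to [0, 1], then blended with a weight alpha, where alpha = 1 means a pure vector ranking and alpha = 0 a pure BM25 ranking. This is an illustration, not Weaviate's server code: the function names and sample scores are hypothetical, and the vector scores are assumed to already be "higher is better" similarities (Weaviate's vector search reports distances, so a real implementation converts them first). In practice you would call collection.query.hybrid(query=..., alpha=...) and let the server do the fusion.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; higher stays better."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a top match
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(bm25_scores, vector_scores, alpha):
    """Blend normalized scores: alpha=1.0 -> vector only, alpha=0.0 -> BM25 only."""
    kw = min_max_normalize(bm25_scores)
    vec = min_max_normalize(vector_scores)
    # A document missing from one result set contributes 0 for that set
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Sort by fused score (descending), breaking ties by id for determinism
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from a keyword query and a vector query
bm25 = {"doc1": 7.2, "doc2": 3.1, "doc3": 0.4}
vector = {"doc2": 0.91, "doc3": 0.88, "doc4": 0.35}

for doc, score in relative_score_fusion(bm25, vector, alpha=0.5):
    print(f"{doc}: {score:.3f}")
```

With alpha=0.5, doc2 ranks first because it scores well in both lists, even though doc1 has the highest BM25 score; lowering alpha toward 0 shifts the ranking toward pure keyword relevance.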

Installation

The weaviate-client package (v4+) provides the Python API. Weaviate itself runs as a separate service: in Docker, on Weaviate Cloud, or in embedded mode. The client talks to it over both REST and gRPC. For local experiments, weaviate.connect_to_local() connects to a Docker instance on the default ports (8080 for REST, 50051 for gRPC).

# !pip install weaviate-client

import weaviate
from weaviate.classes.init import Auth  # needed only for authenticated connections (e.g. Weaviate Cloud)

print('✅ Imports successful')

1. Connect to Weaviate

connect_to_local() establishes a connection to a Weaviate instance running on localhost:8080 (REST) and localhost:50051 (gRPC). The client object is your entry point for creating schemas, inserting data, and running queries. In production, you would use connect_to_weaviate_cloud() or connect_to_custom() with authentication credentials.

# Defaults: REST on localhost:8080, gRPC on localhost:50051
client = weaviate.connect_to_local()

print("✅ Connected to Weaviate")
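The Auth helper imported earlier is only needed for authenticated connections such as Weaviate Cloud. The sketch below shows one way to switch between a local Docker instance and a cloud cluster. The environment-variable names WEAVIATE_URL and WEAVIATE_API_KEY and both helper functions are hypothetical; weaviate.connect_to_local(), weaviate.connect_to_weaviate_cloud(cluster_url=..., auth_credentials=...) and Auth.api_key(...) are the v4 client calls. The client package is imported inside the function so the target-selection logic can be exercised without the package or a running server.

```python
import os


def resolve_connection_target(env=None):
    """Pick a connection target from (hypothetical) environment variables.

    Returns ("cloud", url, api_key) when both WEAVIATE_URL and WEAVIATE_API_KEY
    are set, otherwise ("local", None, None).
    """
    env = os.environ if env is None else env
    url = env.get("WEAVIATE_URL")
    api_key = env.get("WEAVIATE_API_KEY")
    if url and api_key:
        return ("cloud", url, api_key)
    return ("local", None, None)


def connect_from_env(env=None):
    """Open a v4 Weaviate client; the caller must close it when finished."""
    # Imported lazily so resolve_connection_target() works without the package
    import weaviate
    from weaviate.classes.init import Auth

    kind, url, api_key = resolve_connection_target(env)
    if kind == "cloud":
        return weaviate.connect_to_weaviate_cloud(
            cluster_url=url,
            auth_credentials=Auth.api_key(api_key),
        )
    # Local Docker defaults: REST on localhost:8080, gRPC on localhost:50051
    return weaviate.connect_to_local()
```

Whichever path is taken, release the connection with client.close() when you are done; the v4 client can also be used as a context manager (with weaviate.connect_to_local() as client: ...), which closes it automatically.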

2. Create Schema

Weaviate lets you define an explicit schema that names the collection (class) and declares the data type of each property; auto-schema can infer one from inserted data, but an explicit definition keeps the types under your control. Setting vectorizer_config=Configure.Vectorizer.none() means we will supply our own pre-computed vectors (bring-your-own-embeddings). Alternatively, you can configure a built-in vectorizer module (e.g., text2vec-openai or text2vec-transformers) to have Weaviate embed text automatically on insert and query. The schema is the foundation of Weaviate's type safety and GraphQL query capabilities.

from weaviate.classes.config import Configure, Property, DataType

# Create the collection only if it does not exist yet, so the cell can be re-run
# without masking genuine errors behind a blanket except clause
if client.collections.exists("Document"):
    print("ℹ️ Collection 'Document' already exists")
else:
    client.collections.create(
        name="Document",
        vectorizer_config=Configure.Vectorizer.none(),  # bring your own vectors
        properties=[
            Property(name="text", data_type=DataType.TEXT),
            Property(name="category", data_type=DataType.TEXT),
        ],
    )
    print("✅ Schema created")
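For comparison, a collection that lets Weaviate do the embedding with a built-in vectorizer might be defined as below. This is a sketch assuming the text2vec-openai module is enabled on your Weaviate instance and that an OpenAI API key reaches it (for example as the X-OpenAI-Api-Key header, passed through the headers argument of connect_to_local()). The collection name AutoDocument and the year and published properties are hypothetical, chosen to show the INT and DATE data types next to TEXT.

```python
def create_autovectorized_collection(client, name="AutoDocument"):
    """Create (or reuse) a collection whose text Weaviate embeds server-side.

    Assumes the text2vec-openai module is enabled on the server and that an
    OpenAI key reaches it, e.g. via the X-OpenAI-Api-Key connection header.
    The collection name and the year/published properties are hypothetical.
    """
    from weaviate.classes.config import Configure, DataType, Property

    if client.collections.exists(name):
        return client.collections.get(name)
    return client.collections.create(
        name=name,
        # The module vectorizes objects on insert and text queries at search time
        vectorizer_config=Configure.Vectorizer.text2vec_openai(),
        properties=[
            Property(name="text", data_type=DataType.TEXT),
            Property(name="category", data_type=DataType.TEXT),
            Property(name="year", data_type=DataType.INT),
            Property(name="published", data_type=DataType.DATE),
        ],
    )
```

Objects inserted into such a collection need no vector argument, and it can be searched with plain text, e.g. collection.query.near_text(query="neural networks", limit=3), because Weaviate embeds the query with the same module.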

3. Add Data

Data objects are inserted through the collection handle returned by client.collections.get(). In the v4 client, the object's property values go in the properties argument as a dictionary matching the schema, and a pre-computed embedding (when you bring your own) goes in the separate vector argument; the client does not read a _vector key from the properties dictionary. Weaviate validates property types at insert time and indexes both the vector (for similarity search) and the properties (for filtering and GraphQL queries). Batch inserts are available for high-throughput ingestion.

import numpy as np

documents = client.collections.get("Document")

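# Properties and the vector are separate arguments in the v4 client;
# every vector in the collection must have the same dimension
uuid = documents.data.insert(
    properties={
        "text": "Machine learning example",
        "category": "ML",
    },
    vector=np.random.random(384).tolist(),  # placeholder for a real 384-d embedding
)

print(f"✅ Data inserted with UUID {uuid}")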
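For more than a handful of objects, batching is the usual route. The sketch below uses the v4 collection-level batcher (collection.batch.dynamic() with batch.add_object(properties=..., vector=...)) and reads collection.batch.failed_objects afterwards, because errors inside a batch are collected rather than raised. The helper names are hypothetical, and make_rows() produces seeded random vectors purely as stand-ins for real 384-dimensional embeddings.

```python
import random

EMBEDDING_DIM = 384  # must match the dimension of every vector in "Document"


def make_rows(texts, category, dim=EMBEDDING_DIM, seed=0):
    """Pair each text with a placeholder vector (seeded random, demo only)."""
    rng = random.Random(seed)
    return [
        ({"text": text, "category": category}, [rng.random() for _ in range(dim)])
        for text in texts
    ]


def batch_insert(collection, rows):
    """Send (properties, vector) pairs through the client-side batcher."""
    # dynamic() lets the client adapt the batch size to server load
    with collection.batch.dynamic() as batch:
        for properties, vector in rows:
            batch.add_object(properties=properties, vector=vector)
    # Objects the server rejected are collected here instead of raising
    return collection.batch.failed_objects
```

For example, batch_insert(client.collections.get("Document"), make_rows(["Neural networks", "Gradient descent"], "ML")) returns the rejected objects, of which there should be none if every insert succeeded.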