Weaviate: Enterprise Vector Database
Schema-Driven Search with GraphQL
Weaviate is an open-source vector database that combines vector search with a structured, schema-based data model accessed through a GraphQL API. Unlike simpler vector stores, Weaviate lets you define collections (called classes in older APIs) with typed properties (text, number, date, cross-references), attach built-in vectorization modules, and run hybrid search that blends BM25 keyword scoring with dense vector similarity. This schema-first approach makes Weaviate a strong fit for enterprise applications where data governance, complex queries, and multi-tenant isolation are important.
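The following sketch shows concretely what it means to blend keyword and vector scores. It uses only the standard library and is loosely modeled on Weaviate's relative-score fusion. Each result list's scores are min-max normalized to [0, 1] and then combined with a weight `alpha`. Here `alpha = 1.0` means pure vector search and `alpha = 0.0` means pure keyword search, which matches the convention of the `alpha` parameter on Weaviate's hybrid query. The function names are hypothetical. Weaviate's own implementation may differ in details, for example in how it treats documents that only one of the two searches returned.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def relative_score_fusion(
    bm25_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend keyword and vector scores (higher = better for both inputs).

    alpha = 1.0 -> pure vector ranking, alpha = 0.0 -> pure BM25 ranking.
    """
    keyword = min_max_normalize(bm25_scores)
    vector = min_max_normalize(vector_scores)
    # A document missing from one result list contributes 0 for that side.
    doc_ids = set(keyword) | set(vector)
    return {
        d: alpha * vector.get(d, 0.0) + (1 - alpha) * keyword.get(d, 0.0)
        for d in doc_ids
    }


# Hypothetical scores: BM25 scores are unbounded, vector scores are similarities.
bm25 = {"a": 3.2, "b": 1.1, "c": 0.4}
vector_sim = {"b": 0.91, "c": 0.85, "d": 0.60}
fused = relative_score_fusion(bm25, vector_sim, alpha=0.5)
print(sorted(fused, key=fused.get, reverse=True))
```

In this toy data, document "b" ranks first even though it is neither the best keyword match ("a") nor the only vector match. It ranks first because it scores well on both signals, which is the behavior hybrid search is designed to reward.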
Installation
The weaviate-client package (v4+) provides the Python API. Weaviate itself runs as a separate service: in Docker, on Weaviate Cloud, or in embedded mode. The client communicates over both gRPC and REST. For local experiments, weaviate.connect_to_local() connects to a Docker instance on the default ports.
# !pip install weaviate-client
import weaviate
from weaviate.classes.init import Auth  # needed only for authenticated connections (e.g., Weaviate Cloud)

print('✅ Imports successful')
1. Connect to Weaviate
connect_to_local() establishes a connection to a Weaviate instance running on localhost:8080 (REST) and localhost:50051 (gRPC). The client object is your entry point for creating schemas, inserting data, and running queries. In production, you would use connect_to_weaviate_cloud() or connect_to_custom() with authentication credentials, for example an API key wrapped in Auth.api_key().
client = weaviate.connect_to_local()
print("✅ Connected to Weaviate")
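When connect_to_local() fails, the usual cause is that one of the two ports is not reachable, often because the Docker container does not publish the gRPC port. The following stdlib-only helper checks both default ports with a plain TCP connection. The helper `check_local_ports` and the `DEFAULT_PORTS` mapping are hypothetical and not part of the Weaviate client. A successful check only shows that something is listening on the port. It does not prove that the listener is Weaviate.

```python
import socket
from typing import Dict, Optional

# Default local ports used by connect_to_local(): REST on 8080, gRPC on 50051.
DEFAULT_PORTS = {"rest": 8080, "grpc": 50051}


def check_local_ports(
    host: str = "localhost",
    ports: Optional[Dict[str, int]] = None,
    timeout: float = 1.0,
) -> Dict[str, bool]:
    """Report, per named port, whether a TCP connection can be opened."""
    ports = DEFAULT_PORTS if ports is None else ports
    status: Dict[str, bool] = {}
    for name, port in ports.items():
        try:
            # create_connection raises OSError (e.g., ConnectionRefusedError,
            # or a timeout) when nothing accepts the connection.
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:
            status[name] = False
    return status


print(check_local_ports())
```

If "rest" reports True but "grpc" reports False, check that port 50051 is mapped in the Docker setup. The v4 client uses gRPC for most data and query operations.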
2. Create Schema
This tutorial defines an explicit schema: the collection name plus each property and its data type. Weaviate's auto-schema feature can also infer properties from inserted data, but an explicit schema keeps the types under your control. Setting vectorizer_config=Configure.Vectorizer.none() means we supply our own pre-computed vectors (bring-your-own-embeddings). Alternatively, you can configure a built-in vectorizer module (e.g., text2vec-openai or text2vec-transformers) so that Weaviate embeds text automatically on insert and at query time. The schema is the foundation of Weaviate's type safety and GraphQL query capabilities.
from weaviate.classes.config import Configure, Property, DataType

# Create the collection only if it does not exist yet, so the cell can be re-run safely
if not client.collections.exists("Document"):
    client.collections.create(
        name="Document",
        vectorizer_config=Configure.Vectorizer.none(),  # we bring our own vectors
        properties=[
            Property(name="text", data_type=DataType.TEXT),
            Property(name="category", data_type=DataType.TEXT),
        ],
    )
    print("✅ Schema created")
else:
    print("Collection 'Document' already exists")
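Typed properties let you catch malformed objects early. The following stdlib-only sketch mirrors the "Document" schema as a plain mapping from property name to Python type and lists any mismatches. `DOCUMENT_SCHEMA` and `find_schema_errors` are hypothetical illustrations, not Weaviate's type system. The sketch deliberately treats unknown property names as errors and does not model how Weaviate itself handles them.

```python
from typing import Any, Dict, List

# Hypothetical mirror of the "Document" collection: both properties are TEXT (-> str).
DOCUMENT_SCHEMA: Dict[str, type] = {"text": str, "category": str}


def find_schema_errors(props: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return human-readable problems; an empty list means the object looks valid."""
    errors: List[str] = []
    for name, value in props.items():
        if name not in schema:
            errors.append(f"unknown property {name!r}")
        elif not isinstance(value, schema[name]):
            expected = schema[name].__name__
            errors.append(f"{name!r}: expected {expected}, got {type(value).__name__}")
    return errors


print(find_schema_errors({"text": "Machine learning example", "category": "ML"}, DOCUMENT_SCHEMA))
print(find_schema_errors({"text": 42, "author": "Ada"}, DOCUMENT_SCHEMA))
```

Running a check like this client-side is optional. It can still be useful in an ingestion pipeline because it reports every problem in an object at once, rather than one failed request at a time.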
3. Add Data
Data objects are inserted through the collection handle returned by client.collections.get(). Property values are passed as a dictionary matching the schema (the properties argument), and a pre-computed embedding is passed separately through the vector argument. Weaviate validates property types at insert time and indexes both the vector (for similarity search) and the properties (for filtering and GraphQL queries). For high-throughput ingestion, use batch inserts instead of one insert() call per object.
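import numpy as np

documents = client.collections.get("Document")

# Properties go in `properties`; the embedding goes in the separate `vector` argument.
# A random 384-dimensional vector stands in for a real embedding here.
documents.data.insert(
    properties={"text": "Machine learning example", "category": "ML"},
    vector=np.random.random(384).tolist(),
)
print("✅ Data inserted")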
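Batch insertion means grouping many objects into a few large requests instead of sending one request per object. The v4 client provides its own batching helpers, such as `collection.batch.dynamic()` and `collection.data.insert_many()`, and those are the route to use against a real server. The following stdlib-only sketch only illustrates the chunking idea. The helpers `make_placeholder_objects` and `chunked` are hypothetical, and the random vectors are placeholders for real embeddings.

```python
import random
from typing import Any, Dict, Iterator, List


def make_placeholder_objects(n: int, dim: int = 384, seed: int = 0) -> List[Dict[str, Any]]:
    """Build n objects shaped like the insert above: a properties dict plus a separate vector."""
    rng = random.Random(seed)
    return [
        {
            "properties": {"text": f"Example document {i}", "category": "ML"},
            "vector": [rng.random() for _ in range(dim)],  # stand-in for a real embedding
        }
        for i in range(n)
    ]


def chunked(items: List[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices containing at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


objects = make_placeholder_objects(10, dim=8)
for batch in chunked(objects, batch_size=4):
    # In a real pipeline each batch would be sent in one request instead of printed.
    print(f"batch of {len(batch)} objects")
```

Fixed-size chunks are the simplest policy. The client's dynamic batching adjusts batch sizes automatically, which is why it is usually preferable to hand-rolled chunking.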
4. Search
near_vector performs approximate nearest-neighbor search (through the HNSW index by default) using the supplied query vector. Results come back as objects with their properties. Metadata such as distance or certainty is included when requested through return_metadata, and the stored vector is included when include_vector=True. Besides near_vector, Weaviate supports near_text (requires a configured vectorizer module), near_object (find objects similar to an existing one), bm25 (keyword search), and hybrid (combines BM25 and vector scores). The limit parameter caps the number of results returned.
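from weaviate.classes.query import MetadataQuery

# Placeholder query vector; in practice, embed the query with the same model as the documents
query_vector = np.random.random(384).tolist()
response = documents.query.near_vector(
    near_vector=query_vector,
    limit=3,
    return_metadata=MetadataQuery(distance=True),
)
for obj in response.objects:
    print(f"Text: {obj.properties['text']}  (distance: {obj.metadata.distance:.4f})")

client.close()  # release the connection when finished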
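An approximate index such as HNSW exists to avoid the exact search shown below, which compares the query against every stored vector. The following stdlib-only sketch performs that exact search with cosine distance, which is Weaviate's default metric. Cosine distance is defined as 1 minus cosine similarity, so it ranges from 0 to 2. The sketch also converts distance to certainty using certainty = 1 - distance / 2, which Weaviate defines only for the cosine metric. The helper names are hypothetical, and the corpus is a toy example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def exact_near_vector(
    query: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    limit: int = 3,
) -> List[Tuple[str, float]]:
    """Brute-force k-NN: score every stored vector, sort by distance, keep `limit` results."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in corpus]
    scored.sort(key=lambda item: item[1])
    return scored[:limit]


def certainty_from_cosine_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


corpus = [("x", [1.0, 0.0]), ("y", [0.0, 1.0]), ("z", [-1.0, 0.0])]
for doc_id, dist in exact_near_vector([1.0, 0.1], corpus, limit=2):
    print(doc_id, round(dist, 4), round(certainty_from_cosine_distance(dist), 4))
```

Exact search costs O(N × d) per query for N vectors of dimension d. An approximate index avoids this linear scan at the price of occasionally missing a true neighbor. For that reason, a small brute-force baseline like this one can help measure recall on your own data.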