Semantic Textual Similarity (STS)

What: Compute a cross-similarity matrix between two lists of sentences using model.similarity(), showing how each sentence in list 1 relates to every sentence in list 2.

Why: Semantic Textual Similarity (STS) is one of the core evaluation tasks for sentence embedding models. Unlike symmetric similarity (comparing sentences within one list), STS typically compares two separate lists – for example, pairing user queries with candidate answers, or aligning sentences across document versions. The model.similarity() method returns a full \(m \times n\) matrix of cosine similarities, making it easy to find the best match for each sentence.

How: Given embeddings \(\mathbf{E}_1 \in \mathbb{R}^{m \times d}\) and \(\mathbf{E}_2 \in \mathbb{R}^{n \times d}\), the similarity matrix is \(S_{ij} = \frac{\mathbf{E}_1[i] \cdot \mathbf{E}_2[j]}{\|\mathbf{E}_1[i]\| \, \|\mathbf{E}_2[j]\|}\).
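As a sanity check, the formula above can be computed directly with NumPy: normalize each row, then a single matrix product yields all \(m \times n\) cosine similarities at once. A minimal sketch with toy embeddings (the values here are illustrative, not real model output):

```python
import numpy as np

# Toy embeddings: m=2 sentences vs. n=3 sentences, d=4 dimensions
E1 = np.array([[1.0, 0.0, 1.0, 0.0],
               [0.0, 1.0, 0.0, 1.0]])
E2 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 1.0, 0.0],
               [1.0, 1.0, 1.0, 1.0]])

# Normalize rows so each embedding has unit length
E1n = E1 / np.linalg.norm(E1, axis=1, keepdims=True)
E2n = E2 / np.linalg.norm(E2, axis=1, keepdims=True)

# One matrix product gives the full (m, n) cosine-similarity matrix
S = E1n @ E2n.T
print(S.shape)  # (2, 3)
```

This is essentially what `model.similarity()` does under the hood when its similarity function is set to cosine.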

Connection: STS benchmarks (STS-B, SICK-R) are used to evaluate and compare embedding models on the MTEB leaderboard. High STS performance indicates the model captures semantic meaning well, which translates to better search, clustering, and RAG quality.

# https://sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Two lists of sentences
sentences1 = [
    "The new movie is awesome",
    "The cat sits outside",
    "A man is playing guitar",
]

sentences2 = [
    "The dog plays in the garden",
    "The new movie is so great",
    "A woman watches TV",
]

# Compute embeddings for both lists
embeddings1 = model.encode(sentences1)
embeddings2 = model.encode(sentences2)

# Compute cosine similarities
similarities = model.similarity(embeddings1, embeddings2)

# Output the pairs with their score
for idx_i, sentence1 in enumerate(sentences1):
    print(sentence1)
    for idx_j, sentence2 in enumerate(sentences2):
        print(f" - {sentence2: <30}: {similarities[idx_i][idx_j]:.4f}")