Calculate the Distance Between Vectors in Milvus

Vector similarity search is a fundamental feature in machine learning applications including image recognition, natural language processing, and recommendation systems.

Milvus is a free and open-source vector database that performs efficient vector similarity searches. It builds on vector search libraries such as Faiss, NMSLIB, and Annoy.

This post covers the basics of working with Milvus to calculate the distance between vectors.

Prerequisites:

Before proceeding with this tutorial, you need the following:

  1. Python 3.10 or above
  2. A Milvus server installed and running on your system
  3. NumPy Python package
  4. PyMilvus – The Python SDK for Milvus. You can install it using pip:
pip install pymilvus

Milvus and Vector Distance Calculation

In Milvus, the vectors are stored in collections. Each collection is partitioned into segments for a more efficient search and computation.

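Milvus implements a set of built-in distance metrics that allow us to determine how far apart two vectors are. For distance metrics such as L2, the shorter the distance, the more similar the vectors are. For similarity metrics such as the inner product, a larger value means that the vectors are more similar.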

Milvus supports several metrics for vector distance calculation:

  1. Euclidean Distance (L2)
  2. Inner Product
  3. Hamming Distance
  4. Jaccard Distance
  5. Cosine Distance
  6. Tanimoto Distance

You can check out the mathematical definition of each metric at the following link:

https://milvus.io/docs/metric.md
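To make some of the listed metrics concrete, the following is a minimal sketch of their textbook definitions in plain Python. It is only an illustration and not how Milvus computes them internally. The function names are our own, the cosine function returns a similarity rather than a distance, and the Hamming and Jaccard versions assume binary vectors that are given as lists of 0s and 1s.

```python
import math


def inner_product(a, b):
    # Sum of the element-wise products; larger means more similar.
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Inner product of the two vectors after normalising their lengths.
    # Assumes neither vector is all zeros.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def hamming_distance(a, b):
    # Number of positions at which two binary vectors differ.
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    # 1 - (bits set in both) / (bits set in either).
    # Assumes at least one bit is set in a or b.
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1 - both / either
```

For example, hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]) counts the two positions where the bits differ.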

For this tutorial, we will focus on the Euclidean distance (L2) as the distance metric.
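As a reference point, here is a small sketch of the Euclidean distance in plain Python. The function names are our own. The Milvus metric documentation notes that Milvus skips the final square root for L2, so the values that a search reports should correspond to the squared form; it is worth verifying this against your Milvus version.

```python
import math


def squared_l2(a, b):
    # Sum of squared element-wise differences (no square root).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2(a, b):
    # Classic Euclidean distance: square root of the squared form.
    return math.sqrt(squared_l2(a, b))


print(squared_l2([0.0, 0.0], [3.0, 4.0]))  # 25.0
print(l2([0.0, 0.0], [3.0, 4.0]))          # 5.0
```

Both forms rank the neighbours in the same order, so the choice only affects the reported numbers.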

Import PyMilvus and NumPy

The first step is importing the necessary libraries. In this case, we only need the NumPy and PyMilvus packages.

from pymilvus import connections, DataType, CollectionSchema, FieldSchema, Collection
import numpy as np

Connect to the Milvus Server

Next, we need to connect to the Milvus server. We can do this using the connections.connect() method and by providing the hostname and port of the Milvus server.

connections.connect("default", host="localhost", port="19530")

This should connect to the Milvus server that runs on the localhost and port 19530.

Create a Collection

Once connected to the Milvus server, we must set up the collection to store our vectors. Milvus requires every collection to have a primary key field. In this case, we create a simple collection called “vectors” with an auto-generated INT64 primary key field called “id” and a field called “vector” which holds the FLOAT_VECTOR data type with a dimension of 128.

fields = [
    # Primary key; auto_id=True lets Milvus generate the IDs for us.
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=128),
]
schema = CollectionSchema(fields=fields, description="collection of vectors")
collection = Collection(name="vectors", schema=schema)

Insert the Vectors into the Collection

Let us insert some vector data into the collection for demonstration. We can use NumPy's random module to generate two random 128-dimensional vectors as shown in the following code:

# One inner list per field; the "vector" field receives two vectors.
vectors = [
    [np.random.random(128).tolist(), np.random.random(128).tolist()]
]
mr = collection.insert(vectors)
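The nested list that is passed to the insert call is column-oriented: the outer list has one entry per field and each entry holds one value per row. The following sketch shows that layout using only the standard library so that it runs without a Milvus server. It assumes that the primary key is generated by Milvus and is therefore not supplied. The check_vector_column helper is hypothetical and not part of PyMilvus.

```python
import random

dim = 128

# Row-oriented view: one 128-dimensional vector per entity.
rows = [[random.random() for _ in range(dim)] for _ in range(2)]

# Column-oriented view: one inner list per field.
# Here there is a single field, so the outer list has one entry.
columns = [rows]


def check_vector_column(column, dim):
    # Hypothetical helper: verify every vector has `dim` values
    # and return how many vectors the column holds.
    for i, vec in enumerate(column):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has {len(vec)} values, expected {dim}")
    return len(column)


print(check_vector_column(columns[0], dim))  # 2
```

Checking the dimension before inserting gives a clearer error than a rejected request from the server.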

The “insert” operation returns a result object (a MutationResult) which we can use to get the IDs of the inserted vectors.

ids = mr.primary_keys

Load the Collection into the Memory

As with any Milvus search operation, we must load the collection from the disk to the system memory before performing any search.

collection.load()
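As far as we know, recent Milvus 2.x releases refuse to load a collection whose vector field has no index. The following is a minimal sketch under that assumption: it flushes the inserted data, builds a brute-force FLAT index with the L2 metric, and then loads the collection. The prepare_for_search function is a hypothetical helper of our own, while flush(), create_index(), and load() are methods of the PyMilvus Collection object.

```python
def prepare_for_search(collection, field_name="vector"):
    # Hypothetical helper; `collection` is expected to be a
    # pymilvus Collection such as the one created earlier.
    index_params = {
        "index_type": "FLAT",   # exact, brute-force comparison
        "metric_type": "L2",    # Euclidean distance
        "params": {},           # FLAT needs no build parameters
    }
    collection.flush()  # seal the inserted data
    collection.create_index(field_name=field_name, index_params=index_params)
    collection.load()   # bring the collection into memory
    return index_params
```

With the collection from this tutorial, the call would be prepare_for_search(collection). A FLAT index compares the query against every stored vector, which is fine for a handful of vectors but slow for large collections.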

Calculate the Distance Between the Vectors

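Now that we have our vectors, we can use the “search” method of the collection to calculate the distance between them. We pass the two vectors that we inserted as the query data and set the “param” argument with the “L2” metric type which denotes the Euclidean distance.

search_params = {"metric_type": "L2", "params": {}}
# vectors[0] is the list that holds our two query vectors.
results = collection.search(data=vectors[0], anns_field="vector", param=search_params, limit=2)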

Once we run this search, the operation returns, for each query vector, the two closest vectors in the collection and their respective distances.
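To see what kind of result to expect, the following sketch imitates the search with a brute-force loop in plain Python. It is an illustration and not the Milvus implementation. It assumes that the collection holds only the vectors that we query with and that the distances are reported in squared form. The function names and the tiny two-dimensional vectors are our own.

```python
def squared_l2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(queries, stored, limit):
    # For each query, rank every stored vector by distance and
    # keep the `limit` closest as (index, distance) pairs.
    results = []
    for q in queries:
        scored = sorted((squared_l2(q, v), idx) for idx, v in enumerate(stored))
        results.append([(idx, dist) for dist, idx in scored[:limit]])
    return results


stored = [[0.0, 0.0], [3.0, 4.0]]
hits = brute_force_search(stored, stored, limit=2)
print(hits)  # [[(0, 0.0), (1, 25.0)], [(1, 0.0), (0, 25.0)]]
```

Because each query vector is also stored, its closest match is itself at a distance of 0, and the second match carries the distance between the two vectors.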

Extract the Distances

We can also access the distances from the result object using the following code:

distances = [match.distance for result in results for match in result]
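The flat list above loses track of which query each distance belongs to. The following sketch keeps the matches grouped per query and pairs each distance with the ID of the matched vector. It assumes that each match exposes the “id” and “distance” attributes, as PyMilvus hits do. The distances_by_query function is a hypothetical helper, and the fake_results objects are stand-ins with made-up values so that the code runs without a server.

```python
from types import SimpleNamespace


def distances_by_query(results):
    # One inner list per query, holding (id, distance) pairs.
    return [[(hit.id, hit.distance) for hit in hits] for hits in results]


# Stand-in for a search result with two queries and two hits each.
fake_results = [
    [SimpleNamespace(id=1, distance=0.0), SimpleNamespace(id=2, distance=25.0)],
    [SimpleNamespace(id=2, distance=0.0), SimpleNamespace(id=1, distance=25.0)],
]

for query_index, pairs in enumerate(distances_by_query(fake_results)):
    print(query_index, pairs)
```

With a real search, we would pass the “results” object from the previous step instead of fake_results.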

Conclusion

We learned how to use Milvus to calculate the distance between vectors, an essential component in machine learning and AI applications.

About the author

John Otieno
