Milvus Vector Search

Milvus is an open-source vector database designed for large-scale vector workloads. It provides fast similarity search over high-dimensional data, which makes it suitable for AI applications such as image and video similarity search, recommendation systems, and natural language processing.

PyMilvus is the official Python SDK for Milvus. It offers a convenient interface for interacting with the Milvus server.

This tutorial walks you through connecting to Milvus, creating a collection, indexing vectors, and performing vector searches using PyMilvus.

Prerequisites:

To follow this tutorial, ensure that you have the following:

  • Python 3.10 or higher installed on your system
  • The PyMilvus library installed
  • A running Milvus server that you can reach (the examples use localhost on port 19530)

If you don’t have the PyMilvus SDK installed, you can install it using pip as shown in the following command:

pip install pymilvus

Once installed, we can dive in and learn how to perform a vector search using PyMilvus.

Connect to the Milvus Server

The first step is to connect to the Milvus server using PyMilvus.

We start by importing the library and creating a client object as shown in the following code. The code in this tutorial uses the MilvusClient class and assumes PyMilvus 2.4 or later:

import pymilvus as mv

# Connect to the Milvus server
milvus = mv.MilvusClient(uri='http://localhost:19530')

If your Milvus server is not running on localhost with the default port, replace localhost and 19530 with the appropriate values for your server.
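
To avoid hardcoding the address, one option is to read it from environment variables. The following is a minimal sketch of that idea. The variable names MILVUS_HOST and MILVUS_PORT and the milvus_uri() helper are hypothetical; PyMilvus does not define them.

```python
import os

def milvus_uri(env=None):
    """Build the server URI from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    host = env.get('MILVUS_HOST', 'localhost')   # assumed variable name
    port = int(env.get('MILVUS_PORT', '19530'))  # assumed variable name
    return f'http://{host}:{port}'

# With no variables set, the helper falls back to the local defaults.
print(milvus_uri({}))
```

The returned string can be passed as the uri argument of a client class that accepts one, such as MilvusClient in recent PyMilvus releases.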

Create a Collection

Once connected to the Milvus server, we can set up a collection. A collection in Milvus is a container that is used to store the vector data.

We need to create a collection before we can index and search the vectors:

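collection_name = 'test_collection'
dimension = 128

# Define the schema: an auto-generated primary key and a float vector field
schema = milvus.create_schema(auto_id=True)
schema.add_field(field_name='id', datatype=mv.DataType.INT64, is_primary=True)
schema.add_field(field_name='embedding', datatype=mv.DataType.FLOAT_VECTOR, dim=dimension)

# Create the collection
milvus.create_collection(collection_name, schema=schema)

The previous code creates a collection with the specified name and a schema for the vectors that we wish to search.

Milvus requires every collection to have a primary key, so the schema defines an id field whose values Milvus generates automatically (auto_id=True), alongside an embedding field with the float vector data type and the specified dimension.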

Index Vectors

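Before performing a vector search, we need to index our vectors. Milvus supports different index types such as IVF_FLAT, IVF_SQ8, and HNSW. For this tutorial, we use IVF_FLAT for simplicity.

# Describe the index to build on the embedding field
index_params = milvus.prepare_index_params()
index_params.add_index(field_name='embedding', index_type='IVF_FLAT', metric_type='L2', params={'nlist': 4096})
milvus.create_index(collection_name, index_params)

In this case, we specify the index type as “IVF_FLAT” with the L2 (Euclidean) distance metric and an nlist value of 4096. The nlist parameter sets the number of clusters that the vectors are partitioned into.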
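
To build some intuition for what nlist controls, the following toy sketch partitions vectors into nlist buckets and, at search time, scans only the nprobe buckets closest to the query. It is an illustration in plain Python, not how Milvus implements IVF_FLAT: the centroids are sampled from the data rather than trained with k-means, and the names build_ivf() and ivf_search() are hypothetical.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, nlist, seed=0):
    """Assign each vector to its nearest centroid (toy index, no k-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)  # simplification: sampled, not trained
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets

def ivf_search(vectors, centroids, buckets, query, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]
centroids, buckets = build_ivf(data, nlist=10)

# Exhaustive scan over every vector, used as the reference answer.
exact = sorted(range(len(data)), key=lambda i: l2(query, data[i]))[:5]

# Probing few buckets is cheaper but may miss true neighbors.
print(ivf_search(data, centroids, buckets, query, k=5, nprobe=2))
# Probing every bucket examines all vectors, so it matches the exhaustive scan.
print(ivf_search(data, centroids, buckets, query, k=5, nprobe=10) == exact)
```

The trade-off shown here is the general one for inverted-file indexes: scanning fewer buckets is faster but less accurate, and scanning all of them is equivalent to a brute-force search.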

Insert the Data

Next, let us insert some data into our collection. We generate the vectors with NumPy and pass them to PyMilvus as a list of rows, where each row is a dictionary that maps field names to values.

import numpy as np

# Generate random vectors
vectors = np.random.rand(1000, dimension).astype(np.float32)

# Insert the vectors into the collection, one dictionary per row
milvus.insert(collection_name, [{'embedding': v.tolist()} for v in vectors])

In the provided example, we use the np.random.rand() method to generate a batch of 1000 random vectors of the specified dimension and insert them into the collection using the insert() method. Each row supplies only the embedding field.
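
For larger datasets, one option is to validate the vectors and insert them in batches rather than in a single call. The following sketch shows the idea in plain Python, assuming the list-of-dictionaries row format that MilvusClient accepts in PyMilvus 2.4 or later. The helper names to_rows() and batched() and the batch size are illustrative choices, not part of PyMilvus.

```python
def to_rows(vectors, dim, field='embedding'):
    """Convert vectors to row dictionaries, checking the dimension first."""
    rows = []
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f'vector {i} has {len(vec)} dimensions, expected {dim}')
        rows.append({field: [float(x) for x in vec]})
    return rows

def batched(rows, size):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

rows = to_rows([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], dim=2)
for batch in batched(rows, size=2):
    # Each batch could be passed to milvus.insert(collection_name, batch).
    print(len(batch))
```

Checking the dimension up front reports a mismatched vector with its position in the input, before any data is sent to the server.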

Perform the Vector Search

We can now search the indexed vectors. A collection must be loaded into memory before it can be searched, so we load it first. To find the vectors most similar to a given query vector, we can run the code as follows:

# Query vector
query_vector = np.random.rand(dimension).astype(np.float32)

# Load the collection into memory
milvus.load_collection(collection_name)

# Perform the vector search
results = milvus.search(collection_name, data=[query_vector.tolist()], limit=5)

Here, we generate a random query vector and perform a search using the search() method. The data argument takes a list of query vectors, so we wrap our single vector in a list. We specify limit=5 to retrieve the 5 most similar vectors.

This returns a results object with one list of hits per query vector. Each hit contains the ID of a matched vector and its distance to the query.
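
Conceptually, a vector search ranks the stored vectors by their distance to the query and keeps the closest ones. The following brute-force sketch in plain Python shows that computation for Euclidean distance on a handful of points. The top_k_l2() helper is hypothetical and only practical for small datasets, but it can serve as a reference when checking the results of an index. The distance values that Milvus reports may use a different scale, so compare rankings rather than raw values.

```python
import math

def top_k_l2(vectors, query, k):
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [
        (i, math.sqrt(sum((x - q) ** 2 for x, q in zip(vec, query))))
        for i, vec in enumerate(vectors)
    ]
    return sorted(scored, key=lambda pair: pair[1])[:k]

points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
for index, distance in top_k_l2(points, query=[0.0, 0.0], k=3):
    print(index, distance)
```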

Clean Up

Finally, don’t forget to clean up the resources and close the connection to the Milvus server:

milvus.drop_collection(collection_name)

milvus.close()
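
If an earlier step raises an exception, the cleanup calls above are never reached. One way to guard against this is to wrap the work in a context manager, as in the following sketch. The temporary_collection() helper is hypothetical, and FakeClient is a stand-in that only records calls so that the example runs without a server. With a real client, the same drop_collection() and close() methods would be called.

```python
from contextlib import contextmanager

@contextmanager
def temporary_collection(client, name):
    """Yield the collection name, then drop the collection and close the client."""
    try:
        yield name
    finally:
        client.drop_collection(name)
        client.close()

class FakeClient:
    """Stand-in for a Milvus client so that the sketch runs without a server."""
    def __init__(self):
        self.calls = []

    def drop_collection(self, name):
        self.calls.append(('drop_collection', name))

    def close(self):
        self.calls.append(('close',))

client = FakeClient()
try:
    with temporary_collection(client, 'test_collection'):
        raise RuntimeError('search failed')  # simulate a failure mid-tutorial
except RuntimeError:
    pass

# Both cleanup calls were made despite the exception.
print(client.calls)
```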

There you have it!

Conclusion

In this tutorial, we covered the basics of vector search in Milvus using the PyMilvus SDK: connecting to the Milvus server, creating a collection, building an index, inserting data, and carrying out a vector search.

About the author

John Otieno
