PyMilvus is the official Python SDK for Milvus. It offers a convenient interface for interacting with the Milvus server.
This tutorial walks you through connecting to Milvus, creating a collection, indexing vectors, and performing vector searches using PyMilvus.
Prerequisites:
To follow this tutorial, make sure that you have the following:
- Python 3.10 or higher installed on your system
- The PyMilvus library installed
If you don’t have the PyMilvus SDK installed, you can install it using pip with the following command:
pip install pymilvus
Once installed, we can dive in and learn how to perform a vector search using PyMilvus.
Connect to the Milvus Server
The first step is to connect to the Milvus server using PyMilvus.
We start by importing the necessary libraries and creating a connection object as shown in the following code:
import pymilvus as mv  # 'mv' is assumed to be an alias for the pymilvus package
# Connect to Milvus server
milvus = mv.Milvus(host='localhost', port='19530')
If your Milvus server is not running on localhost at the default port 19530, replace localhost and 19530 with the appropriate values for your server.
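Hard-coding the host and port makes a script awkward to move between environments. One option is to read both values from environment variables and fall back to the defaults. The following is a minimal sketch of that idea; the variable names MILVUS_HOST and MILVUS_PORT and the helper get_milvus_address() are hypothetical names chosen for this example and are not defined by PyMilvus.

```python
import os


def get_milvus_address(env=None):
    """Return (host, port) for the Milvus server.

    MILVUS_HOST and MILVUS_PORT are hypothetical variable names used only
    in this sketch; PyMilvus itself does not read them.
    """
    env = os.environ if env is None else env
    host = env.get("MILVUS_HOST", "localhost")
    port = env.get("MILVUS_PORT", "19530")
    # Fail early on an obviously invalid port instead of at connection time.
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"invalid Milvus port: {port!r}")
    return host, port


# With no overrides set, the defaults are returned.
host, port = get_milvus_address({})
print(host, port)  # localhost 19530
```

The returned values can then be passed to the connection call, for example mv.Milvus(host=host, port=port).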
Create a Collection
Once connected to the Milvus server, we can set up a collection. A collection in Milvus is a container that stores the vector data.
We need to create a collection before we can index and search the vectors:
collection_name = 'example_collection'  # example name; choose your own
dimension = 128
# Create a collection
milvus.create_collection(collection_name, {'fields': [{'name': 'embedding', 'type': mv.DataType.FLOAT_VECTOR, 'params': {'dim': dimension}}]})
The previous code sets up a basic collection with the specified name and defines the collection schema for the vectors that we wish to search.
The schema comprises a single embedding field with the float vector data type and the specified dimension.
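The dimension in the schema is a hard constraint: every vector stored in the field must have exactly that many components. A small client-side check can catch mismatches before they reach the server. The sketch below is plain Python and independent of PyMilvus; validate_vectors() is a hypothetical helper written for this example.

```python
def validate_vectors(vectors, dimension):
    """Raise ValueError if any vector does not have `dimension` numeric components.

    Returns the number of vectors checked.
    """
    for i, vec in enumerate(vectors):
        if len(vec) != dimension:
            raise ValueError(
                f"vector {i} has {len(vec)} components, expected {dimension}"
            )
        if not all(isinstance(x, (int, float)) for x in vec):
            raise ValueError(f"vector {i} contains non-numeric values")
    return len(vectors)


dimension = 128
good = [[0.0] * dimension, [1.0] * dimension]
print(validate_vectors(good, dimension))  # 2
```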
Index Vectors
Before performing a vector search, we need to index our vectors to make them searchable. Milvus supports different indexing methods such as IVF_FLAT, IVF_SQ8, and HNSW. For this post, we use the IVF_FLAT indexing method for simplicity.
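index_param = {'index_type': 'IVF_FLAT', 'params': {'nlist': 4096}}  # layout assumed from the description below
milvus.create_index(collection_name, index_param)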
In this case, we specify the index type as “IVF_FLAT” with an nlist value of 4096.
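To make the role of nlist more concrete: an IVF index groups the vectors into nlist clusters, and a query scans only the few clusters whose centroids are closest to it (in Milvus, that number is controlled by the nprobe search parameter). The following plain-Python sketch illustrates the idea with a few k-means iterations. It is a toy illustration under simplifying assumptions, not Milvus's implementation, and all function names are made up for this example.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(vectors, centroids):
    """Put the index of every vector into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return buckets


def build_ivf(vectors, nlist, iterations=5, seed=0):
    """Cluster the vectors into nlist buckets with a few k-means iterations."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(iterations):
        buckets = assign(vectors, centroids)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a bucket is empty
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


def ivf_search(query, vectors, centroids, buckets, top_k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:top_k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, buckets = build_ivf(data, nlist=4)
query = [rng.random() for _ in range(8)]

# Probing every bucket is exhaustive; probing one bucket is faster but approximate.
exact = ivf_search(query, data, centroids, buckets, top_k=5, nprobe=4)
approx = ivf_search(query, data, centroids, buckets, top_k=5, nprobe=1)
print(exact)
print(approx)
```

Scanning fewer buckets reduces the work per query at the cost of possibly missing some true nearest neighbours, which is the trade-off that nlist and nprobe control.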
Insert the Data
Next, let us insert some data into our collection. We generate the vectors as a NumPy array and convert them to plain Python lists before inserting them.
import numpy as np
# Generate random vectors
vectors = np.random.rand(1000, dimension).astype(np.float32)
# Insert vectors into the collection
milvus.insert(collection_name, vectors.tolist())
In this example, we use the np.random.rand() function to generate a batch of 1000 random vectors of the specified dimension and insert them into the collection using the insert() method.
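For larger datasets, it is common to insert the vectors in several smaller batches rather than in one call. The following sketch shows one way to split the data; iter_batches() is a hypothetical helper, the batch size of 250 is an arbitrary choice for illustration, and the vectors are generated with the standard library so that the example runs without any dependencies.

```python
import random


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


dimension = 128
vectors = [[random.random() for _ in range(dimension)] for _ in range(1000)]

batch_sizes = []
for batch in iter_batches(vectors, 250):
    # In the tutorial's code, this is where each batch would be inserted,
    # e.g. milvus.insert(collection_name, batch).
    batch_sizes.append(len(batch))

print(batch_sizes)  # [250, 250, 250, 250]
```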
Perform the Vector Search
We can now perform the vector search operations with the indexed and inserted vectors. To find the most similar vectors to a given query vector, we can run the code as follows:
query_vector = np.random.rand(dimension).astype(np.float32)
# Perform vector search
results = milvus.search(collection_name, query_vector.tolist(), top_k=5)
Here, we generate a random query vector and perform a search using the search() method. We specify top_k=5 to retrieve the five most similar vectors.
This should return a results object that contains the search results, including the IDs of the matched vectors and their distances to the query vector.
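To help interpret the results, it is useful to see what a top-k search computes conceptually: every stored vector is ranked by its distance to the query, and the k closest ones are returned. The brute-force sketch below does exactly that in plain Python. It assumes the Euclidean (L2) distance as the metric, and top_k_l2() is a name made up for this example; an index such as IVF_FLAT approximates this ranking without scanning every vector.

```python
import heapq
import math
import random


def top_k_l2(query, vectors, top_k):
    """Return the top_k (index, distance) pairs closest to `query` under L2."""
    scored = (
        (i, math.sqrt(sum((q - x) ** 2 for q, x in zip(query, vec))))
        for i, vec in enumerate(vectors)
    )
    # nsmallest returns the pairs sorted by ascending distance.
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])


rng = random.Random(7)
dimension = 16
vectors = [[rng.random() for _ in range(dimension)] for _ in range(1000)]

# Query with a stored vector: it should be its own nearest neighbour.
hits = top_k_l2(vectors[123], vectors, top_k=5)
for index, distance in hits:
    print(index, round(distance, 4))
```

The first hit is the query vector itself at distance 0, and the remaining hits follow in order of increasing distance.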
Clean Up
Finally, don’t forget to clean up the resources and close the connection to the Milvus server:
milvus.close()
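If an error occurs before close() is reached, the connection stays open. One way to guard against this is contextlib.closing from the standard library, which calls close() when the block exits, even after an exception. The sketch below uses a stand-in class, DummyClient, so that it runs without a server; it only assumes that the real client exposes a close() method, as in the code above.

```python
from contextlib import closing


class DummyClient:
    """Stand-in for a Milvus client; only the close() method matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = DummyClient()
try:
    with closing(client) as milvus:
        # Any work with the client goes here; simulate a failure.
        raise RuntimeError("simulated failure during a search")
except RuntimeError:
    pass

print(client.closed)  # True
```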
There you have it!
Conclusion
In this tutorial, we covered the basics of vector search in Milvus using the PyMilvus SDK: connecting to the Milvus server, creating a collection, building an index, inserting data, and carrying out a vector search.