
PyMilvus create_index()

Milvus is an open-source vector database that enables similarity search and analytics on large-scale vector data. One of its key features is support for multiple indexing algorithms that speed up vector search.

An index in Milvus is a data structure that speeds up searches and queries over vector data. By building an index, Milvus organizes the vectors so that they can be retrieved more efficiently than by comparing a query against every stored vector.
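To make the benefit concrete, the pure-Python sketch below (illustrative only, not Milvus code; `l2_distance` and `flat_search` are hypothetical names) shows what a search looks like without an index: the query is compared with every stored vector, so the cost grows linearly with the size of the collection. Index types such as IVF and HNSW exist to avoid this full scan.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(vectors, query, k):
    """Exhaustive search: score the query against every stored vector.

    Returns the ids of the k nearest vectors, closest first.
    """
    scored = sorted(vectors.items(), key=lambda item: l2_distance(item[1], query))
    return [vec_id for vec_id, _ in scored[:k]]


vectors = {
    1: [0.0, 0.0],
    2: [1.0, 1.0],
    3: [5.0, 5.0],
}
print(flat_search(vectors, [0.9, 1.2], k=2))  # → [2, 1]
```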

Milvus supports various types of indexes, including the following:

  • Flat (FLAT)
  • Hierarchical Navigable Small World (HNSW)
  • Inverted File (IVF)
  • Binary Inverted File (BIN_IVF_FLAT)
  • Product Quantization (PQ)
  • Random Projection (RP)
  • Scalar Quantization (SQ)
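For orientation, the dictionary below sketches `index_params` values for several of the index families listed above. It assumes the index type names and build parameter names from the Milvus 2.x documentation (check the documentation for your Milvus version), the values are illustrative rather than tuned recommendations, and the IVF_PQ entry assumes the 8-dimensional vector field used in this tutorial. `EXAMPLE_INDEX_PARAMS` and `pq_m_is_valid` are hypothetical names.

```python
# Example index_params for several Milvus index types (Milvus 2.x naming assumed).
# Values are illustrative starting points, not tuned recommendations.
EXAMPLE_INDEX_PARAMS = {
    # Exhaustive search; no build parameters
    "FLAT": {"index_type": "FLAT", "metric_type": "L2", "params": {}},
    # Inverted file: vectors are grouped into nlist clusters
    "IVF_FLAT": {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}},
    # Inverted file plus 8-bit scalar quantization of the stored vectors
    "IVF_SQ8": {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 1024}},
    # Inverted file plus product quantization; m must divide the vector dimension
    "IVF_PQ": {"index_type": "IVF_PQ", "metric_type": "L2", "params": {"nlist": 1024, "m": 4}},
    # Graph index: M bounds the links per node, efConstruction the build-time search scope
    "HNSW": {"index_type": "HNSW", "metric_type": "L2", "params": {"M": 16, "efConstruction": 200}},
    # Binary vectors use a binary metric such as HAMMING or JACCARD
    "BIN_IVF_FLAT": {"index_type": "BIN_IVF_FLAT", "metric_type": "HAMMING", "params": {"nlist": 128}},
}


def pq_m_is_valid(m, dim):
    """PQ splits each vector into m sub-vectors, so dim must be divisible by m."""
    return m > 0 and dim % m == 0


for name, params in EXAMPLE_INDEX_PARAMS.items():
    print(f"{name}: {params['params']}")
```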

In this tutorial, we will learn how to use the create_index() method provided by PyMilvus, the Python SDK for Milvus, to create different types of Milvus indexes.

Prerequisites:

Before proceeding, ensure that you have the following tools installed:

  1. Python 3.10 or later
  2. PyMilvus
  3. A running Milvus server
  4. Milvus CLI
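If you want to confirm the local side of these prerequisites, the sketch below may help. `check_prerequisites` is a hypothetical helper, not part of PyMilvus: it only checks the Python version and whether the `pymilvus` package is importable (PyMilvus is normally installed with `pip install pymilvus`). It does not check the Milvus server, which requires a network connection.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 10)):
    """Hypothetical helper: report which local prerequisites are satisfied."""
    return {
        # sys.version_info compares element-wise with a plain tuple
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when the package cannot be imported
        "pymilvus_installed": importlib.util.find_spec("pymilvus") is not None,
    }


print(check_prerequisites())
```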

Create a Collection in Milvus

The first step is to create the collection on which we want to build an index. In this tutorial, we use the “create collection” command of the Milvus CLI.

Once connected to the target server, run the following command to create the collection:

milvus_cli > create collection -c film -f id:INT64:primary_field -f film_name:VARCHAR:100 -f release_year:INT64:release_year -f vector:FLOAT_VECTOR:8 -p id -d 'film_collection'

We call the “create collection” command to create a new collection in Milvus.

Next, we use the -c parameter to specify the collection’s name. In this case, the collection name is “film”.

The -f parameter defines each field of the collection together with its data type, using a name:TYPE:option pattern. In this collection, the fields are defined as follows:

id:INT64:primary_field – This defines a field named “id” of type INT64. The trailing primary_field is the field description; the field becomes the primary key through the -p option described below. The primary field uniquely identifies each record in the collection.

film_name:VARCHAR:100 – This defines a field named “film_name” of type VARCHAR with a maximum length of 100 characters.

release_year:INT64:release_year – This defines a field named “release_year” of type INT64, with release_year as its description.

vector:FLOAT_VECTOR:8 – This specifies the field named “vector” as a vector field of type FLOAT_VECTOR with a dimensionality of 8.

The -p id option sets the “id” field as the primary key of the collection, and the -d option sets the collection description.
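If you prefer to stay in Python, a collection with the same fields can be defined with the PyMilvus `FieldSchema`, `CollectionSchema`, and `DataType` classes. The sketch below assumes PyMilvus 2.x and an existing connection opened with `connections.connect()`. The field list is plain data so it can be inspected without a server, and the PyMilvus imports happen inside the function; `FILM_FIELDS` and `create_film_collection` are illustrative names, not part of PyMilvus.

```python
# Field definitions mirroring the CLI command: name, type, and type-specific options.
FILM_FIELDS = [
    {"name": "id", "dtype": "INT64", "is_primary": True},
    {"name": "film_name", "dtype": "VARCHAR", "max_length": 100},
    {"name": "release_year", "dtype": "INT64"},
    {"name": "vector", "dtype": "FLOAT_VECTOR", "dim": 8},
]


def create_film_collection(fields=FILM_FIELDS, name="film"):
    """Illustrative helper: create the collection through PyMilvus.

    Requires pymilvus and an open connection (connections.connect()).
    """
    from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

    field_schemas = []
    for spec in fields:
        # Pass through everything except name/dtype, e.g. is_primary, max_length, dim
        extra = {k: v for k, v in spec.items() if k not in ("name", "dtype")}
        field_schemas.append(
            FieldSchema(name=spec["name"], dtype=DataType[spec["dtype"]], **extra)
        )
    schema = CollectionSchema(field_schemas, description="film_collection")
    return Collection(name=name, schema=schema)
```

Keeping the field specs as data makes it easy to review the schema in one place before any server call is made.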

PyMilvus create_index() Function

Once the target collection is configured, we can use the create_index() method of the Collection class to set up an index on it.

The function syntax is as follows:

create_index(field_name, index_params, timeout=None, **kwargs)

The function accepts the following parameters:

field_name – The name of the field on which to create the index.

index_params – A dictionary that defines the index, such as the index type, the metric type, and the index-specific build parameters.

index_name – The name of the index to create, passed as a keyword argument through **kwargs. If it is not specified, the function creates an index with the default name _default_idx_.

timeout – An optional duration, in seconds, to allow for the RPC call.
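In the signature above, index_name is not a named parameter, so it travels through **kwargs. The hypothetical wrapper below (not part of PyMilvus) makes the optional parameters explicit: it forwards `index_name` only when one is supplied and passes `timeout` through unchanged. It accepts any object with a `create_index()` method, such as a PyMilvus `Collection`; the `RecordingCollection` stand-in exists only so the sketch runs without a server.

```python
def create_vector_index(collection, field_name, index_params, index_name=None, timeout=None):
    """Hypothetical wrapper around Collection.create_index().

    index_name is forwarded only when it is given, so the default naming
    applies otherwise; timeout (in seconds) is passed through as-is.
    """
    extra = {}
    if index_name is not None:
        extra["index_name"] = index_name
    return collection.create_index(
        field_name=field_name,
        index_params=index_params,
        timeout=timeout,
        **extra,
    )


class RecordingCollection:
    """Offline stand-in that returns the arguments create_index() received."""

    def create_index(self, **kwargs):
        return kwargs


received = create_vector_index(
    RecordingCollection(),
    "vector",
    {"index_type": "FLAT", "metric_type": "L2", "params": {}},
    timeout=10,
)
print(received)
```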

Function Return Value

The function returns a newly created index object.

Example Usage:

The following sample code demonstrates how to use the create_index() method to create an IVF_FLAT index on the film collection:

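from pymilvus import Collection, connections

# Connect to the Milvus server (adjust host and port to your deployment)
connections.connect(alias="default", host="localhost", port="19530")

# IVF_FLAT index using Euclidean (L2) distance and 1024 clusters
index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}

# Get a handle to the existing "film" collection
collection = Collection("film")

# Build the index on the "vector" field
collection.create_index(
    field_name="vector",
    index_params=index_params,
    index_name="vector_idx",
)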

In the previous example, we import the Collection class from the PyMilvus module. This class allows us to interact with collections in the Milvus database.

Next, we define the index_params dictionary, which specifies the parameters of the index that we wish to create. In this case, we set metric_type to L2, which means that Milvus compares vectors using the Euclidean distance.
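For reference, the Euclidean distance between two $n$-dimensional vectors $x$ and $y$ (in this collection, $n = 8$) is defined below, followed by a small two-dimensional worked example. A smaller distance means the vectors are more similar.

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad\text{e.g.}\qquad
d\big((0, 0), (3, 4)\big) = \sqrt{3^2 + 4^2} = 5
```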

We also set index_type to IVF_FLAT, the inverted file index, and set the number of clusters (nlist) to 1024.
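To see what nlist controls, here is a toy, pure-Python version of the inverted-file idea. It is not how Milvus implements IVF_FLAT (Milvus learns the cluster centroids with k-means and runs optimized native code); for brevity, the centroids here are simply given, and all names are hypothetical. Each vector is assigned to the list of its nearest centroid, and a query scans only the `nprobe` closest lists, mirroring the nprobe search parameter that Milvus uses for IVF indexes.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists


def ivf_search(vectors, centroids, lists, query, k, nprobe=1):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vec_id for c in probe for vec_id in lists[c]]
    candidates.sort(key=lambda vec_id: l2(vectors[vec_id], query))
    return candidates[:k]


vectors = {1: [0.0, 0.0], 2: [0.5, 0.5], 3: [9.0, 9.0], 4: [10.0, 10.0]}
centroids = [[0.0, 0.0], [10.0, 10.0]]  # nlist = 2; normally learned with k-means
lists = build_ivf(vectors, centroids)
print(lists)  # → {0: [1, 2], 1: [3, 4]}
print(ivf_search(vectors, centroids, lists, [0.4, 0.4], k=2, nprobe=1))  # → [2, 1]
```

The trade-off follows directly: a larger nlist makes each list shorter, while a small nprobe scans fewer lists and can therefore miss a true neighbor that sits in an unprobed cluster.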

Next, we create a Collection instance that gives us access to the film collection in the database.

Finally, we call the create_index() method to build an index named vector_idx on the vector field with the specified index parameters.
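After the call returns, you may want to confirm that the index exists. The sketch below assumes PyMilvus 2.x, where `Collection.indexes` returns `Index` objects exposing `field_name`, `index_name`, and `params`; verify these attributes against your installed version. `summarize_indexes` is a hypothetical helper, and the `SimpleNamespace` stand-in only lets the sketch run without a server.

```python
from types import SimpleNamespace


def summarize_indexes(collection):
    """Hypothetical helper: list (field_name, index_name, params) per index.

    Assumes a PyMilvus Collection whose .indexes property returns Index
    objects exposing field_name, index_name, and params.
    """
    return [(idx.field_name, idx.index_name, idx.params) for idx in collection.indexes]


# With a live connection (assuming the index was created as above):
#   from pymilvus import Collection
#   collection = Collection("film")
#   print(collection.has_index(index_name="vector_idx"))
#   print(summarize_indexes(collection))

# Offline stand-in that mimics the attributes used above
fake_collection = SimpleNamespace(indexes=[
    SimpleNamespace(
        field_name="vector",
        index_name="vector_idx",
        params={"metric_type": "L2", "index_type": "IVF_FLAT", "params": {"nlist": 1024}},
    ),
])
print(summarize_indexes(fake_collection))
```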

Conclusion

We learned how to use the create_index() method to set up an index for a Milvus collection using Python.

About the author

John Otieno

My name is John and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.