
PyMilvus Drop_Index()

An index in Milvus is a data structure that speeds up search and query operations on vector data. By building an index, Milvus organizes the vectors so that they can be retrieved more efficiently.

Milvus supports various index types, including the following:

  • Flat (FLAT)
  • Hierarchical Navigable Small World (HNSW)
  • Inverted File (IVF_FLAT)
  • Others
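To make the difference between these index types more concrete, the following sketch lists the build parameters that each one typically takes in an index_params dictionary. The INDEX_PARAMS name and all numeric values are illustrative assumptions, not tuned recommendations.

```python
# Example build settings for the index types listed above.
# INDEX_PARAMS is a hypothetical name; the values are examples only.
INDEX_PARAMS = {
    # FLAT: exhaustive (brute-force) search, so there is nothing to tune at build time.
    "FLAT": {"index_type": "FLAT", "metric_type": "L2", "params": {}},
    # HNSW: M is the maximum number of links per node,
    # efConstruction is the size of the candidate list used while building the graph.
    "HNSW": {
        "index_type": "HNSW",
        "metric_type": "L2",
        "params": {"M": 16, "efConstruction": 200},
    },
    # IVF_FLAT: nlist is the number of clusters the vectors are partitioned into.
    "IVF_FLAT": {
        "index_type": "IVF_FLAT",
        "metric_type": "L2",
        "params": {"nlist": 1024},
    },
}

for name, settings in INDEX_PARAMS.items():
    print(name, sorted(settings["params"]))
```

Any of these dictionaries has the shape that create_index() expects for its index_params argument.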

In this tutorial, we will learn how to leverage the create_index() and drop_index() functions to create and drop the collection indexes in Milvus.

Prerequisites:

Before proceeding, ensure that you have the following tools installed:

  1. Python 3.10 or later
  2. PyMilvus
  3. Milvus server
  4. Milvus CLI

Create a Collection in Milvus

The first step is to create the collection on which we wish to create an index. This tutorial uses the “create collection” command in the Milvus CLI.

Once logged into the target server, run the command to create the collection as demonstrated in the following example:

milvus_cli > create collection -c film -f id:INT64:primary_field -f film_name:VARCHAR:100 -f release_year:INT64:release_year -f vector:FLOAT_VECTOR:8 -p id -d 'film_collection'

We call the “create collection” command to create a new collection in Milvus.

Next, we use the -c parameter to specify the collection’s name. In this case, the collection name is “film.”

The -f parameter allows us to specify the fields of the collection and their corresponding types. In this collection, the fields are defined as follows:

id:INT64:primary_field – This specifies a field named “id” of type INT64. The third component, “primary_field”, is the field’s description; the -p parameter is what marks the field as the primary key, which uniquely identifies the records in the collection.

film_name:VARCHAR:100 – This specifies the field named “film_name” of type VARCHAR with a maximum length of 100 characters.

release_year:INT64:release_year – This specifies the field named “release_year” of type INT64, with “release_year” as its description.

vector:FLOAT_VECTOR:8 – This specifies the field named “vector” as a vector field of type FLOAT_VECTOR with a dimensionality of 8.

The -p id parameter allows us to specify the primary key for the collection. In this case, we set the primary key as the “id” field. Finally, the -d parameter sets the description of the collection.
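To illustrate how the -f values in the previous command break down, the following sketch splits each one into its parts. The parse_field_spec() helper is hypothetical and is not part of the Milvus CLI or PyMilvus. It assumes that the third component holds the dimension for vector fields, the maximum length for VARCHAR fields, and a description otherwise, which is our reading of the command above.

```python
def parse_field_spec(spec: str) -> dict:
    """Split a 'name:TYPE:extra' string into its parts (hypothetical helper)."""
    name, data_type, extra = spec.split(":", 2)
    field = {"name": name, "type": data_type}
    if data_type == "FLOAT_VECTOR":
        field["dim"] = int(extra)          # assumed: vector dimensionality
    elif data_type == "VARCHAR":
        field["max_length"] = int(extra)   # assumed: maximum string length
    else:
        field["description"] = extra       # assumed: free-text description
    return field


# The same -f values that we passed to "create collection".
specs = [
    "id:INT64:primary_field",
    "film_name:VARCHAR:100",
    "release_year:INT64:release_year",
    "vector:FLOAT_VECTOR:8",
]

fields = [parse_field_spec(spec) for spec in specs]
for field in fields:
    print(field)
```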

PyMilvus Create_Index() Function

After configuring the target collection, we can learn how to use the create_index() function to set up a collection index.

The function syntax is as follows:

create_index(field_name, index_params, timeout=None, **kwargs)

The function accepts the following parameters:

field_name – This parameter specifies the field’s name on which to create the target index.

index_params – This is a dictionary that defines the parameters of the target index, such as the index type, the metric type, and the build parameters.

index_name – This optional keyword argument, passed through **kwargs, sets the name of the index that you wish to create. If it is not specified, the function creates the index with the default name _default_idx_.

timeout – The timeout parameter defines the duration in seconds to allow for the RPC call.

Function Return Value

The function returns a newly created index object.

Example Usage:

The following sample code demonstrates how we can use the create_index() function to create an IVF_FLAT index on the film collection.

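from pymilvus import Collection, connections

# Connect to the Milvus server (assumes a local server on the default port 19530).
connections.connect(alias="default", host="localhost", port="19530")

# Index settings: Euclidean (L2) distance, IVF_FLAT index with 1024 clusters.
index_params = {
  "metric_type":"L2",
  "index_type":"IVF_FLAT",
  "params":{"nlist":1024}
}

# Open the existing "film" collection and index its "vector" field.
collection = Collection("film")
collection.create_index(
  field_name="vector",
  index_params=index_params,
  index_name="vector_idx"
)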

In the given example code, we import the Collection class from the pymilvus module. This class allows us to interact with collections in the Milvus database.

Next, we define the index_params dictionary which specifies the parameters for the index that we wish to create. In this case, we set the metric_type to L2, which means that the distance between vectors is measured as the Euclidean distance.

We also set the index_type to IVF_FLAT, the inverted file index, and the number of clusters (nlist) to 1024.

In the next step, we create a collection instance that allows us to access the “film” collection in the database.

Finally, we use the create_index function to create an index on the vector field with the specified index parameters.
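Running create_index() a second time for an index that already exists is usually unnecessary, so it can help to check first. The following sketch shows one way to do that with the has_index() method of the Collection class. The ensure_index() helper is hypothetical, and StubCollection is only an in-memory stand-in that lets the sketch run without a Milvus server.

```python
class StubCollection:
    """In-memory stand-in for pymilvus.Collection (illustration only)."""

    def __init__(self):
        self._indexes = {}

    def has_index(self, index_name="_default_idx_", **kwargs):
        return index_name in self._indexes

    def create_index(self, field_name, index_params, index_name="_default_idx_", **kwargs):
        self._indexes[index_name] = (field_name, index_params)


def ensure_index(collection, field_name, index_params, index_name):
    """Create the index only if it is missing; return True if it was created."""
    if collection.has_index(index_name=index_name):
        return False
    collection.create_index(
        field_name=field_name,
        index_params=index_params,
        index_name=index_name,
    )
    return True


index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}

# With a live server, this would be: films = Collection("film")
films = StubCollection()
print(ensure_index(films, "vector", index_params, "vector_idx"))  # True: index created
print(ensure_index(films, "vector", index_params, "vector_idx"))  # False: already there
```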

PyMilvus Drop_Index() Function

The drop_index() method allows you to drop the index and its corresponding index file in the collection.

The function syntax is as follows:

drop_index(timeout=None, **kwargs)

The function accepts the following parameters:

index_name – This optional keyword argument, passed through **kwargs, specifies the name of the index that you wish to remove. If it is not specified, the default name “_default_idx_” is used.

timeout – An optional duration of time in seconds to allow for the RPC. If it is set to None, the client waits until the server responds or an error occurs.

The function does not have a return value.

The following example demonstrates how to use this function to remove the vector_idx that we created earlier.

from pymilvus import Collection
collection = Collection("film")
collection.drop_index(index_name="vector_idx")

The previous code removes the index with the “vector_idx” name from the “film” collection.
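In practice, it can be useful to guard the drop so that it only runs when the index exists and to confirm the result afterward. The following sketch shows this pattern. The drop_index_if_present() helper is hypothetical, and StubCollection is an in-memory stand-in so that the sketch runs without a server. The release() call reflects our understanding that recent Milvus releases refuse to drop the index of a collection that is loaded in memory; treat that step as an assumption and check it against your version.

```python
class StubCollection:
    """In-memory stand-in for pymilvus.Collection (illustration only)."""

    def __init__(self, index_names):
        self._indexes = set(index_names)
        self.loaded = True

    def has_index(self, index_name="_default_idx_", **kwargs):
        return index_name in self._indexes

    def release(self, **kwargs):
        self.loaded = False

    def drop_index(self, index_name="_default_idx_", **kwargs):
        self._indexes.discard(index_name)


def drop_index_if_present(collection, index_name):
    """Drop the named index if it exists; return True if it is gone afterward."""
    if not collection.has_index(index_name=index_name):
        return False  # nothing to drop
    collection.release()  # assumed to be required before dropping an index
    collection.drop_index(index_name=index_name)
    return not collection.has_index(index_name=index_name)


# With a live server, this would be: films = Collection("film")
films = StubCollection(["vector_idx"])
print(drop_index_if_present(films, "vector_idx"))  # True: index removed
print(drop_index_if_present(films, "vector_idx"))  # False: nothing left to drop
```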

Conclusion

In this tutorial, we learned how to leverage the create_index() and the drop_index() functions from the PyMilvus package to create and drop the collection index in the Milvus database.

About the author

John Otieno
