
Milvus Insert Entities from Files

Milvus is an open-source vector database that enables the efficient storage and retrieval of high-dimensional vectors.

When working with Milvus, you often need to load data from an external source, especially for bulk uploads. This tutorial demonstrates how to insert data from an external file into Milvus using the PyMilvus SDK.

Requirements:

  • PyMilvus: Python SDK for Milvus
  • NumPy: For handling the numeric data
  • Pandas: For reading and processing the data from files

You can install these packages using pip:

pip install pymilvus numpy pandas

Import the Required Packages

The first step is to import the required packages.

import numpy as np
import pandas as pd
from pymilvus import connections, Collection

Connect to Milvus

Once we import the target packages, we can connect to the Milvus server.

# Define Milvus server IP address and port
MILVUS_HOST = '127.0.0.1'
MILVUS_PORT = '19530'

# Establish connection to Milvus server
connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)

Create a Milvus Collection

Once connected, we can create a collection on the Milvus server as shown in the following code:

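from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# Define collection parameters
COLLECTION_NAME = 'my_collection'
DIMENSION = 4 # Dimensionality of vectors; matches the four feature columns in the CSV file

# Establish connection to Milvus server
connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)

# Define the schema: Milvus requires a primary key field in addition to the vector field
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=DIMENSION),
]
schema = CollectionSchema(fields=fields, description="Vectors loaded from a CSV file")

# Create collection
collection = Collection(name=COLLECTION_NAME, schema=schema)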

Prepare the Data

The next step is to prepare the data that you wish to insert. In this case, we use a simple CSV file as shown in the following:

id,feature_1,feature_2,feature_3,feature_4
1,0.25,0.35,0.15,0.45
2,0.55,0.12,0.76,0.62
3,0.81,0.25,0.34,0.48
4,0.92,0.61,0.72,0.19
5,0.38,0.92,0.55,0.76
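To follow along without creating the file by hand, the sample rows above can be written to disk with Python's standard csv module. This is only a convenience sketch: the helper name write_sample_csv is made up for illustration, and the default file name features.csv is an assumption that you can change.

```python
import csv

# Header and rows copied from the sample shown above.
HEADER = ["id", "feature_1", "feature_2", "feature_3", "feature_4"]
ROWS = [
    [1, 0.25, 0.35, 0.15, 0.45],
    [2, 0.55, 0.12, 0.76, 0.62],
    [3, 0.81, 0.25, 0.34, 0.48],
    [4, 0.92, 0.61, 0.72, 0.19],
    [5, 0.38, 0.92, 0.55, 0.76],
]


def write_sample_csv(path="features.csv"):
    """Write the sample rows to `path` and return the number of data rows."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return len(ROWS)
```

Calling write_sample_csv() creates the file in the current working directory; pass a different path to write it elsewhere.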

Read the Data from the Files

The next step is to read the data that is stored in the CSV file. In this case, we assume that the data is stored in a file called “features.csv”.

import pandas as pd

# Read data from CSV file
FILE_PATH = 'features.csv'
data_frame = pd.read_csv(FILE_PATH)

# Extract vectors from the data frame (every column after the id column)
vectors = data_frame.iloc[:, 1:].values
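Before inserting, it can help to confirm that every row has the expected number of features and no missing values, since Milvus expects each vector to match the dimension declared for the vector field. The following is a minimal sketch that assumes the CSV layout shown earlier (an id column followed by feature columns). The function name load_vectors and the expected_dim parameter are illustrative and are not part of PyMilvus.

```python
import io

import numpy as np
import pandas as pd


def load_vectors(source, expected_dim):
    """Read a CSV with an `id` column followed by feature columns.

    Returns (ids, vectors): ids is a list of ints and vectors is a
    float32 array of shape (rows, expected_dim).
    """
    data_frame = pd.read_csv(source)
    features = data_frame.drop(columns=["id"])

    # Missing values would otherwise end up as NaN inside a vector.
    if features.isna().any().any():
        raise ValueError("CSV contains missing feature values")

    vectors = features.to_numpy(dtype=np.float32)
    if vectors.shape[1] != expected_dim:
        raise ValueError(
            f"expected {expected_dim} features per row, found {vectors.shape[1]}"
        )

    ids = data_frame["id"].astype("int64").tolist()
    return ids, vectors


# Small in-memory example so the sketch runs without a file on disk.
sample = io.StringIO("id,feature_1,feature_2\n1,0.25,0.35\n2,0.55,0.12\n")
ids, vectors = load_vectors(sample, expected_dim=2)
print(ids, vectors.shape)
```

The same function accepts a file path in place of the in-memory buffer, because pd.read_csv takes either.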

Convert and Insert the Entities into Milvus

Once we read the file, we convert the feature values to a float32 NumPy array, which is the element type that a Milvus float vector field stores. We can do this by running the code as follows:

vectors = np.array(vectors, dtype=np.float32)

# Use the id column of the CSV file as the primary key of each entity
ids = data_frame['id'].astype('int64').tolist()

# Insert entities into Milvus, one list per field in schema order.
# This assumes that the collection has an INT64 primary key field followed by
# a FLOAT_VECTOR field whose dimension equals the number of feature columns.
collection.insert([ids, vectors.tolist()])

In the given example, we convert the vectors to a NumPy array and then pass the data to Milvus column by column: one list of IDs and one list of vectors.

Each entity consists of an ID (in this case, the value of the id column in the CSV file) and the vector itself.
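For bulk uploads, a large file is usually inserted in smaller batches rather than in a single call. A minimal sketch of the batching step in plain Python follows. The names iter_batches and batch_size are illustrative, and a suitable batch size depends on the vector dimension and the server configuration, which this sketch does not try to determine.

```python
def iter_batches(ids, vectors, batch_size):
    """Yield (ids, vectors) slices with at most `batch_size` rows each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield ids[start:end], vectors[start:end]


# Example with five rows split into batches of two.
example_ids = [1, 2, 3, 4, 5]
example_vectors = [[0.1, 0.2]] * 5
batch_sizes = [
    len(batch_ids)
    for batch_ids, _ in iter_batches(example_ids, example_vectors, 2)
]
print(batch_sizes)  # [2, 2, 1]
```

Each yielded pair can then be handed to the collection's insert() method in place of the full lists, assuming the collection's fields are an ID field followed by a vector field.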

Verify the Entity

Once completed, we can verify whether the data is added successfully using the num_entities property. Newly inserted data may not be counted until it is flushed, so we call flush() first.

collection = Collection(name=COLLECTION_NAME)
collection.flush()
num_entities = collection.num_entities
print(f"Number of entities in the collection: {num_entities}")

This prints the number of inserted entities.

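Make sure to close the connection when you’re done working with Milvus, for example by calling connections.disconnect("default"), where "default" is the alias that connections.connect() uses when none is given.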

Conclusion

That’s it! You have successfully inserted entities from a file into Milvus using the PyMilvus SDK. Once you build an index on the vector field and load the collection, you can perform similarity searches or other operations on the vectors that are stored in Milvus.

About the author

John Otieno
