When working with Milvus, you often need to load data from an external source, especially for bulk uploads. This tutorial demonstrates how to insert data from an external file into Milvus using the PyMilvus SDK.
Requirements:
- PyMilvus: Python SDK for Milvus
- NumPy: For handling numeric data
- Pandas: For reading and processing data from files
You can install these packages using pip:
pip install pymilvus numpy pandas
Import the Required Packages
The first step is to import the required packages.
import pandas as pd
# DataType is needed later when defining the vector field
from pymilvus import connections, Collection, DataType
Connect to Milvus
Once we import the required packages, we can connect to the Milvus server.
MILVUS_HOST = '127.0.0.1'
MILVUS_PORT = '19530'
# Establish connection to Milvus server
connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)
Create a Milvus Collection
Once connected, we can create a collection on the Milvus server as shown in the following code:
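# Schema classes used to define the collection
from pymilvus import CollectionSchema, DataType, FieldSchema
# Define collection parameters
COLLECTION_NAME = 'my_collection'
DIMENSION = 4  # Dimensionality of vectors; must match the number of feature columns in the CSV file
# Define the schema: an auto-generated primary key, a name, and the vector
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="name", dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=DIMENSION),
]
schema = CollectionSchema(fields=fields, description="Vectors loaded from a CSV file")
# Create collection (the connection opened in the previous step is reused)
collection = Collection(name=COLLECTION_NAME, schema=schema)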
Prepare the Data
The next step is to prepare the data that you wish to insert. In this case, we use a simple CSV file with no header row, in which the first column is an ID and the remaining four columns hold the vector values:
1,0.25,0.35,0.15,0.45
2,0.55,0.12,0.76,0.62
3,0.81,0.25,0.34,0.48
4,0.92,0.61,0.72,0.19
5,0.38,0.92,0.55,0.76
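Before loading the file, it can be worth checking that every row has the same shape, since Milvus rejects vectors whose length differs from the dimension of the collection. The following is a minimal sketch that uses only Python's standard csv module. The helper name parse_feature_rows and the inline SAMPLE_CSV text are illustrative assumptions and are not part of PyMilvus.

```python
import csv
import io

# Inline copy of the sample rows, used here so the sketch runs without a file on disk
SAMPLE_CSV = """1,0.25,0.35,0.15,0.45
2,0.55,0.12,0.76,0.62
3,0.81,0.25,0.34,0.48
4,0.92,0.61,0.72,0.19
5,0.38,0.92,0.55,0.76
"""


def parse_feature_rows(text, expected_dim):
    """Parse header-less CSV text into (row_id, vector) pairs.

    Raises ValueError if a row does not hold exactly expected_dim features.
    """
    rows = []
    for line_no, record in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not record:  # skip blank lines
            continue
        row_id = int(record[0])
        features = [float(value) for value in record[1:]]
        if len(features) != expected_dim:
            raise ValueError(
                f"line {line_no}: expected {expected_dim} features, got {len(features)}"
            )
        rows.append((row_id, features))
    return rows


rows = parse_feature_rows(SAMPLE_CSV, expected_dim=4)
print(len(rows), rows[0])
```

A row with too few or too many values raises a ValueError that names the offending line, which is usually easier to debug than a failure at insert time.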
Read the Data from the Files
The next step is to read the data that is stored in the CSV file. In this case, we assume that the data is stored in a file called “features.csv”.
# Read data from CSV file; the file has no header row
FILE_PATH = '/features.csv'
data_frame = pd.read_csv(FILE_PATH, header=None)
# Extract vectors from the data frame, skipping the ID in the first column
vectors = data_frame.iloc[:, 1:].values
Convert and Insert the Entities into Milvus
Once we read the file, we need to convert each vector into a plain Python list and wrap it in an entity that can be inserted into the collection. We can do this by running the code as follows:
# Build one entity per row; tolist() converts each NumPy row to a Python list
entities = [
    {'name': str(i), 'vector': vector.tolist()}
    for i, vector in enumerate(vectors)
]
# Insert entities into Milvus (passing a list of dicts needs a recent PyMilvus release)
collection.insert(entities)
In the given example, we convert each vector from a NumPy array to a Python list and then create a list of entities that we can insert into Milvus.
Each entity contains a name (in this case, we use the vector index as the name) and the vector itself.
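For larger files, sending every entity in a single call may run into request-size limits, so a common approach is to insert in fixed-size batches. The sketch below shows one way to build the entity dictionaries and split them into batches with plain Python. The names build_entities and batched and the batch size of 2 are hypothetical choices for illustration, and the actual insert call appears only as a comment because it needs a running Milvus server.

```python
def build_entities(vectors):
    """Turn a sequence of vectors into entity dicts with 'name' and 'vector' keys."""
    return [
        {"name": str(i), "vector": [float(x) for x in vector]}
        for i, vector in enumerate(vectors)
    ]


def batched(items, batch_size):
    """Yield consecutive slices of items, each holding at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# A few of the sample vectors, written inline so the sketch is self-contained
vectors = [
    [0.25, 0.35, 0.15, 0.45],
    [0.55, 0.12, 0.76, 0.62],
    [0.81, 0.25, 0.34, 0.48],
]
entities = build_entities(vectors)
for batch in batched(entities, batch_size=2):
    # With a live connection, each batch would be handed to the insert call,
    # for example: collection.insert(batch)
    print(len(batch), [entity["name"] for entity in batch])
```

A suitable batch size depends on the vector dimension and the server configuration, so treat the value as something to tune rather than a recommendation.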
Verify the Entity
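Once completed, we can verify whether the data was added successfully using the num_entities property of the collection. Flushing first ensures that the count includes the newly inserted data.
collection.flush()
num_entities = collection.num_entities
print(f"Number of entities in the collection: {num_entities}")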
This prints the number of inserted entities.
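The same check can be wrapped in a small helper that compares the reported count against the number of rows you expected to insert. This is only a sketch: check_entity_count is a hypothetical helper, and it assumes that the object passed in exposes flush() and num_entities the way a PyMilvus Collection does. A stand-in class named FakeCollection is used here so that the snippet runs without a Milvus server.

```python
def check_entity_count(collection, expected):
    """Flush pending inserts, then compare the stored count with expected."""
    collection.flush()  # make sure the count reflects the inserted data
    actual = collection.num_entities
    if actual != expected:
        raise RuntimeError(f"expected {expected} entities, found {actual}")
    return actual


class FakeCollection:
    """Stand-in for a PyMilvus Collection, used only to make this sketch runnable."""

    def __init__(self, pending):
        self._pending = pending  # rows inserted but not yet flushed
        self.num_entities = 0

    def flush(self):
        self.num_entities += self._pending
        self._pending = 0


count = check_entity_count(FakeCollection(pending=5), expected=5)
print(f"Number of entities in the collection: {count}")
```

In a real script you would pass the collection object created earlier and the length of the entities list.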
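Make sure to close the connection when you’re done working with Milvus, for example by calling connections.disconnect() with the connection alias ("default" unless you specified another one).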
Conclusion
That’s it! You have successfully inserted entities from a file into Milvus using the PyMilvus SDK. You can now perform similarity searches or other operations on the vectors that are stored in Milvus.