One use of vector metadata is filtering. For example, when you query a Pinecone index, you can supply a filter expression so that only the vectors whose metadata matches it are considered.
Filter expressions limit a vector search to specific metadata criteria: the query returns only the nearest neighbors that satisfy the filter.
This makes searches more precise and targeted, because the metadata narrows the search space to the results that meet the given criteria.
This tutorial shows how to delete vectors based on their metadata.
Requirements:
To follow along with this tutorial, ensure that you have the following:
- Python 3.10 or above installed
- Basic Python programming knowledge
Installing the Pinecone Client
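Before interacting with the Pinecone server using Python, we need to install the Pinecone client on our machine. We can do this with a simple “pip” command as follows:
pip install pinecone-client
The previous command downloads the Pinecone client and installs it in your environment. The examples in this tutorial use the pinecone.init interface of the 2.x client. Releases from 3.0 onward replaced it with a Pinecone class, so if pinecone.init is not available, pin an older version (for example, pip install "pinecone-client<3").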
Creating a Sample Index
The first step is to set up a basic index for demonstration purposes. In this case, we create an index that stores book information.
import pinecone

# init pinecone configuration
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
# create basic index
pinecone.create_index("book", dimension=8, metric="euclidean", pod_type="p1", pods=1, replicas=1)
# connect to the index
index = pinecone.Index("book")
The previous code initializes the Pinecone client, creates an index named book with a dimension of 8 and the Euclidean metric, and connects to it.
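A script like this is often run more than once, and asking the server to create an index whose name is already taken is rejected. The sketch below guards the call. It assumes the 2.x client's `pinecone.list_indexes()`, which returns the names of the existing indexes. `ensure_index` is a hypothetical helper name, and the client is passed in as an argument so the helper can be tried without a live connection.

```python
def ensure_index(client, name, dimension, metric="euclidean"):
    """Create the index only if no index with this name exists yet.

    `client` is assumed to be the `pinecone` module (2.x client) after
    `pinecone.init(...)` has been called. Returns True if the index was
    created and False if it already existed.
    """
    if name in client.list_indexes():
        return False
    client.create_index(name, dimension=dimension, metric=metric)
    return True
```

With the real client this would be called as `ensure_index(pinecone, "book", dimension=8)`.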
Inserting Vectors with Metadata
Once the index is created, we can use the upsert operation to insert vectors with metadata, as shown in the following example code:
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], {"genre": "comedy", "year": 2020, "title": "Book A"}),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2], {"genre": "mystery", "year": 2019, "title": "Book B"}),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], {"genre": "comedy", "year": 2019, "title": "Book C"}),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], {"genre": "drama", "title": "Book D"}),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], {"genre": "romance", "title": "Book E"})
])
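The previous code inserts five vectors that represent book information. Each vector is given as an ID, a list of eight values, and a metadata dictionary with fields such as the genre, year, and title. Note that books D and E have no year field.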
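Every vector must have exactly as many values as the index dimension (8 in this tutorial), so it can help to check the records locally before sending them. The following is a minimal sketch in plain Python. `check_records` is a hypothetical helper, not part of the Pinecone client, and it checks only the few properties listed in its body.

```python
def check_records(records, dimension):
    """Return a list of problems found in (id, values, metadata) tuples."""
    problems = []
    seen_ids = set()
    for vec_id, values, metadata in records:
        if vec_id in seen_ids:
            problems.append(f"{vec_id}: duplicate ID")
        seen_ids.add(vec_id)
        if len(values) != dimension:
            problems.append(
                f"{vec_id}: expected {dimension} values, got {len(values)}"
            )
        if not isinstance(metadata, dict):
            problems.append(f"{vec_id}: metadata must be a dict")
    return problems


records = [
    ("A", [0.1] * 8, {"genre": "comedy", "year": 2020, "title": "Book A"}),
    ("D", [0.4] * 8, {"genre": "drama", "title": "Book D"}),
]
print(check_records(records, dimension=8))  # []
```

An empty list means the records passed these checks and can be handed to the upsert call.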
Filtering with Metadata
Once we insert the vectors with metadata, we can use this information to perform granular filtering, as shown in the following example code:
index.query(
    vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    filter={
        "genre": {"$eq": "comedy"},
        "year": 2020
    },
    # number of results to return; the value here is illustrative
    top_k=1,
    include_metadata=True
)
The previous code queries the index and returns only the vectors whose genre is equal to comedy and whose year is equal to 2020. In the sample data, only vector A matches both conditions.
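To see what the filter does to the result set, the following sketch simulates the same query locally over the sample data. It is plain Python with no Pinecone calls: `local_query` is a hypothetical function written for illustration, and the real index applies the filter on the server in its own way. The idea it shows is the same, though: discard the records whose metadata does not match, then rank the rest by Euclidean distance.

```python
import math

books = {
    "A": ([0.1] * 8, {"genre": "comedy", "year": 2020, "title": "Book A"}),
    "B": ([0.2] * 8, {"genre": "mystery", "year": 2019, "title": "Book B"}),
    "C": ([0.3] * 8, {"genre": "comedy", "year": 2019, "title": "Book C"}),
    "D": ([0.4] * 8, {"genre": "drama", "title": "Book D"}),
    "E": ([0.5] * 8, {"genre": "romance", "title": "Book E"}),
}


def local_query(records, vector, genre, year, top_k=1):
    """Keep records whose genre and year both match, then rank by distance."""
    candidates = [
        (math.dist(vector, values), vec_id)
        for vec_id, (values, meta) in records.items()
        if meta.get("genre") == genre and meta.get("year") == year
    ]
    # Smallest Euclidean distance first, truncated to top_k results.
    return [vec_id for _, vec_id in sorted(candidates)[:top_k]]


print(local_query(books, [0.1] * 8, "comedy", 2020))  # ['A']
```

Book C is also a comedy, but its year is 2019, so it is dropped before any distance is compared.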
The supported metadata filters are as follows:
- $eq – Equal to (number, string, boolean)
- $ne – Not equal to (number, string, boolean)
- $gt – Greater than (number)
- $gte – Greater than or equal to (number)
- $lt – Less than (number)
- $lte – Less than or equal to (number)
- $in – In array (string or number)
- $nin – Not in the array (string or number)
You can combine the metadata filters using the $and and $or operators, each of which takes a list of filter expressions.
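The meaning of these operators can be sketched as a small local matcher. This is an illustration in plain Python, not Pinecone's implementation: `matches` and `OPERATORS` are hypothetical names, and treating a missing field as a non-match for every operator is a simplifying assumption of this sketch.

```python
OPERATORS = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}


def matches(metadata, flt):
    """Return True if the metadata dict satisfies the filter expression."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            # Shorthand: {"year": 2020} means {"year": {"$eq": 2020}}.
            if not isinstance(condition, dict):
                condition = {"$eq": condition}
            # Assumption of this sketch: a missing field never matches.
            if key not in metadata:
                return False
            value = metadata[key]
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
    return True


book_c = {"genre": "comedy", "year": 2019, "title": "Book C"}
print(matches(book_c, {"genre": {"$eq": "comedy"}, "year": {"$lt": 2020}}))  # True
print(matches(book_c, {"$or": [{"genre": "drama"}, {"year": {"$gte": 2020}}]}))  # False
```

Several fields at the top level of a filter are combined as an implicit AND, which is why the first example requires both the genre and the year condition to hold.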
Deleting Vectors by Metadata Filter
We can also pass a metadata filter expression to the delete operation to remove the vectors that match the specified conditions.
An example is as follows:
index.delete(
    filter={
        "genre": {"$eq": "comedy"},
        "year": 2020
    }
)
As you can guess, the previous code removes the vectors whose metadata matches the specified genre and year. In the sample data, that is vector A.
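Because a delete cannot be undone, it can be worth wrapping the call in a small guard. The sketch below refuses to run with an empty filter, which usually signals a bug in the calling code. It assumes the 2.x client, whose `Index.delete` accepts `filter` and `namespace` keyword arguments. `delete_matching` is a hypothetical helper name, and the index is passed in so the function can be exercised without a live connection.

```python
def delete_matching(index, metadata_filter, namespace=None):
    """Delete the vectors that match a non-empty metadata filter.

    `index` is assumed to be a `pinecone.Index` from the 2.x client.
    """
    if not metadata_filter:
        # An empty filter is almost certainly a mistake; do not send it.
        raise ValueError("refusing to delete with an empty metadata filter")
    kwargs = {"filter": metadata_filter}
    if namespace is not None:
        kwargs["namespace"] = namespace
    return index.delete(**kwargs)
```

With the index from this tutorial, the call would be `delete_matching(index, {"genre": {"$eq": "comedy"}, "year": 2020})`.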
Conclusion
We learned how to work with vector metadata in Pinecone: inserting vectors with metadata, filtering queries on it, and deleting the vectors that match a metadata filter.