One use of vector metadata is vector filtering. When you query a Pinecone index, you can pass a filter expression so that the search considers only the vectors whose metadata matches it.
Filter expressions limit a vector search to specific metadata criteria: the query returns only the nearest-neighbor results that satisfy the filter.
This makes searches more precise and targeted, because the metadata narrows the search space to the results that meet specific criteria.
This tutorial shows how to insert vectors with metadata into an index and how to use that metadata to filter queries for more granular searches.
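To make the idea concrete before touching Pinecone, here is a minimal sketch in plain Python of a nearest-neighbor search restricted by a metadata condition. It is a brute-force illustration only, not how Pinecone implements filtering; the `records` list, the `filtered_search` function, and the two-dimensional vectors are hypothetical names and data chosen for this example.

```python
import math

# Hypothetical in-memory records: (id, vector, metadata).
records = [
    ("A", [0.1, 0.1], {"genre": "comedy", "year": 2020}),
    ("B", [0.2, 0.2], {"genre": "mystery", "year": 2019}),
    ("C", [0.3, 0.3], {"genre": "comedy", "year": 2019}),
]

def filtered_search(query, records, predicate, top_k=1):
    """Return the ids of the top_k nearest records whose metadata passes predicate."""
    candidates = [
        (math.dist(query, vector), record_id)  # Euclidean distance to the query
        for record_id, vector, metadata in records
        if predicate(metadata)                 # drop records that fail the filter
    ]
    return [record_id for _, record_id in sorted(candidates)[:top_k]]

query = [0.22, 0.22]

# Without a filter, every record is a candidate.
nearest = filtered_search(query, records, lambda meta: True)

# With a filter, only comedies are candidates.
nearest_comedy = filtered_search(
    query, records, lambda meta: meta.get("genre") == "comedy"
)
```

The unfiltered search picks the closest vector overall, while the filtered search picks the closest vector among the comedies only, which is the behavior the filter expressions in the rest of this tutorial rely on.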
Requirements:
To follow along with this tutorial, ensure that you have the following:
- Python 3.10 or later installed
- Basic Python programming knowledge
Installing the Pinecone Client
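Before interacting with the Pinecone server using Python, we need to install the Pinecone client on our machine. We can do this with a simple “pip” command as follows:
pip install pinecone-client
The previous command downloads the Pinecone client and installs it in your environment. Note that the code in this tutorial uses the pinecone.init() interface of version 2.x of the client. Version 3.0 and later replaced it with a Pinecone class, so if the examples fail on a newer release, install an older one instead:
pip install "pinecone-client<3"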
Creating a Sample Index
The first step is to set up a basic index to use for demonstration purposes. In this case, we create an index that stores book information.
import pinecone

# initialize the Pinecone configuration
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
# create a basic index
pinecone.create_index("book", dimension=8, metric="euclidean", pod_type="p1", pods=1, replicas=1)
# connect to the index
index = pinecone.Index("book")
The previous code initializes the Pinecone client, creates an index named book with a dimension of 8 and the Euclidean distance metric, and then connects to that index.
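If you run the setup code more than once, creating an index whose name is already taken is likely to fail. The following is a minimal sketch of a guard around index creation. The `ensure_index` helper is a hypothetical name introduced here, and it assumes that the client object exposes `list_indexes()` (returning index names), `create_index()`, and `Index()`, as the 2.x `pinecone` module used in this tutorial does.

```python
def ensure_index(client, name, dimension, metric="euclidean"):
    """Create the index only if it is missing, then return a handle to it.

    `client` is assumed to behave like the `pinecone` module in the code
    above: list_indexes() returns the existing index names.
    """
    if name not in client.list_indexes():
        client.create_index(name, dimension=dimension, metric=metric)
    return client.Index(name)

# Intended usage after pinecone.init(...), under the assumptions above:
# index = ensure_index(pinecone, "book", dimension=8)
```

Passing the client in as an argument keeps the helper independent of any global state and makes it easy to try out with a stand-in object.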
Inserting Vectors with Metadata
Once we have an index created, we can use the upsert operation to insert the vectors with metadata as shown in the following example code:
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], {"genre": "comedy", "year": 2020, "title": "Book A"}),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2], {"genre": "mystery", "year": 2019, "title": "Book B"}),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], {"genre": "comedy", "year": 2019, "title": "Book C"}),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], {"genre": "drama", "title": "Book D"}),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], {"genre": "romance", "title": "Book E"})
])
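The previous code inserts five vectors that represent the book information. Each vector carries metadata such as the genre, year, and title. Note that books D and E have no year field: the metadata does not have to contain the same fields for every vector.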
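Before sending records to the index, it can help to check them locally. The sketch below verifies that each vector has the dimension of the index (8 in this tutorial) and that the metadata values are strings, numbers, booleans, or lists of strings, which are the value types Pinecone metadata accepts. The `validate_record` helper and the sample records `good` and `bad` are hypothetical and are not part of the Pinecone client.

```python
DIMENSION = 8  # must match the dimension the index was created with

def validate_record(record, dimension=DIMENSION):
    """Return a list of problems found in one (id, values, metadata) tuple."""
    record_id, values, metadata = record
    problems = []
    if not isinstance(record_id, str) or not record_id:
        problems.append("id must be a non-empty string")
    if len(values) != dimension:
        problems.append(f"expected {dimension} values, got {len(values)}")
    for key, value in metadata.items():
        is_string_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not (isinstance(value, (str, bool, int, float)) or is_string_list):
            problems.append(
                f"metadata field {key!r} has unsupported type {type(value).__name__}"
            )
    return problems

good = ("A", [0.1] * 8, {"genre": "comedy", "year": 2020, "title": "Book A"})
bad = ("F", [0.6] * 7, {"genre": None})  # wrong dimension and a null value

good_problems = validate_record(good)  # empty list: nothing to fix
bad_problems = validate_record(bad)    # two problems reported
```

Running a check like this over the list before calling upsert catches dimension mismatches early, instead of waiting for the server to reject the request.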
Filtering with Metadata
Once we insert the vectors with metadata, we can use this information to perform granular filtering as shown in the following example code:
index.query(
    vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    filter={
        "genre": {"$eq": "comedy"},
        "year": 2020
    },
    top_k=1,
    include_metadata=True
)
The previous code queries the index and returns only the vectors where the genre is equal to comedy and the year is equal to 2020. In the sample data, only book A satisfies both conditions.
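The filter in the query mixes two notations: an explicit `$eq` operator for genre and a bare value for year. Both express equality, and the conditions on different fields must all hold. The following sketch rewrites the shorthand into the explicit form so that the equivalence is visible. The `expand_filter` helper is a hypothetical illustration, not a function of the Pinecone client.

```python
def expand_filter(filter_dict):
    """Rewrite shorthand equality (field: value) as explicit {"$eq": value}."""
    expanded = {}
    for field, condition in filter_dict.items():
        if field in ("$and", "$or"):
            # Logical operators hold a list of sub-filters: expand each one.
            expanded[field] = [expand_filter(sub) for sub in condition]
        elif isinstance(condition, dict):
            expanded[field] = condition           # already uses an operator
        else:
            expanded[field] = {"$eq": condition}  # bare value means equality
    return expanded

query_filter = {"genre": {"$eq": "comedy"}, "year": 2020}
explicit = expand_filter(query_filter)
# explicit == {"genre": {"$eq": "comedy"}, "year": {"$eq": 2020}}
```

Writing the operator explicitly for every field is slightly more verbose, but it keeps a filter uniform once you start mixing equality with comparisons such as `$gt` or `$in`.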
The supported metadata filter operators are as follows:
- $eq – Equal to (number, string, boolean)
- $ne – Not equal to (number, string, boolean)
- $gt – Greater than (number)
- $gte – Greater than or equal to (number)
- $lt – Less than (number)
- $lte – Less than or equal to (number)
- $in – In array (string or number)
- $nin – Not in the array (string or number)
You can combine the metadata filters using the $and and $or operators.
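To show how the operators listed above behave and how they combine, here is a small local evaluator that applies a filter to the metadata of the sample books. It is an illustration of the semantics only and does not call Pinecone. The `matches` function and the `books` dictionary are hypothetical, and the sketch assumes that a vector without the filtered field never matches; Pinecone's own handling of missing fields may differ, so check its documentation for that case.

```python
import operator

# Comparison operators from the list above, mapped to Python equivalents.
COMPARISONS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}

def matches(metadata, filter_dict):
    """Evaluate a filter against one metadata dict (local illustration only)."""
    for key, condition in filter_dict.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # assumption of this sketch: missing field never matches
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # bare value means equality
            for op, operand in condition.items():
                if not COMPARISONS[op](metadata[key], operand):
                    return False
    return True

# Metadata of the sample books (titles omitted for brevity).
books = {
    "A": {"genre": "comedy", "year": 2020},
    "B": {"genre": "mystery", "year": 2019},
    "C": {"genre": "comedy", "year": 2019},
    "D": {"genre": "drama"},
    "E": {"genre": "romance"},
}

# Comedies, or anything published before 2020.
either = {"$or": [{"genre": {"$eq": "comedy"}}, {"year": {"$lt": 2020}}]}
# Published in 2019 or later, and the genre is mystery or drama.
both = {"$and": [{"year": {"$gte": 2019}}, {"genre": {"$in": ["mystery", "drama"]}}]}

matched_either = [book_id for book_id, meta in books.items() if matches(meta, either)]
matched_both = [book_id for book_id, meta in books.items() if matches(meta, both)]
```

The `either` and `both` dictionaries have the shape that the `filter` argument of a query takes, so the same expressions can be tried against the real index.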
There you have it!
Conclusion
In this tutorial, you learned what vector metadata is in Pinecone, how to insert vectors with metadata, and how to use that metadata to return only the vectors that match the specified criteria.