Pinecone Query by Namespace

A namespace in Pinecone is a named partition of an index. It groups the vector embeddings that belong to the same dataset or domain and acts as a boundary around them, so that upsert and search operations can be limited to that group instead of touching the whole index.

Each namespace is identified by its name within the index. Settings such as the dimensionality of the vectors and the similarity metric that is used for comparisons are defined on the index and apply to all of its namespaces; a namespace only controls which vectors an operation sees. Every upsert or query targets exactly one namespace, which lets us keep different datasets separate inside a single index.
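To make the idea concrete, here is a minimal pure-Python sketch of how namespaces partition a single index. It is a toy model and not how Pinecone is implemented: the ToyIndex class, its methods, and the cosine() helper are hypothetical names introduced only for this illustration, and cosine similarity is assumed as the metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Hypothetical in-memory stand-in for an index with namespaces."""

    def __init__(self, dimension):
        self.dimension = dimension  # one dimension for the whole toy index
        self._data = {}             # namespace -> {id: vector}

    def upsert(self, vectors, namespace=""):
        bucket = self._data.setdefault(namespace, {})
        for vec_id, values in vectors:
            if len(values) != self.dimension:
                raise ValueError("vector has the wrong dimension")
            bucket[vec_id] = list(values)

    def query(self, vector, top_k, namespace=""):
        # Only vectors stored under `namespace` are candidates.
        bucket = self._data.get(namespace, {})
        scored = [(cosine(vector, v), vec_id) for vec_id, v in bucket.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(vec_id, score) for score, vec_id in scored[:top_k]]


toy = ToyIndex(dimension=2)
toy.upsert([("0", [1.0, 0.0])], namespace="namespace_a")
toy.upsert([("0", [0.0, 1.0])], namespace="namespace_b")
print(toy.query([1.0, 0.0], top_k=1, namespace="namespace_a"))
```

The same ID "0" exists in both namespaces without conflict, and the query only scores the vector that was stored under the namespace we asked for.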

This tutorial shows you how to create multiple namespaces in a given index, insert data into them, and use the query() function to search the vector data in a specific namespace.

Requirements:

To follow along with this tutorial, ensure that you have the following:

  1. Python 3.10 or later installed
  2. Basic Python programming knowledge

Installing the Pinecone Client

Before interacting with the Pinecone server using Python, we need to install the Pinecone client on our machine. Luckily, we can do this with a simple “pip” command as follows:

$ pip3 install pinecone-client

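The given command downloads the latest stable version of the Pinecone client and installs it in your current Python environment. Note that the code in this tutorial uses the pinecone.init() interface of the 2.x client releases; later major versions replaced it with a different client interface, so you may need to pin an older version to run the examples unchanged.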
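To confirm which version pip actually installed, we can ask Python's standard importlib.metadata module. The following is a small sketch; installed_version() is a helper name introduced here, and it simply returns None when the package is not present.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if the client is not installed.
print(installed_version("pinecone-client"))
```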

Creating an Index and Upserting Vectors

Once we install the Pinecone client, we can create an index to store the vector data.

We can do this using the create_index() method as shown in the following example code:

import numpy as np
import pinecone

# Replace YOUR_API_KEY with your own key; never publish a real key in code
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp-free")
pinecone.create_index("sample-index", dimension=8)

# Create two sets of 8-dimensional vectors
vectors_a = np.random.rand(15, 8).tolist()
vectors_b = np.random.rand(20, 8).tolist()

index = pinecone.Index("sample-index")

# Create ids
ids_a = map(str, np.arange(15).tolist())
ids_b = map(str, np.arange(20).tolist())

# Insert into separate namespaces
index.upsert(vectors=zip(ids_a, vectors_a), namespace='namespace_a')
index.upsert(vectors=zip(ids_b, vectors_b), namespace='namespace_b')

The given code starts by initializing the Pinecone client with an API key and an environment. It then creates an index named “sample-index” that stores 8-dimensional vectors.

Next, we generate two sets of random 8-dimensional vectors, 15 in the first set and 20 in the second, connect to the index, and create the string IDs that identify each vector within its namespace.

Finally, the two upsert() calls insert the first set into “namespace_a” and the second set into “namespace_b”. Pinecone creates a namespace automatically the first time we upsert into it, so no separate creation step is needed.
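One detail worth noting is that map() and zip() return single-use iterators in Python 3, so the IDs and pairs built that way cannot be inspected or reused after the upsert consumes them. The following sketch builds plain lists of (id, vector) tuples instead and splits them into batches. It uses only the standard library; make_records() and batches() are hypothetical helper names, the batch size of 10 is an arbitrary choice, and the upsert call is left as a comment because it needs a live index.

```python
import random


def make_records(count, dimension, seed=None):
    """Return [(id, vector), ...] with string IDs "0" .. str(count - 1)."""
    rng = random.Random(seed)
    return [(str(i), [rng.random() for _ in range(dimension)])
            for i in range(count)]


def batches(records, size):
    """Yield consecutive slices of `records` with at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


records_a = make_records(15, 8, seed=0)
records_b = make_records(20, 8, seed=1)

for namespace, records in (("namespace_a", records_a),
                           ("namespace_b", records_b)):
    for batch in batches(records, size=10):
        # With a live index, this is where the upsert would go, e.g.:
        # index.upsert(vectors=batch, namespace=namespace)
        print(namespace, len(batch))
```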

Pinecone Query by Namespace

Once the data is stored in the index, we can use the query() method to search a namespace using a query vector. The method retrieves the IDs of the most similar items in a namespace and their similarity scores.

The function syntax is as follows:

Index.query(**kwargs)

The function accepts the following parameters:

  1. namespace – It sets the namespace to query.
  2. top_k – It sets the number of results to return for each query.
  3. filter – It defines the metadata filter to apply to the vectors. Check our tutorial on the vector metadata to learn more.
  4. include_values – It denotes whether the vector values are included in the response.
  5. include_metadata – It denotes whether the metadata is included in the response along with the IDs.
  6. vector – It specifies the query vector.
  7. sparse_vector – It specifies the sparse query vector.
  8. id – It specifies the unique ID of a stored vector to be used as the query vector.
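Since query() takes everything as keyword arguments, it can help to assemble and check them before sending a request. The following is a hedged sketch of such a helper. build_query_kwargs() is a hypothetical function, not part of the Pinecone client, and it assumes that a query should carry exactly one of vector or id. The parameters vector_id and metadata_filter are renamed here only to avoid shadowing Python built-ins.

```python
def build_query_kwargs(namespace, top_k, vector=None, vector_id=None,
                       metadata_filter=None, include_values=False,
                       include_metadata=False):
    """Assemble keyword arguments for a query (hypothetical helper)."""
    # Assumption: the query is driven by a vector or by a stored ID, not both.
    if (vector is None) == (vector_id is None):
        raise ValueError("pass exactly one of vector or vector_id")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    kwargs = {
        "namespace": namespace,
        "top_k": top_k,
        "include_values": include_values,
        "include_metadata": include_metadata,
    }
    if vector is not None:
        kwargs["vector"] = list(vector)
    else:
        kwargs["id"] = vector_id
    if metadata_filter is not None:
        kwargs["filter"] = metadata_filter
    return kwargs


kwargs = build_query_kwargs("namespace_a", top_k=3, vector_id="0")
print(kwargs)
# With a live index: query_response = index.query(**kwargs)
```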

Example of Pinecone Query by Namespace

The following code shows an example of using the query() function to search for the top matching vectors in a given namespace:

query_response = index.query(
    namespace='namespace_a',
    top_k=1,
    include_values=True,
    include_metadata=True,
    vector=[0.02, 0.99, 0.80, 0.54, 0.20, 0.12, 0.84, 0.56]
)

print(query_response)

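The given query searches for the single best-matching vector in the “namespace_a” namespace and asks for the values and metadata to be included. Since we upserted the vectors without metadata, the response contains only the ID, score, and values.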

An example output is as follows:

{'matches': [{'id': '0',
'score': 0.854903579,
'values': [0.0537401401,
0.621147,
0.728396,
0.972351491,
0.181928158,
0.0169850122,
0.271920711,
0.179941893]}],
'namespace': 'namespace_a'}

As you can see, the query returns the matching record from the “namespace_a” namespace only.
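To use the result in a program rather than just print it, we can pull the ID and score out of the response. The following sketch works on a plain dictionary shaped like the output shown above, so it runs without a Pinecone connection. best_match() is a hypothetical helper, and it assumes a metric where a higher score means a closer match, such as cosine similarity.

```python
# A plain dict shaped like the example output above.
response = {
    "matches": [
        {
            "id": "0",
            "score": 0.854903579,
            "values": [0.0537401401, 0.621147, 0.728396, 0.972351491,
                       0.181928158, 0.0169850122, 0.271920711, 0.179941893],
        }
    ],
    "namespace": "namespace_a",
}


def best_match(resp):
    """Return (id, score) of the highest-scoring match, or None if empty."""
    matches = resp.get("matches", [])
    if not matches:
        return None
    top = max(matches, key=lambda m: m["score"])
    return top["id"], top["score"]


print(response["namespace"], best_match(response))
```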

Conclusion

We learned how to work with indexes and namespaces in Pinecone. We also learned how to query the data from a given namespace using the query() function.

About the author

John Otieno
