Pinecone Index.query()

Pinecone is a vector database that specializes in similarity search and nearest-neighbor retrieval. It represents data points such as documents, images, or other types of data as vector embeddings in a high-dimensional space.

In this post, we will learn how to query the data in a Pinecone index using the Pinecone client for Python.

Requirements:

To follow along with this tutorial, ensure that you have the following:

  1. Python 3.10 or later installed
  2. Basic Python programming knowledge
  3. A Pinecone account and API key

Installing the Pinecone Client

Before interacting with the Pinecone server using Python, we need to install the Pinecone client on our machine. Luckily, we can do this with a simple “pip” command as follows:

$ pip3 install pinecone-client

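The previous command downloads the Pinecone client and installs it in your environment. Note that the examples in this post call pinecone.init(), which belongs to the 2.x releases of the client. Later major versions replaced it with a Pinecone class, so pin an older release (for example, pip3 install "pinecone-client<3") if you want to run the code unchanged.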
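Since the client's API has changed between major releases, it can help to confirm which version is installed before running the examples. The following is a minimal sketch that uses only the standard library. The helper name installed_version is our own, and the distribution name "pinecone-client" is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "pinecone-client" is the distribution name used in the pip command above.
# This prints None if the package is not installed in the current environment.
print(installed_version("pinecone-client"))
```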

Creating an Index and Upserting Vectors

Once we install the Pinecone client, we can create an index to store the vector data.

We can do this using the create_index() method as shown in the following example code:

import numpy as np
import pinecone

# Replace the placeholder with your own API key and environment
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp-free")
pinecone.create_index("sample-index", dimension=8)

# Create two sets of 8-dimensional vectors
vectors_a = np.random.rand(15, 8).tolist()
vectors_b = np.random.rand(20, 8).tolist()

# Connect to the index
index = pinecone.Index("sample-index")

# Create string IDs, one per vector
ids_a = [str(i) for i in range(15)]
ids_b = [str(i) for i in range(20)]

# Insert into separate namespaces as lists of (id, vector) tuples
index.upsert(vectors=list(zip(ids_a, vectors_a)), namespace='namespace_a')
index.upsert(vectors=list(zip(ids_b, vectors_b)), namespace='namespace_b')

The previous code starts by initializing the Pinecone client with an API key and environment. It then creates an index named “sample-index” with a dimension of 8 and connects to it.

Next, we generate two sets of random 8-dimensional vectors, 15 in the first set and 20 in the second, along with the string IDs that identify each vector.

Finally, the upsert() method inserts each set as (ID, vector) pairs into its own namespace: “namespace_a” and “namespace_b”.
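To make the shape of the upserted data concrete, the sketch below builds the same kind of (ID, vector) pairs without contacting Pinecone. It uses the standard random module instead of NumPy so that it runs anywhere, and make_records is a hypothetical helper name, not part of the Pinecone client.

```python
import random


def make_records(count: int, dimension: int, seed: int = 0):
    """Build a list of (id, vector) tuples with string IDs "0", "1", ..."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    vectors = [[rng.random() for _ in range(dimension)] for _ in range(count)]
    ids = [str(i) for i in range(count)]
    return list(zip(ids, vectors))


records_a = make_records(15, 8)
records_b = make_records(20, 8, seed=1)

# Each record pairs one string ID with one 8-dimensional vector
print(len(records_a), len(records_b))
print(records_a[0][0], len(records_a[0][1]))
```

A list such as records_a should be usable as the vectors argument of index.upsert(), assuming the client accepts a list of (ID, vector) tuples as it does in the example above.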

Pinecone Query by Namespace

Once the data is stored in the index, we can use the query() method to search a namespace using a query vector. The method retrieves the IDs of the most similar items in a namespace and their similarity scores.
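Conceptually, a similarity query ranks the stored vectors by how close they are to the query vector and returns the best matches. The sketch below illustrates that idea with a brute-force cosine similarity search in plain Python. It is an illustration only: Pinecone relies on its own indexing internally, the use of cosine similarity is an assumption about the index metric, and the function names are our own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, records, k):
    """Return up to k (id, score) pairs, highest score first."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in records]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Three toy 2-dimensional records, chosen by hand for illustration
records = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [1.0, 1.0]),
]
results = brute_force_top_k([1.0, 0.1], records, k=2)
print(results)
```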

The function syntax is as follows:

Index.query(**kwargs)

The function accepts the following parameters:

  1. namespace – It sets the namespace to query.
  2. top_k – It sets the number of results to return for each query.
  3. filter – It defines the metadata filter to apply to the vectors. Check our tutorial on vector metadata to learn more.
  4. include_values – It denotes whether the vector values are included in the response.
  5. include_metadata – It denotes whether the metadata is included in the response along with the IDs.
  6. vector – It specifies the query vector. Its length must match the dimension of the index.
  7. sparse_vector – It specifies the sparse query vector.
  8. id – It specifies the unique ID of a stored vector to be used as the query vector.

A query takes either vector or id, not both.
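The parameters above can be assembled and checked before they are sent. The following sketch is a hypothetical helper (build_query_kwargs is not part of the Pinecone client) that builds the keyword arguments for query(). The rule it enforces, that a query is given either a vector or an ID but not both, reflects our understanding of the API and should be treated as an assumption.

```python
def build_query_kwargs(top_k, namespace=None, vector=None, vector_id=None,
                       metadata_filter=None, include_values=False,
                       include_metadata=False):
    """Assemble keyword arguments for Index.query(), validating them first."""
    # Exactly one of vector / vector_id must be supplied
    if (vector is None) == (vector_id is None):
        raise ValueError("Provide exactly one of vector or vector_id")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    kwargs = {
        "top_k": top_k,
        "include_values": include_values,
        "include_metadata": include_metadata,
    }
    if namespace is not None:
        kwargs["namespace"] = namespace
    if metadata_filter is not None:
        kwargs["filter"] = metadata_filter
    if vector is not None:
        kwargs["vector"] = vector
    else:
        kwargs["id"] = vector_id
    return kwargs


kwargs = build_query_kwargs(top_k=3, namespace="namespace_a", vector_id="7")
print(kwargs)
```

The resulting dictionary could then be unpacked into the call as index.query(**kwargs).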

Pinecone Index.query() Usage

The following shows how to use the query() function to return the top 10 matching vectors from namespace_b:

query_response = index.query(
    namespace='namespace_b',
    top_k=10,
    include_values=True,
    include_metadata=True,
    vector=[0.02, 0.99, 0.80, 0.54, 0.20, 0.12, 0.84, 0.56]
)

print(query_response)

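This query returns the 10 vectors in “namespace_b” that are most similar to the query vector.

The response contains a list of matches. Each match carries the ID of a vector, its similarity score and, because include_values is set to True, the vector values. Since the stored vectors are randomly generated, the IDs and scores differ from one run to the next.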
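The exact output depends on the random vectors in the index, so it is not reproduced here. As a sketch of how the result might be processed, the code below extracts the (ID, score) pairs from a response. It assumes that the response can be read like a dictionary with a matches list whose entries have id and score keys. The helper name summarize_matches is our own, and the sample dictionary is a hand-written stand-in with invented values, not real Pinecone output.

```python
def summarize_matches(response, min_score=None):
    """Return (id, score) pairs from a query response, optionally filtered by score."""
    pairs = []
    for match in response["matches"]:
        score = match["score"]
        if min_score is None or score >= min_score:
            pairs.append((match["id"], score))
    return pairs


# Hand-written stand-in for a response. The IDs and scores are invented
# for illustration and are NOT real Pinecone output.
fake_response = {
    "namespace": "namespace_b",
    "matches": [
        {"id": "3", "score": 0.91, "values": [0.1] * 8},
        {"id": "12", "score": 0.87, "values": [0.2] * 8},
        {"id": "5", "score": 0.42, "values": [0.3] * 8},
    ],
}

print(summarize_matches(fake_response))
print(summarize_matches(fake_response, min_score=0.5))
```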

Conclusion

In this post, you learned how to use the query() method of the Pinecone client for Python to find the vectors in a given namespace that are most similar to a query vector.

About the author

John Otieno

My name is John and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks.