
Pinecone Index.Upsert()

In Pinecone, an index is a high-performance data structure that enables efficient similarity search and retrieval of vector embeddings.

It organizes the embeddings to optimize nearest-neighbor queries, which lets you find the most similar vectors quickly. As a result, a Pinecone index can handle large-scale vector datasets with fast insert, update, and delete operations while maintaining high search efficiency.
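To make the idea of a nearest-neighbor query concrete, here is a minimal brute-force sketch in plain Python. It is not how Pinecone searches internally, since a real index avoids scanning every vector. The nearest() helper and the sample data are hypothetical and exist only for illustration.

```python
import math


def nearest(query, vectors, top_k=1):
    """Return the IDs of the top_k vectors closest to query (Euclidean distance)."""
    # Score every stored vector against the query; a real index avoids this full scan.
    scored = sorted(vectors.items(), key=lambda item: math.dist(query, item[1]))
    return [vec_id for vec_id, _ in scored[:top_k]]


# Hypothetical toy data: three 2-dimensional embeddings keyed by ID.
embeddings = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}

closest = nearest([0.9, 1.2], embeddings, top_k=2)
print(closest)
```

The query sits closest to "b", then "a", so those two IDs come back in that order.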

What Is Upsert in Pinecone?

In Pinecone, an upsert is an operation that updates or inserts vector embeddings in an existing index. It combines updating existing vectors and inserting new ones in a single operation.

When you invoke an upsert operation, Pinecone checks whether a vector with the specified ID already exists in the index. If it does, Pinecone overwrites it with the new values. Otherwise, it creates a new vector in the index.
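The update-or-insert rule can be modeled with an ordinary dictionary. The sketch below is a toy model of the semantics only, not Pinecone's implementation; the upsert() function and the store variable are hypothetical names.

```python
def upsert(store, vector_id, values):
    """Insert or overwrite values under vector_id and report which one happened."""
    action = "updated" if vector_id in store else "inserted"
    store[vector_id] = values
    return action


store = {}
first = upsert(store, "vec1", [0.1, 0.2])   # ID is new, so this inserts
second = upsert(store, "vec1", [0.3, 0.4])  # same ID again, so this overwrites
print(first, second, store)
```

After both calls the store still holds a single entry for "vec1", now with the second set of values.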

The upsert functionality is useful when you have a dynamic dataset and you need to continuously update or add new vectors without rebuilding the entire index.

This makes it possible to update the index in real time and keep it in sync with changes in the underlying data. This can be particularly valuable in applications such as recommendation systems, where user preferences or item features change over time.

In this tutorial, we will learn how to carry out an upsert operation in Pinecone using the Pinecone client for Python.

Requirements:

To follow along with this tutorial, ensure that you have the following:

  1. Python 3.10 or later installed
  2. Basic Python programming knowledge

Installing the Pinecone Client

Before interacting with the Pinecone server using Python, we need to install the Pinecone client on our machine. Luckily, we can do this with a simple “pip” command as follows:

$ pip3 install pinecone-client

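The previous command downloads the Pinecone client and installs it in your project. The code in this tutorial uses the pinecone.init() interface of the 2.x client. Later major versions of the client changed how it is initialized, so pin a 2.x version if the examples do not run as written. The example also uses NumPy, which you can install with “pip3 install numpy”.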

Creating an Index and Upserting Vectors

Once we install the Pinecone client, we can create an index where we store the vector data.

We can do this using the create_index() method as shown in the following example code:

import numpy as np
import pinecone

# Initialize the Pinecone configuration (replace the API key with your own)
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp-free")

# Create a basic index; the dimension must match the length of the vectors we upsert
pinecone.create_index("sample", dimension=8, metric="euclidean", pod_type="p1", pods=1, replicas=1)

# Connect to the index
index = pinecone.Index("sample")

# Create 15 random 8-dimensional vectors
vectors_a = np.random.rand(15, 8).tolist()

# Create a string ID for each vector
ids_a = [str(i) for i in range(15)]

# Upsert the (id, values) pairs into a namespace
index.upsert(vectors=list(zip(ids_a, vectors_a)), namespace='linuxhint_namespace_a')

The previous code starts by initializing Pinecone. It then creates a basic index named “sample” with specified parameters and establishes a connection to the index.

Next, we generate a set of 15 vectors with 8 dimensions each, along with their corresponding IDs.

Finally, it inserts the vectors into the “linuxhint_namespace_a” namespace of the “sample” index using the upsert functionality.
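Every vector must have the same number of dimensions as the index it is upserted into, so it can help to check the data locally before sending it. The following is a minimal sketch that needs neither NumPy nor a Pinecone connection. The make_vectors() and check_dimension() helpers are hypothetical and are not part of the Pinecone client.

```python
import random


def make_vectors(count, dimension, seed=0):
    """Build (id, values) pairs with random values, similar to the example data."""
    rng = random.Random(seed)
    ids = [str(i) for i in range(count)]
    values = [[rng.random() for _ in range(dimension)] for _ in range(count)]
    return list(zip(ids, values))


def check_dimension(vectors, dimension):
    """Raise ValueError if any vector's length differs from the index dimension."""
    bad = [vec_id for vec_id, values in vectors if len(values) != dimension]
    if bad:
        raise ValueError(f"vectors with wrong dimension: {bad}")


vectors = make_vectors(15, 8)
check_dimension(vectors, 8)  # passes silently: all 15 vectors have 8 values
print(len(vectors), len(vectors[0][1]))
```

Running the check with a different dimension, such as 1024, raises a ValueError that lists the offending IDs.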

Index.Upsert() Method

The index.upsert() method allows us to update or insert the vectors into a given index as demonstrated in the previous example.

The method syntax is as follows:

Index.upsert(**kwargs)

The method accepts two main parameters:

  1. Vectors – This parameter defines an array that contains the vectors to upsert. The recommended batch size is at most 100 vectors per request. Each item in the array includes the following fields:
    • id – It specifies the unique ID of the vector.
    • values – It represents the vector data.
    • metadata – It specifies the metadata for the vector.
    • sparse_values – This defines a dictionary that contains the indices and values arrays that hold the sparse vector values.
  2. Namespace – The name of the namespace to upsert the vectors into.
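The fields above map naturally onto one dictionary per vector, and the batch-size recommendation can be met by slicing the list before sending it. The sketch below assumes that your client version accepts dictionaries with these keys; build_record() and batches() are hypothetical helpers, and no Pinecone call is made.

```python
def build_record(vector_id, values, metadata=None):
    """Assemble one vector record using the field names described above."""
    record = {"id": vector_id, "values": values}
    if metadata is not None:
        record["metadata"] = metadata
    return record


def batches(records, size=100):
    """Yield consecutive slices of records, each with at most size items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Hypothetical data: 250 records with 8-dimensional values and simple metadata.
records = [build_record(str(i), [0.1] * 8, {"source": "demo"}) for i in range(250)]
batch_sizes = [len(batch) for batch in batches(records)]
print(batch_sizes)
```

Each batch could then be passed to index.upsert() together with the namespace, one request per batch.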

The response reports the number of vectors that were upserted in the operation as an int64 value.
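The following sketch shows one way to read that count. It assumes the response object exposes an upserted_count attribute, which may differ between client versions. FakeIndex is a hypothetical stand-in for pinecone.Index so that the example runs without a network call.

```python
from types import SimpleNamespace


class FakeIndex:
    """Stand-in for pinecone.Index; it only mimics the shape of the response."""

    def upsert(self, vectors, namespace=None):
        # Assumption: the real response carries an upserted_count attribute.
        return SimpleNamespace(upserted_count=len(vectors))


def upsert_and_count(index, vectors, namespace):
    """Upsert the vectors and return how many the response reports as upserted."""
    response = index.upsert(vectors=vectors, namespace=namespace)
    return response.upserted_count


pairs = [("0", [0.1, 0.2]), ("1", [0.3, 0.4])]
count = upsert_and_count(FakeIndex(), pairs, "linuxhint_namespace_a")
print(count)
```

With a real index object in place of FakeIndex, comparing the count to len(pairs) is a quick sanity check that every vector was accepted.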

Conclusion

You learned what a Pinecone index is, what an upsert operation is, and how to use the Pinecone client for Python to perform an upsert operation.

About the author

John Otieno

My name is John and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.