Pinecone Create_Collection()

A Pinecone collection is a static copy of an index. It holds the vectors and metadata that the index contained at the moment the collection was created, but it cannot be queried or written to directly. Collections are useful for backing up an index, for experimenting with different index configurations, and for creating a new index from existing data.

An index, on the other hand, is the live data structure that stores the vectors and serves fast and efficient similarity searches over them.

The index structure allows for quick retrieval of similar vectors based on their proximity in the vector space, which enables high-performance similarity search even on large-scale datasets.
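To make the idea of proximity in the vector space concrete, here is a minimal sketch of a similarity search done by brute force with the Euclidean distance. It is only a conceptual illustration: the helper names and the toy data are made up for this example, and a real index avoids scanning every vector the way this code does.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, top_k=2):
    """Return the ids of the top_k vectors closest to the query."""
    ranked = sorted(vectors, key=lambda vec_id: euclidean(query, vectors[vec_id]))
    return ranked[:top_k]


# Toy two-dimensional data, for illustration only.
vectors = {
    "A": [0.1, 0.1],
    "B": [0.2, 0.2],
    "C": [0.9, 0.9],
}

closest = nearest([0.15, 0.12], vectors)
print(closest)
```

The smaller the distance, the more similar two vectors are considered, so the ids come back ordered from the most similar to the least similar.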

In this tutorial, we will learn how to use the create_collection() method from the Pinecone client for Python to create a collection from an existing index.

Requirements:

To follow along with this tutorial, ensure that you have the following:

Python 3.10 or later installed
Basic Python programming knowledge

Installing the Pinecone Client

Before interacting with the Pinecone server using Python, we need to install the Pinecone client on our machine. Luckily, we can do this with a simple “pip” command as follows:

$ pip3 install pinecone-client

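The previous command downloads the Pinecone client and installs it in your environment. Note that the examples in this tutorial use the module-level pinecone.init() interface of the 2.x releases of the client. Later major versions replaced it with a Pinecone class, so you may need to pin the version (for example, pip3 install "pinecone-client<3") to run the code as written.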

Creating a Sample Index

The first step is to set up a basic index which we will use for demonstration purposes. In this case, we create a basic index that stores the book information.

import pinecone

# init pinecone configuration
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

# create basic index
pinecone.create_index("book", dimension=8, metric="euclidean", pod_type="p1", pods=1, replicas=1)

# connect to the index
index = pinecone.Index("book")

The previous code initializes the Pinecone client, creates an index named “book” with a dimension of 8 that uses the Euclidean distance metric, and connects to it.
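Running the index creation code a second time fails because the index already exists. One way to guard against that is sketched below. The ensure_index() helper and its client parameter are hypothetical names introduced for this example; client stands for the pinecone module after init() has been called, and the sketch assumes that list_indexes() returns a list of index names, as it does in the 2.x client.

```python
def ensure_index(client, name, dimension, metric="euclidean"):
    """Create the index only if it is not listed yet.

    Returns True if the index was created and False if it already existed.
    `client` is assumed to expose list_indexes() and create_index().
    """
    if name in client.list_indexes():
        return False
    client.create_index(name, dimension=dimension, metric=metric)
    return True


# Usage with the real client (requires a valid API key):
#   import pinecone
#   pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
#   ensure_index(pinecone, "book", dimension=8)
```

Passing the client as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real project.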

Inserting Vectors with Metadata

Once we have an index created, we can use the upsert operation to insert the vectors with metadata as shown in the following example code:

index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], {"genre": "comedy", "year": 2020, "title": "Book A"}),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2], {"genre": "mystery", "year": 2019, "title": "Book B"}),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], {"genre": "comedy", "year": 2019, "title": "Book C"}),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], {"genre": "drama", "title": "Book D"}),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], {"genre": "romance", "title": "Book E"})
])

The previous code inserts five vectors that represent the book information. Each record is a tuple of an ID, an 8-dimensional vector, and a metadata dictionary with fields such as the genre, year, and title.
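Every vector must have exactly as many values as the dimension of the index, which is 8 in this tutorial. The following sketch checks this locally before anything is sent to Pinecone. The check_records() helper is a hypothetical name used for illustration and is not part of the Pinecone client.

```python
def check_records(records, dimension):
    """Raise ValueError if any (id, values, metadata) record has the wrong length."""
    for vec_id, values, _metadata in records:
        if len(values) != dimension:
            raise ValueError(
                f"vector {vec_id!r} has {len(values)} values, expected {dimension}"
            )
    return records


records = [
    ("A", [0.1] * 8, {"genre": "comedy", "year": 2020, "title": "Book A"}),
    ("D", [0.4] * 8, {"genre": "drama", "title": "Book D"}),
]

checked = check_records(records, dimension=8)
# index.upsert(checked)  # would send the validated records to the index
```

Catching a wrong length on the client side gives a clearer error message than a rejected request.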

Pinecone Create_Collection() Method

As the name suggests, the create_collection() method allows us to create a collection from a given index. The function syntax is as follows:

pinecone.create_collection(**kwargs)

The function accepts two main parameters:

name – This specifies the name of the collection that you wish to create.
source – It defines the name of the source index from which you wish to create the collection.
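The two parameters can be passed by keyword, which makes the call easier to read. The sketch below wraps the call so that the collection is only created when no collection with that name exists yet. The create_collection_once() helper and its client parameter are hypothetical names for this example; client stands for the initialized pinecone module, and the sketch assumes that list_collections() returns a list of collection names, as it does in the 2.x client.

```python
def create_collection_once(client, name, source):
    """Create a collection from the source index unless the name is already taken.

    Returns True if the collection was created and False if it already existed.
    `client` is assumed to expose list_collections() and create_collection().
    """
    if name in client.list_collections():
        return False
    client.create_collection(name=name, source=source)
    return True


# Usage with the real client (requires a valid API key):
#   import pinecone
#   pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
#   create_collection_once(pinecone, name="book-collection", source="book")
```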

Example Function Usage

To create a collection from the book index that we created in the previous example, we can use the create_collection() method as demonstrated in the following code:

import pinecone

pinecone.init(api_key='YOUR_API_KEY', environment='YOUR_ENVIRONMENT')

pinecone.create_collection("book-collection", "book")

This should create a collection called “book-collection” that contains a copy of the vectors and metadata stored in the book index.
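Creating a collection can take some time, so it may not be usable immediately after the call returns. The following sketch polls describe_collection() until the collection reports that it is ready. The wait_for_collection() helper and its parameters are hypothetical names for this example, and the code assumes that the returned description has a status attribute whose value becomes "Ready", which is how the 2.x client is understood to behave; check the documentation for your client version.

```python
import time


def wait_for_collection(client, name, timeout=300.0, poll_interval=5.0):
    """Poll describe_collection() until the status is "Ready" or the timeout expires.

    `client` is assumed to expose describe_collection(name) and to return an
    object with a `status` attribute.
    """
    deadline = time.monotonic() + timeout
    while True:
        description = client.describe_collection(name)
        if description.status == "Ready":
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError(f"collection {name!r} not ready after {timeout} seconds")
        time.sleep(poll_interval)


# Usage with the real client (requires a valid API key):
#   import pinecone
#   pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
#   wait_for_collection(pinecone, "book-collection")
```

A time limit is included so that the loop cannot run forever if the collection never becomes ready.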

Conclusion

You learned how to use the create_collection() method to create a collection from an existing Pinecone index. You can check the documentation or source code to learn more about this function.