On the other hand, an index is the underlying data structure that stores vectors and enables fast similarity search over them. It organizes vectors by their proximity in the vector space, so the most similar vectors can be retrieved quickly even in large datasets.
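To make "proximity in the vector space" concrete, here is a brute-force sketch of the kind of search an index is meant to speed up. It compares the query against every stored vector, which works for four toy vectors but grows linearly with the dataset; an index such as Pinecone's avoids scanning everything. The helpers `euclidean` and `nearest` are hypothetical names for illustration and are not part of the Pinecone client.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=2):
    """Return the ids of the k stored vectors closest to query (exhaustive scan)."""
    ranked = sorted(vectors, key=lambda vid: euclidean(query, vectors[vid]))
    return ranked[:k]


# Toy 3-dimensional vectors; real embeddings usually have many more dimensions.
vectors = {
    "a": [0.0, 0.0, 0.0],
    "b": [1.0, 1.0, 1.0],
    "c": [0.9, 1.1, 1.0],
    "d": [5.0, 5.0, 5.0],
}

print(nearest([1.0, 1.0, 0.9], vectors))  # → ['b', 'c']
```

Every query here costs one distance computation per stored vector, which is exactly the cost an index structure is designed to cut down.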
In this tutorial, we will learn how to use the create_collection() method of the Pinecone client for Python to create a collection from an existing index.
Requirements:
To follow along with this tutorial, ensure that you have the following:
- Python 3.10 or later installed
- A Pinecone API key and environment name
- Basic Python programming knowledge
Installing the Pinecone Client
Before interacting with the Pinecone server from Python, we need to install the Pinecone client on our machine. We can do this with a single pip command:
pip install "pinecone-client<3"
This command downloads and installs the latest 2.x release of the Pinecone client. The version pin keeps the pinecone.init() API that the rest of this tutorial uses.
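The examples in this tutorial call the module-level pinecone.init(), which is not available in version 3.0 and later of the client (those releases use a Pinecone class instead). A quick way to check which line you have installed is to read the package version with the standard library, as in this sketch. It assumes the client is installed under the distribution name `pinecone-client`; the helper names `major_version` and `supports_legacy_init` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '2.2.4'."""
    return int(version_string.split(".")[0])


def supports_legacy_init(package="pinecone-client"):
    """Return True if the installed client is older than 3.0.

    Assumption: the client is installed under the distribution name
    'pinecone-client'; if it is not found under that name, return False.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False  # package is not installed under this name
    return major_version(installed) < 3


print(supports_legacy_init())
```

If this prints False, the pinecone.init() calls below will not work with the installed client.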
Creating a Sample Index
The first step is to set up a basic index for demonstration purposes. Here, we create an index that stores vectors representing books.
import pinecone
# initialize the Pinecone client (2.x API)
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
# create an index named "book" for 8-dimensional vectors compared by Euclidean distance
pinecone.create_index("book", dimension=8, metric="euclidean", pod_type="p1.x1", pods=1, replicas=1)
# connect to the index
index = pinecone.Index("book")
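The previous code initializes the Pinecone client, creates an index named “book” that stores 8-dimensional vectors and compares them by Euclidean distance, and connects to it through the index object.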
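The metric="euclidean" setting means similarity is measured as straight-line distance, so a smaller distance means more similar vectors. As a worked example (illustrative arithmetic, not output from Pinecone), the distance between two 8-dimensional vectors whose components are all 0.1 and all 0.2 respectively is:

```latex
d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{8} (a_i - b_i)^2}
  = \sqrt{8 \times (0.1 - 0.2)^2}
  = \sqrt{0.08}
  \approx 0.283
```

Each of the 8 components contributes the same squared difference of 0.01, which is why the sum collapses to 8 × 0.01.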
Inserting Vectors with Metadata
Once the index is created, we can use the upsert operation to insert vectors along with their metadata, as shown in the following code:
# each tuple is (id, vector values, metadata)
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], {"genre": "comedy", "year": 2020, "title": "Book A"}),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2], {"genre": "mystery", "year": 2019, "title": "Book B"}),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], {"genre": "comedy", "year": 2019, "title": "Book C"}),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], {"genre": "drama", "title": "Book D"}),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], {"genre": "romance", "title": "Book E"})
])
The previous code inserts five vectors, one per book, each with metadata such as the genre, year, and title. Books D and E have no year field, since metadata fields do not have to be the same for every vector.
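Because the index was created with dimension=8, every vector sent to upsert must contain exactly 8 values, and Pinecone rejects vectors that do not match. The following is a hypothetical client-side check (the name `validate_records` is illustrative, not part of the Pinecone client) that catches mismatched dimensions, duplicate ids, and non-numeric values before any request is made.

```python
import math

DIMENSION = 8  # must match the dimension passed to create_index() for "book"


def validate_records(records, dimension=DIMENSION):
    """Check (id, vector, metadata) tuples before passing them to upsert.

    Raises ValueError on the first problem found; returns the record count.
    """
    seen = set()
    for record_id, values, metadata in records:
        if record_id in seen:
            raise ValueError(f"duplicate id {record_id!r}")
        seen.add(record_id)
        if len(values) != dimension:
            raise ValueError(
                f"{record_id!r} has {len(values)} values, expected {dimension}"
            )
        # the isinstance check runs first, so isfinite only sees numbers
        if not all(isinstance(v, (int, float)) and math.isfinite(v) for v in values):
            raise ValueError(f"{record_id!r} contains a non-numeric or non-finite value")
        if not isinstance(metadata, dict):
            raise ValueError(f"{record_id!r} metadata must be a dict")
    return len(records)


books = [
    ("A", [0.1] * 8, {"genre": "comedy", "year": 2020, "title": "Book A"}),
    ("B", [0.2] * 8, {"genre": "mystery", "year": 2019, "title": "Book B"}),
]
print(validate_records(books))  # → 2
```

Once the check passes, the same list can be handed to index.upsert() unchanged.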
Pinecone create_collection() Method
As the name suggests, the create_collection() method creates a collection from a given index. The function syntax is as follows:
pinecone.create_collection(name, source)
The function accepts two parameters:
- name – the name of the collection that you wish to create.
- source – the name of the source index from which you wish to create the collection.
Example Function Usage
To create a collection from the book index that we created in the previous example, we can use the create_collection() method as demonstrated in the following code:
import pinecone
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
# create a collection named "book-collection" from the "book" index
pinecone.create_collection(name="book-collection", source="book")
This should create a collection called “book-collection” based on the specified book index.
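As far as we know, creating a collection is not instantaneous: the call returns while Pinecone is still building the collection. Below is a polling sketch, assuming the 2.x client's pinecone.describe_collection() returns an object whose status attribute reads "Ready" once the collection is usable (the exact status strings are an assumption, so check them against your client). The describe function is passed in as a parameter, which keeps the helper testable without a network connection; wait_for_collection is a hypothetical name, not part of the client.

```python
import time


def wait_for_collection(describe, name, timeout=300.0, interval=5.0):
    """Poll describe(name) until its status is 'Ready' or the timeout expires.

    `describe` is assumed to behave like pinecone.describe_collection in the
    2.x client: it returns an object with a `status` attribute.
    Returns True if the collection became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = getattr(describe(name), "status", None)
        if status == "Ready":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the real client, the call would be wait_for_collection(pinecone.describe_collection, "book-collection").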
Conclusion
You learned how to use the create_collection() method to create a collection from an existing Pinecone index. You can check the Pinecone documentation or the client source code to learn more about this function.