One of the major building blocks in Pinecone is the index. A Pinecone index is the structure that Pinecone uses to index and organize the vectors stored in the database. An index employs techniques such as approximate nearest neighbor algorithms and dimensionality reduction to keep searches efficient.
An index enables rapid similarity searches: given a query vector, it quickly identifies the stored vectors that are closest to it, which makes information retrieval efficient.
In this tutorial, we will walk you through the basics of working with indexes in Pinecone. We will start by creating a basic index, adding sample data, and gathering information about the index.
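To make the idea of a similarity search concrete, the following sketch runs an exhaustive (brute-force) search over a handful of vectors using cosine similarity. It only illustrates what a query computes and is not how Pinecone works internally, since the purpose of an index is to avoid comparing the query against every stored vector. The function names and the sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, top_k=2):
    """Score every stored vector against the query and return the top_k (id, score) pairs."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: three 3-dimensional vectors keyed by ID
stored = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}

# The two vectors pointing in nearly the same direction as the query rank first
print(nearest([1.0, 0.0, 0.0], stored))
```

The cost of this approach grows linearly with the number of stored vectors, which is the cost an approximate nearest neighbor index is designed to reduce.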
NOTE: This tutorial demonstrates how to work with the Pinecone database using the Pinecone client for the Python programming language.
We also assume that you have a basic Pinecone project set up for demonstration purposes. You can create one in the Pinecone cloud console.
Install the Pinecone Client
The first step is to ensure that the Pinecone client for Python is installed on your system. You can do this by running the following “pip” command:
pip install pinecone-client
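This downloads the pinecone-client package and installs it on your machine. The examples in this tutorial use the module-level pinecone.init() interface of the 2.x client. Later major versions replaced that interface with a Pinecone class, so to run the code as written you may need to pin the version, for example with pip install "pinecone-client<3".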
Create an Index in Pinecone
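The pinecone-client package provides the create_index() function, which creates a new index in your Pinecone project.
The function syntax, limited to the parameters covered in this tutorial, is as follows:
pinecone.create_index(name, dimension, metric=..., pods=..., replicas=..., pod_type=..., metadata_config=..., source_collection=...)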
The function parameters are described as follows:
- name – This specifies the name of the index that you wish to create. The maximum length of the index name is 45 characters.
- dimension – This specifies the dimension of the vectors to be inserted in the index.
- metric – This is an optional parameter that specifies the distance metric that is used in the vector similarity search. Accepted distances include “euclidean”, “cosine”, and “dotproduct”.
- pods – This is an optional integer parameter that defines the total number of pods that the index uses, including replicas.
- replicas – This sets the number of replicas that are used by the index.
- pod_type – This parameter sets the type of pods for the index. The supported pod types include “s1”, “p1”, and “p2”.
- metadata_config – This specifies the configuration for the behavior of Pinecone’s internal metadata index.
- source_collection – This sets the name of the collection from which to create the index.
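The constraints in the list above can be checked before any request is sent. The following is a minimal sketch of such a check. The helper build_index_kwargs and its constants are hypothetical and not part of the Pinecone client. The pod type is validated by prefix on the assumption that a size suffix (such as ".x1") may follow the family name.

```python
ACCEPTED_METRICS = {"euclidean", "cosine", "dotproduct"}
SUPPORTED_POD_FAMILIES = ("s1", "p1", "p2")
MAX_NAME_LENGTH = 45


def build_index_kwargs(name, dimension, metric=None, pods=None, replicas=None,
                       pod_type=None, metadata_config=None, source_collection=None):
    """Hypothetical helper: validate the arguments and return kwargs for create_index()."""
    if not name or len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"index name must be 1 to {MAX_NAME_LENGTH} characters long")
    if not isinstance(dimension, int) or dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    if metric is not None and metric not in ACCEPTED_METRICS:
        raise ValueError(f"metric must be one of {sorted(ACCEPTED_METRICS)}")
    if pod_type is not None and not pod_type.startswith(SUPPORTED_POD_FAMILIES):
        raise ValueError(f"pod_type must start with one of {SUPPORTED_POD_FAMILIES}")

    kwargs = {"name": name, "dimension": dimension}
    optional = {
        "metric": metric,
        "pods": pods,
        "replicas": replicas,
        "pod_type": pod_type,
        "metadata_config": metadata_config,
        "source_collection": source_collection,
    }
    # Pass only the optional values that were supplied so the client's own defaults apply
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


kwargs = build_index_kwargs("sample", 8, metric="cosine", pod_type="p1")
print(kwargs)
```

With the client initialized, the resulting dictionary could then be passed on as pinecone.create_index(**kwargs).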
Example Usage
Example 1: Create a Basic Index
The following example demonstrates how to use the create_index() function to create a basic index:
import pinecone

# initialize the Pinecone client with your project credentials
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
# create a basic index that stores 8-dimensional vectors
pinecone.create_index("sample", dimension=8)
The previous code creates an index called “sample” with the dimension of 8.
Example 2: Specifying Other Parameters
We can also create an index with a specific metric and pod type as shown in the following example code:
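A minimal sketch of such a call follows, assuming the same 2.x module-level interface as the earlier example. The keyword values are taken from the description of this example, and create_example_index is a hypothetical wrapper name introduced here so that the parameters can be inspected without contacting Pinecone.

```python
# Parameters for this example: Euclidean metric, p1 pods, one pod, one replica
index_name = "sample"
index_params = {
    "dimension": 1024,
    "metric": "euclidean",
    "pod_type": "p1",
    "pods": 1,
    "replicas": 1,
}


def create_example_index(api_key, environment):
    """Hypothetical wrapper: initialize the client and create the index."""
    # Imported here so the parameters above can be used without the client installed
    import pinecone

    pinecone.init(api_key=api_key, environment=environment)
    pinecone.create_index(index_name, **index_params)


# Usage (requires valid credentials):
# create_example_index("YOUR_API_KEY", "YOUR_ENVIRONMENT")
```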
In this example, we create an index called “sample” with a dimension of 1024, the Euclidean distance metric, the p1 pod type, one pod, and one replica.
Example 3: Setting the Metadata Config
We can also specify the metadata configuration as demonstrated in the following example code:
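import pinecone
pinecone.init(api_key='YOUR_API_KEY', environment='YOUR_ENVIRONMENT')
# index only the "A" metadata field
metadata_config = {
    'indexed': ['A']
}
pinecone.create_index('sample-index-2', dimension=1024,
                      metadata_config=metadata_config)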
This example creates an index in which only the “A” metadata field is indexed.
List the Indexes in Pinecone
Once the index is created, you can use the list_indexes() function to list all the indexes in your project, as demonstrated in the following:
import pinecone
pinecone.init(api_key='YOUR_API_KEY', environment='YOUR_ENVIRONMENT')
active_indexes = pinecone.list_indexes()
print(active_indexes)
This returns the names of the indexes that are available in your project.
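One common use of list_indexes() is to check whether an index already exists before trying to create it. The sketch below assumes that list_indexes() returns a list of index names, as in the 2.x client. The names ensure_index and FakeClient are hypothetical. FakeClient is an in-memory stand-in that lets the example run without credentials, and in real code you would pass the initialized pinecone module instead.

```python
def ensure_index(client, name, dimension, **options):
    """Create the index only if list_indexes() does not already report it.

    Returns True if the index was created and False if it already existed.
    """
    if name in client.list_indexes():
        return False
    client.create_index(name, dimension=dimension, **options)
    return True


class FakeClient:
    """In-memory stand-in for the pinecone module, used only for this demonstration."""

    def __init__(self):
        self.indexes = []

    def list_indexes(self):
        return list(self.indexes)

    def create_index(self, name, dimension, **options):
        self.indexes.append(name)


client = FakeClient()
created_first = ensure_index(client, "sample", dimension=8)   # index is missing, so it is created
created_second = ensure_index(client, "sample", dimension=8)  # index exists, so nothing happens
print(created_first, created_second, client.list_indexes())
```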
Conclusion
You learned the fundamentals of working with Pinecone indexes using the Pinecone client for the Python programming language.