
How to Work with Chroma DB

Chroma, also written as ChromaDB, is a relatively new embedding database for building AI applications. It is free and open source, and it bundles the functionality that such applications typically need to store and search embeddings.

Using Chroma DB, you can perform actions such as:

  1. Storing embeddings and their corresponding metadata
  2. Embedding documents and queries
  3. Searching the stored embeddings
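To make these three actions concrete, here is a minimal sketch of what an embedding store does conceptually. It is not Chroma's API: the ToyStore class, the hand-written two-dimensional vectors, and the choice of cosine distance are all assumptions made for illustration only.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class ToyStore:
    """Hypothetical stand-in for an embedding database (not Chroma's API)."""

    def __init__(self):
        self.records = {}  # record id -> (embedding, metadata)

    def add(self, record_id, embedding, metadata):
        # Action 1: store an embedding together with its metadata.
        self.records[record_id] = (embedding, metadata)

    def search(self, query_embedding, n_results):
        # Action 3: rank stored embeddings by distance to the query.
        ranked = sorted(
            self.records,
            key=lambda rid: cosine_distance(query_embedding, self.records[rid][0]),
        )
        return ranked[:n_results]


store = ToyStore()
# Action 2: a real system computes embeddings with a model.
# These vectors are written by hand to keep the sketch self-contained.
store.add("cat", [0.9, 0.1], {"topic": "animals"})
store.add("dog", [0.8, 0.3], {"topic": "animals"})
store.add("car", [0.1, 0.9], {"topic": "vehicles"})

nearest = store.search([1.0, 0.0], n_results=2)
print(nearest)  # ['cat', 'dog']
```

The query vector points in almost the same direction as the "cat" and "dog" vectors, so those two IDs come back first. A real embedding database follows the same idea with model-generated vectors and a much faster index.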

The following illustration shows the architecture of the Chroma database:

In this tutorial, we will show you how to set up a basic Chroma DB instance and use the Python SDK to create a collection, insert sample data, and query it.

Requirements:

To follow along with this post, you need to have the following:

  1. Python 3.10 installed on your machine. At the time of writing, you may run into issues with Python 3.11 and above.
  2. Basic Python programming knowledge

Step 1: Install and Run Chroma

At the time of writing this tutorial, Chroma DB is in alpha and we can install it with “pip”. The client that we use in this tutorial runs in-memory, inside the Python process that creates it, so there is no separate server to start.

Run the following command to install Chroma DB:

$ pip install chromadb
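If you want to confirm that the installation landed in the Python environment you are actually using, a small check like the following can help. It relies only on the standard library; the installed_version helper is a name chosen here for illustration.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("chromadb")
if version is None:
    print("chromadb is not installed in this environment")
else:
    print(f"chromadb {version} is installed")
```

If the script reports that the package is missing even though “pip” succeeded, the likely cause is that “pip” and “python” point to different environments.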

Step 2: Import the Chroma Client

Once Chroma DB is installed, we can use it in our Python project. The first step is to import the Chroma DB package and create a client.

import chromadb
client = chromadb.Client()

This imports the Chroma DB package and creates an in-memory client. Data that is stored through this client is lost when the program exits.

Create a Collection

Once the client is ready, we need to create a collection in the database. Think of a collection as a container that stores the embeddings along with their documents and metadata.

We can accomplish this using the create_collection() method, as demonstrated in the following example code. We assign the returned collection to a variable so that we can use it in the next steps:

collection = client.create_collection(name="sample_collection")

This creates a collection named sample_collection.

Add Sample Data to the Collection

The next step is to add sample data to the created collection. We can do this using the collection.add() method, as demonstrated in the following example code:

collection.add(
    documents=["sample doc 1", "sample doc 2"],
    metadatas=[{"tag": "1"}, {"tag": "2"}],
    ids=["id1", "id2"]
)

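This code uses the “add” method to add the documents, the corresponding metadata, and the ids to the collection. The three lists are matched by position, so the first document gets the first metadata entry and the first id. Because we do not supply any embeddings, Chroma computes them from the documents using its default embedding function.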
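Since the lists passed to “add” are paired by position, a small check before the call can catch mistakes early. The sketch below is not part of Chroma; check_add_arguments is a hypothetical helper, and it assumes that every id must be unique and that all three lists must have the same length.

```python
def check_add_arguments(documents, metadatas, ids):
    """Raise ValueError if the lists differ in length or an id repeats."""
    if not (len(documents) == len(metadatas) == len(ids)):
        raise ValueError(
            f"length mismatch: {len(documents)} documents, "
            f"{len(metadatas)} metadatas, {len(ids)} ids"
        )
    seen = set()
    duplicates = set()
    for item_id in ids:
        if item_id in seen:
            duplicates.add(item_id)
        seen.add(item_id)
    if duplicates:
        raise ValueError(f"duplicate ids: {sorted(duplicates)}")


documents = ["sample doc 1", "sample doc 2"]
metadatas = [{"tag": "1"}, {"tag": "2"}]
ids = ["id1", "id2"]

# Passes silently for the tutorial data; raises ValueError on bad input.
check_add_arguments(documents, metadatas, ids)
print("arguments look consistent")
```

You could call this helper right before collection.add() with the same three lists.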

Query the Data

Finally, we can query the collection with a list of query texts. For each query text, Chroma returns the n_results most similar documents.

results = collection.query(
    query_texts=["sample"],
    n_results=2
)

This returns the two documents that are most similar to the query text, ordered from the most similar to the least similar.
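The returned value is a dictionary of nested lists, with one inner list per query text. The following sketch shows one way to flatten it into rows. The rows_from_query helper is hypothetical, and the example dictionary is written by hand to mirror the expected shape; its distance values are placeholders, not real Chroma output.

```python
def rows_from_query(results, query_index=0):
    """Return (id, document, metadata, distance) tuples for one query text."""
    return list(
        zip(
            results["ids"][query_index],
            results["documents"][query_index],
            results["metadatas"][query_index],
            results["distances"][query_index],
        )
    )


# Hand-written stand-in for a query result; the distances are placeholders.
example_results = {
    "ids": [["id1", "id2"]],
    "documents": [["sample doc 1", "sample doc 2"]],
    "metadatas": [[{"tag": "1"}, {"tag": "2"}]],
    "distances": [[0.1, 0.2]],
}

for row in rows_from_query(example_results):
    print(row)
```

With a real collection, you would pass the “results” variable from the query above instead of example_results. A smaller distance means a closer match.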

Conclusion

In this fundamentals article, we explored how to install Chroma DB and use it to create a collection, add sample documents, and query the stored embeddings for matching results.

About the author

John Otieno

My name is John and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.