Create a Backup in Weaviate

For database administrators, performing regular backups is an essential task. Vector databases such as Weaviate are no exception: a backup allows you to recover your data after a loss.

In this tutorial, we will learn how to enable and run a backup operation in Weaviate using the API endpoints.

Enabling the Weaviate Backup Modules

To take backups in Weaviate, we must first enable a backup provider module. Although you can enable several backup modules at once, one per provider, we will use the filesystem module in this tutorial to create backups on the local filesystem.

Enabling the filesystem backup module allows us to back up Weaviate to the local filesystem instead of a remote backend such as Amazon S3 or Google Cloud Storage. This is useful during development because it is quick and easy to set up.

In production, however, consider using one of the cloud-based backup modules instead.

To allow backups to the local filesystem in a Weaviate cluster, we need to add backup-filesystem to the ENABLE_MODULES environment variable. This environment variable determines which modules are enabled in Weaviate.

Set the environment variable as follows:

ENABLE_MODULES=backup-filesystem,text2vec-transformers

Once enabled, we can configure the path in the filesystem where the backups are stored.

BACKUP_FILESYSTEM_PATH=/opt/weaviate/backups

This required parameter defines the directory where all Weaviate backups are written and from which they are read during restoration.
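Before starting Weaviate, it can help to sanity-check these two settings. The following is a minimal sketch, not part of Weaviate itself: check_backup_env is a hypothetical helper name, and it assumes that you pass it the same environment mapping that the Weaviate process receives (for example, values parsed from your deployment file).

```python
def check_backup_env(env):
    """Return a list of problems found in the filesystem-backup settings.

    `env` is any mapping of environment variable names to values.
    This helper is hypothetical and only inspects the two variables
    discussed in this tutorial.
    """
    problems = []

    # ENABLE_MODULES is a comma-separated list of module names.
    raw_modules = env.get("ENABLE_MODULES", "")
    modules = [name.strip() for name in raw_modules.split(",") if name.strip()]
    if "backup-filesystem" not in modules:
        problems.append("backup-filesystem is missing from ENABLE_MODULES")

    # The filesystem module needs a target directory for the backups.
    if not env.get("BACKUP_FILESYSTEM_PATH"):
        problems.append("BACKUP_FILESYSTEM_PATH is not set")

    return problems


settings = {
    "ENABLE_MODULES": "backup-filesystem,text2vec-transformers",
    "BACKUP_FILESYSTEM_PATH": "/opt/weaviate/backups",
}
for problem in check_backup_env(settings):
    print(problem)
```

An empty list means that both settings are present. The check does not verify that the directory exists or is writable by Weaviate.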

Create a Backup in Weaviate

Once you have configured the parameters for filesystem backups, you can initiate a backup operation.

The most common way to start a new backup is through the API endpoints. The method and endpoint are as follows:

POST /v1/backups/{backend}

URL Parameter

This requires you to specify the target backup backend. Weaviate supports backup backends such as Amazon S3, Google Cloud Storage, Azure Storage, and the local filesystem.

Note: Provide the name of the backup provider without the "backup-" module prefix. For example: s3, gcs, or filesystem.
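The relationship between a module name and the backend name in the URL can be sketched in a few lines of Python. The helper names backend_name and backup_url are hypothetical and used only for illustration; the base URL is the local address used in the examples of this tutorial.

```python
BACKUP_MODULE_PREFIX = "backup-"


def backend_name(module_or_backend):
    """Strip the "backup-" prefix, so "backup-s3" becomes "s3".

    Names that are already backend names are returned unchanged.
    """
    if module_or_backend.startswith(BACKUP_MODULE_PREFIX):
        return module_or_backend[len(BACKUP_MODULE_PREFIX):]
    return module_or_backend


def backup_url(base_url, module_or_backend):
    """Build the backup endpoint URL for the given backend."""
    return f"{base_url.rstrip('/')}/v1/backups/{backend_name(module_or_backend)}"


print(backup_url("http://localhost:8080", "backup-filesystem"))
```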

Request Body Parameters

The request body supports the following parameters, which determine the backup operation:

  1. id – The ID of the backup, as a string. You need this ID for later requests such as checking the backup status or restoring the backup.
  2. include – An optional list of class names to include in the backup. If omitted, Weaviate includes all the classes in the schema.
  3. exclude – An optional list of class names to exclude from the backup. It cannot be combined with include.
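The parameters above can be assembled into a request body as in the following sketch. The function name build_backup_body is hypothetical. The check that rejects a body with both lists reflects our reading of the Weaviate documentation, which says to set only one of include and exclude; treat it as an assumption and verify it against your Weaviate version.

```python
import json


def build_backup_body(backup_id, include=None, exclude=None):
    """Build the JSON-serializable body for a backup creation request."""
    if not backup_id:
        raise ValueError("a backup id is required")
    # Assumption: Weaviate accepts include or exclude, but not both.
    if include and exclude:
        raise ValueError("set include or exclude, not both")

    body = {"id": backup_id}
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)
    return body


print(json.dumps(build_backup_body("backup-1", include=["Books", "Person"])))
```

Leaving out both lists produces a body that contains only the ID, which backs up every class.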

Initiate a Backup in Weaviate Using cURL

The following example command shows how to use cURL and the Weaviate API endpoint to create a backup in the filesystem:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1"
}' \
http://localhost:8080/v1/backups/filesystem

The previous command should create a filesystem backup called “backup-1”.
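If cURL is not available, the same request can be sketched with the Python standard library alone. The function names make_backup_request and create_backup are hypothetical, and the sketch assumes an unauthenticated Weaviate instance at the address that you pass in.

```python
import json
import urllib.request


def make_backup_request(base_url, backend, backup_id):
    """Prepare (but do not send) the POST request that creates a backup."""
    payload = json.dumps({"id": backup_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/backups/{backend}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_backup(base_url, backend, backup_id):
    """Send the request and return the decoded JSON response."""
    request = make_backup_request(base_url, backend, backup_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact the server.
request = make_backup_request("http://localhost:8080", "filesystem", "backup-1")
print(request.get_method(), request.full_url)
```

Calling create_backup("http://localhost:8080", "filesystem", "backup-1") sends the request. It raises urllib.error.HTTPError if Weaviate answers with an error status.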

Including Specific Classes

We can also back up specific classes instead of the entire schema, as demonstrated in the following example request:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1",
"include": ["Books", "Person"]
}' \
http://localhost:8080/v1/backups/filesystem

In this case, we create a backup in Weaviate that only includes the “Books” and “Person” classes in the Weaviate schema.

Initiate a Backup in Weaviate Using Python

The second method that we can use to create a backup is the Weaviate Python client. The following code uses the weaviate.Client class from version 3 of the client library:

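import weaviate

# Connect to the local Weaviate instance.
client = weaviate.Client('http://localhost:8080')

# Back up only the listed classes and block until the backup finishes.
result = client.backup.create(
    backup_id="backup-1",
    backend="filesystem",
    include_classes=["Books", "Person"],
    wait_for_completion=True,
)

print(result)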

The previous code tells Weaviate to back up the “Books” and the “Person” classes to the filesystem.

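We also set wait_for_completion to True, so the call does not return until the backup process is complete. Keep in mind that this blocks the Python script, not Weaviate itself: according to the Weaviate documentation, the database remains usable while a backup is running. For large or automated backups, consider starting the backup without waiting and checking its status separately.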

Get a Backup Status in Weaviate

To get the status of a backup creation, you can use the get_create_status() method as shown in the following example code:

result = client.backup.get_create_status(
    backup_id="backup-1",
    backend="filesystem",
)

print(result)

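This should return the status of the backup creation. The Weaviate documentation lists status values such as STARTED, TRANSFERRING, TRANSFERRED, SUCCESS, and FAILED.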
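For a backup that was started without waiting for completion, the status call can be repeated until the backup finishes. The following is a minimal polling sketch, not an official helper: wait_for_backup is a hypothetical name, and the code assumes that the returned dictionary has a "status" key and that SUCCESS and FAILED are the final states.

```python
import time

# Assumption: these are the states after which the status no longer changes.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def wait_for_backup(client, backup_id, backend,
                    poll_seconds=2.0, timeout_seconds=600.0):
    """Poll the backup status until it reaches a terminal state.

    `client` is expected to expose backup.get_create_status(), as the
    client object used in the previous examples does.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = client.backup.get_create_status(
            backup_id=backup_id,
            backend=backend,
        )
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup {backup_id!r} did not finish in time")
        # Sleep between polls to avoid flooding the server with requests.
        time.sleep(poll_seconds)
```

With the client from the earlier example, wait_for_backup(client, "backup-1", "filesystem") returns the final status dictionary, which you can inspect to tell a successful backup from a failed one.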

Conclusion

This tutorial taught us how to configure filesystem backups in Weaviate using environment variables. We also learned how to use the Weaviate API endpoints and the Python client to back up all classes or only specific ones.

About the author

John Otieno