Restore a Backup in Weaviate

For database administrators, performing regular backups is an essential task. This also applies to vector databases such as Weaviate, where a backup allows you to recover the data in case of data loss.

In this tutorial, we will learn how to enable the backup module in Weaviate, and how to create, monitor, and restore a backup using the API endpoints and the Python client.

Enabling the Weaviate Backup Modules

To take backups in Weaviate, we must enable a backup provider module. Although you can enable multiple backup modules at once, we will use the filesystem module in this tutorial to create backups in the local filesystem.

Enabling the filesystem backup module allows us to back up Weaviate to the local filesystem instead of a remote backend such as Amazon S3 or Google Cloud Storage. This is useful during development because it is quick and easy to set up.

In production, however, consider using one of the cloud-based backup modules instead.

To allow backups to the local filesystem in the Weaviate cluster, we need to add backup-filesystem to the ENABLE_MODULES environment variable. This environment variable determines which modules are enabled in Weaviate.

Ensure that the environment variable is as follows:

ENABLE_MODULES=backup-filesystem,text2vec-transformers

Once enabled, we can configure the path in the filesystem where the backups are stored.

BACKUP_FILESYSTEM_PATH=/opt/weaviate/backups

This required parameter defines the directory that Weaviate writes all backups to and reads them from during restoration.
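To confirm that the module is actually loaded after a restart, one option is to inspect the instance metadata. The following is a minimal sketch, assuming that the /v1/meta endpoint lists the enabled modules as keys of a "modules" object. The backup_module_enabled and fetch_meta helpers are hypothetical names used for illustration, and the sample response is hand-written rather than captured from a real instance.

```python
import json
from urllib.request import urlopen


def backup_module_enabled(meta: dict, backend: str = "filesystem") -> bool:
    """Return True if the backup module for `backend` appears in the meta response."""
    # Assumption: enabled modules are listed as keys of the "modules" object.
    modules = meta.get("modules") or {}
    return f"backup-{backend}" in modules


def fetch_meta(base_url: str = "http://localhost:8080") -> dict:
    """Fetch the instance metadata from /v1/meta (requires a running Weaviate)."""
    with urlopen(f"{base_url}/v1/meta") as response:
        return json.load(response)


# Example with a hand-written response, so no server is needed:
sample_meta = {"modules": {"backup-filesystem": {}, "text2vec-transformers": {}}}
print(backup_module_enabled(sample_meta))        # True
print(backup_module_enabled(sample_meta, "s3"))  # False
```

Against a running instance, you would pass the result of fetch_meta() instead of sample_meta.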

Create a Backup in Weaviate

Once you configure the parameters for Weaviate backups on the filesystem, you can initiate a backup operation.

The most common way to start a new backup is through the API endpoints. The method and endpoint are as follows:

POST /v1/backups/{backend}

URL Parameter

This requires you to specify the target backup backend. Weaviate supports backends such as Amazon S3, Google Cloud Storage, Azure Storage, and the local filesystem.

Note: Provide the name of the backup provider without the "backup-" prefix. For example: s3, gcs, or filesystem.

Request Body Parameters

In the request body, the request supports the following parameters which determine the backup operation:

  1. id – This provides the ID of the backup as a string. You need this string for later requests such as checking the backup status or restoring the backup.
  2. include – This is a list of class names to be included in the backup. By default, Weaviate includes all the classes in the target schema.
  3. exclude – This defines a list of class names to be excluded from the backup.
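The URL parameter and the body parameters above can be combined into a single request object. The following is a small sketch using only the Python standard library; build_backup_request is a hypothetical helper name, and the function only builds the request without sending it.

```python
import json
from urllib.request import Request


def build_backup_request(base_url, backend, backup_id, include=None, exclude=None):
    """Build (but do not send) a POST request that starts a backup."""
    body = {"id": backup_id}
    # Only add the optional lists when the caller supplies them, so that
    # Weaviate falls back to its default of backing up every class.
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)
    return Request(
        url=f"{base_url}/v1/backups/{backend}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_backup_request(
    "http://localhost:8080", "filesystem", "backup-1", include=["Books", "Person"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually send it, you could pass the request object to urllib.request.urlopen() while a Weaviate instance is running.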

Initiate a Backup cURL

The following example command shows how to use cURL and the Weaviate API endpoint to create a backup in the filesystem:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1"
}' \
http://localhost:8080/v1/backups/filesystem

The previous code should create a backup to the filesystem called “backup-1”.

Including Specific Classes

We can also back up specific classes instead of the entire schema, as demonstrated in the following example request:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1",
"include": ["Books", "Person"]
}' \
http://localhost:8080/v1/backups/filesystem

In this case, we create a backup in Weaviate that only includes the “Books” and “Person” classes in the Weaviate schema.

Initiate a Backup Using Python

The second method that we can use to create a backup is the Weaviate Python Client. We can run the code as follows:

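import weaviate

# Connect to the local Weaviate instance (weaviate.Client is the v3 Python client API).
client = weaviate.Client('http://localhost:8080')

# Back up only the listed classes and block until the backup finishes.
result = client.backup.create(
    backup_id="backup-1",
    backend="filesystem",
    include_classes=["Books", "Person"],
    wait_for_completion=True,
)

print(result)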

The previous code tells Weaviate to back up the “Books” and the “Person” classes to the filesystem.

We also set wait_for_completion to True so that the call only returns once the backup process is complete. Keep in mind that this blocks the calling script, not Weaviate itself, until the backup is complete. Avoid this option for large or automated backups and check the backup status separately instead.

Get the Backup Status

To get the status of a backup creation, you can use the get_create_status() method as shown in the following example code:

result = client.backup.get_create_status(
    backup_id="backup-1",
    backend="filesystem",
)

print(result)

This should return the status of the backup creation.
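When a backup is started without waiting for completion, the status call has to be repeated until the backup is done. The following is a minimal polling sketch. It assumes that the returned dictionary carries a "status" field and that "SUCCESS" and "FAILED" are its terminal values; wait_for_backup is a hypothetical helper, and the fake responses stand in for a real server.

```python
import time

# Assumption: these are the terminal values of the "status" field.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def wait_for_backup(get_status, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll `get_status()` until the backup reaches a terminal status.

    `get_status` is any zero-argument callable returning a dict with a
    "status" key, for example a lambda around client.backup.get_create_status.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup still {result.get('status')!r} after {timeout}s")
        sleep(poll_interval)


# Stand-in for a real client call, so the sketch runs without a server.
fake_responses = iter(
    [{"status": "STARTED"}, {"status": "TRANSFERRING"}, {"status": "SUCCESS"}]
)
final = wait_for_backup(lambda: next(fake_responses), poll_interval=0)
print(final["status"])  # SUCCESS
```

With a real client, the callable could be lambda: client.backup.get_create_status(backup_id="backup-1", backend="filesystem"). Taking a callable keeps the helper independent of the client, so the same loop can also poll a restore.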

Restore a Backup in Weaviate

Once you have created a backup, there will be times when you need to restore it. Weaviate allows you to restore a backup on any other instance, provided that the number of nodes and their names are identical between the source and the target.
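The node requirement can be checked before attempting a restore. The following sketch compares two node listings; it assumes responses shaped like {"nodes": [{"name": ...}]}, which is how the /v1/nodes endpoint is expected to report cluster members. The node_names and can_restore helpers and the sample data are hypothetical.

```python
def node_names(nodes_response: dict) -> list:
    """Extract the sorted node names from a /v1/nodes style response."""
    # Assumption: the response has the shape {"nodes": [{"name": ...}, ...]}.
    return sorted(node["name"] for node in nodes_response.get("nodes", []))


def can_restore(source_nodes: dict, target_nodes: dict) -> bool:
    """True if both clusters have the same number of nodes with the same names."""
    return node_names(source_nodes) == node_names(target_nodes)


source = {"nodes": [{"name": "node1"}, {"name": "node2"}]}
same_layout = {"nodes": [{"name": "node2"}, {"name": "node1"}]}
fewer_nodes = {"nodes": [{"name": "node1"}]}

print(can_restore(source, same_layout))  # True
print(can_restore(source, fewer_nodes))  # False
```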

Restore Using the HTTP Request

As you might guess, the simplest method of restoring a given backup is to send an HTTP request to the restore API endpoint.

The method and endpoint are as follows:

POST /v1/backups/{backend}/{backup_id}/restore

URL Parameters

The following are the required parameters for the restoration using the HTTP request:

  1. backend – This specifies your target backup backend such as s3, gcs, or filesystem.
  2. backup_id – This specifies the ID of the backup that you wish to restore.

Request Body Parameters

The request takes a JSON object with the following properties:

  1. include – It specifies a list of classes that you wish to restore from the backup.
  2. exclude – It specifies a list of classes that you wish to exclude from the restoration.

Note: You cannot use the include and exclude properties simultaneously. Set none or exactly one of those.
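The rule in the note above is easy to enforce on the client side before any request is sent. The following is a small sketch; build_restore_body and restore_url are hypothetical helper names, not part of the Weaviate client.

```python
def build_restore_body(include=None, exclude=None) -> dict:
    """Return the JSON body for a restore request, rejecting invalid combinations."""
    if include and exclude:
        raise ValueError("set either 'include' or 'exclude', not both")
    body = {}
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)
    return body


def restore_url(base_url: str, backend: str, backup_id: str) -> str:
    """Build the restore endpoint URL for a given backend and backup ID."""
    return f"{base_url}/v1/backups/{backend}/{backup_id}/restore"


print(restore_url("http://localhost:8080", "filesystem", "backup-1"))
print(build_restore_body(exclude=["Person"]))  # {'exclude': ['Person']}
```

Passing both lists raises a ValueError, which surfaces the mistake locally instead of relying on the server to reject the request.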

Initiate Restore Using cURL

The following command shows how to use cURL to invoke a backup restoration in Weaviate:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1"
}' \
http://localhost:8080/v1/backups/filesystem/backup-1/restore

The previous command should initiate a backup restoration for all the classes that are included in the backup.

Exclude Specific Classes

To exclude specific classes from the restoration, you can run the request as follows:

curl \
-X POST \
-H "Content-Type: application/json" \
-d '{
"id": "backup-1",
"exclude": ["Person"]
}' \
http://localhost:8080/v1/backups/filesystem/backup-1/restore

We tell Weaviate to exclude the “Person” class from the restoration in this case.

Initiate Restore Using Python

We can also use the Python client to invoke a restoration process as shown in the following code:

result = client.backup.restore(
    backup_id="backup-1",
    backend="filesystem",
    wait_for_completion=True,
)

print(result)

Similarly, this should restore all the classes that are included in the specified backup.

Get the Restore Status

To check the status of a restoration process, you can run the code as follows:

result = client.backup.get_restore_status(
    backup_id="backup-1",
    backend="filesystem",
)

print(result)

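This should return the current status of the restoration process. Because a restoration runs asynchronously, you may need to repeat the call until it reports that the restoration has finished.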

Conclusion

We learned how to configure the backup operation in Weaviate, the various methods of initiating and checking the backup status, and the various methods and techniques of restoring a backup.

About the author

John Otieno
