In this tutorial, we will learn how to enable and run a backup operation in Weaviate using the API endpoints.
Enabling the Weaviate Backup Modules
To take backups in Weaviate, we must enable the backup provider modules. Although you can enable multiple backup modules for your providers, we will use the filesystem module in this tutorial to create backups in the local filesystem.
Enabling the filesystem backup module lets us back up Weaviate to the local filesystem instead of a remote backend such as Amazon S3 or Google Cloud Storage. This is useful during development because it is quick and easy to set up.
In production, however, consider using one of the cloud-based backup modules instead.
To allow backups to the local filesystem, we need to add backup-filesystem to the ENABLE_MODULES environment variable. This environment variable determines which modules are enabled in Weaviate.
Set the environment variable as follows:
ENABLE_MODULES=backup-filesystem
Once enabled, we can configure the path in the filesystem where the backups are stored, using the BACKUP_FILESYSTEM_PATH environment variable.
This required parameter defines where Weaviate writes its backups and where it reads them from during restoration.
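As a quick sanity check before starting Weaviate, both settings can be validated from Python. The following is a minimal sketch, not part of Weaviate: check_backup_env is a hypothetical helper, the backup path in the example is a placeholder, and the sketch assumes that ENABLE_MODULES is a comma-separated list and that the backup path is supplied through the BACKUP_FILESYSTEM_PATH variable.

```python
import os


def check_backup_env(env=None):
    """Return a list of problems with the filesystem-backup settings.

    Hypothetical helper, not part of Weaviate. `env` is any mapping of
    environment variables; it defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []

    # ENABLE_MODULES is assumed to be a comma-separated list of module names.
    modules = [m.strip() for m in env.get("ENABLE_MODULES", "").split(",") if m.strip()]
    if "backup-filesystem" not in modules:
        problems.append("backup-filesystem is missing from ENABLE_MODULES")

    # The backup path is required once the module is enabled.
    if not env.get("BACKUP_FILESYSTEM_PATH", "").strip():
        problems.append("BACKUP_FILESYSTEM_PATH is not set")

    return problems


# Example with placeholder values (the path is an assumption).
example_env = {
    "ENABLE_MODULES": "text2vec-openai,backup-filesystem",
    "BACKUP_FILESYSTEM_PATH": "/var/lib/weaviate/backups",
}
print(check_backup_env(example_env))
```

An empty list means both settings are present. Calling check_backup_env() without arguments inspects the environment of the current process.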
Create a Backup in Weaviate
Once you have configured the parameters for the Weaviate backups on the filesystem, you can initiate a backup operation.
The most common way to start a new backup is through the API endpoints. The method and endpoint are as follows:
POST /v1/backups/{backend}
URL Parameter
The URL parameter specifies the target backup backend. Weaviate supports backends such as Amazon S3, Google Cloud Storage, Azure Storage, and the local filesystem.
Note: Provide the name of the backup provider without the backup- prefix. For example: s3, gcs, or filesystem.
Request Body Parameters
The request body supports the following parameters, which determine the backup operation:
- id – The ID of the backup, as a string. You need this ID for later requests such as checking the backup status or restoring the backup.
- include – An optional list of class names to include in the backup. By default, Weaviate includes all the classes in the target schema.
- exclude – An optional list of class names to exclude from the backup.
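To make the URL parameter and the body parameters concrete, the following sketch assembles the request with Python's standard library without sending it. build_backup_request is a hypothetical helper name, and the base URL is an assumption for a local instance. The check that include and exclude are not combined reflects my understanding of the API and should be treated as an assumption.

```python
import json
import urllib.request


def build_backup_request(base_url, backend, backup_id, include=None, exclude=None):
    """Build (but do not send) the POST request that starts a backup.

    Hypothetical helper for illustration only.
    """
    # Assumption: the API does not accept include and exclude together.
    if include and exclude:
        raise ValueError("set either include or exclude, not both")

    body = {"id": backup_id}
    if include:
        body["include"] = list(include)
    if exclude:
        body["exclude"] = list(exclude)

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/backups/{backend}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_backup_request(
    "http://localhost:8080", "filesystem", "backup-1", include=["Books", "Person"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running instance: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the URL and the JSON body before anything reaches the server.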
Initiate a Backup in Weaviate Using cURL
The following example command shows how to use cURL and the Weaviate API endpoint to create a backup in the filesystem:
curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "id": "backup-1"
  }' \
  http://localhost:8080/v1/backups/filesystem
The previous command should create a backup called “backup-1” on the filesystem.
Including Specific Classes
We can also back up specific classes instead of the entire schema, as demonstrated in the following example request:
curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "id": "backup-1",
    "include": ["Books", "Person"]
  }' \
  http://localhost:8080/v1/backups/filesystem
In this case, we create a backup in Weaviate that only includes the “Books” and “Person” classes in the Weaviate schema.
Initiate a Backup in Weaviate Using Python
The second method that we can use to create a backup is the Weaviate Python Client. We can run the code as follows:
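import weaviate

# Connect to a local Weaviate instance (v3-style Python client API)
client = weaviate.Client('http://localhost:8080')

# Back up only the listed classes and block until the backup finishes
result = client.backup.create(
    backup_id="backup-1",
    backend="filesystem",
    include_classes=["Books", "Person"],
    wait_for_completion=True,
)
print(result)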
The previous code tells Weaviate to back up the “Books” and the “Person” classes to the filesystem.
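We also set wait_for_completion=True so that the call returns only once the backup process is complete. Keep in mind that this blocks the calling script until the backup finishes, while Weaviate itself stays available. Avoid this option for large or automated backups, and check the backup status separately instead.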
Get a Backup Status in Weaviate
To get the status of a backup creation, you can use the get_create_status() method as shown in the following example code:
result = client.backup.get_create_status(
    backup_id="backup-1",
    backend="filesystem",
)
print(result)
This should return the status of the backup creation, such as STARTED, TRANSFERRING, TRANSFERRED, SUCCESS, or FAILED.
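For large or automated backups, one option is to start the backup without waiting and then poll the status until it reaches a final state. The following is a minimal sketch under stated assumptions: wait_for_backup is a hypothetical helper, it assumes the status call returns a dictionary with a "status" key (and an "error" key on failure), and it treats SUCCESS and FAILED as the only final states. The demonstration uses a stand-in function instead of a live client so that it runs without a Weaviate instance.

```python
import time


def wait_for_backup(get_status, interval=1.0, timeout=600.0):
    """Poll get_status() until the backup succeeds, fails, or times out.

    Hypothetical helper. `get_status` is any zero-argument callable that
    returns a dict with a "status" key, for example:
        lambda: client.backup.get_create_status(
            backup_id="backup-1", backend="filesystem")
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        state = status.get("status")
        if state == "SUCCESS":
            return status
        if state == "FAILED":
            raise RuntimeError(f"backup failed: {status.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup still in state {state!r} after {timeout} s")
        time.sleep(interval)


# Offline demonstration with a stand-in for the client call.
fake_states = iter(["STARTED", "TRANSFERRING", "SUCCESS"])
result = wait_for_backup(lambda: {"status": next(fake_states)}, interval=0.0)
print(result)
```

Passing the status call in as a callable keeps the polling logic independent of the client version and makes it easy to test.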
Conclusion
This tutorial taught us how to configure the filesystem backups in Weaviate using the environment variables. We also learned how to use the Weaviate API endpoints and the Python client to create backups of all and specific classes.