Backups are a critical feature when working with databases. In Elasticsearch, we can back up specific indices, data streams, the cluster's global state, features, or the entire cluster by using snapshots.
However, as the cluster changes over time and snapshots are created and deleted, a snapshot repository can accumulate stale data that is no longer referenced by any existing snapshot.
In this post, we will discuss how to use the Elasticsearch clean up snapshot repository API, which scans the repository content, determines which data is still referenced by existing snapshots, and removes any unreferenced data.
Let’s dive in.
NOTE: Unreferenced data does not affect repository, snapshot, or cluster performance. However, it does take up disk space, which can matter in large-scale environments.
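To build intuition for what "unreferenced data" means, here is a deliberately simplified toy model. It is an illustration under stated assumptions, not how Elasticsearch actually implements cleanup: it treats the repository as a set of blob names and each remaining snapshot as the set of blobs it references, so the stale blobs are the set difference. All blob and snapshot names are hypothetical.

```python
from typing import Dict, Set


def find_unreferenced_blobs(repository_blobs: Set[str],
                            snapshots: Dict[str, Set[str]]) -> Set[str]:
    """Toy model: return blobs that no remaining snapshot references.

    This is NOT Elasticsearch's real algorithm; it only illustrates that
    cleanup keeps referenced data and removes everything else.
    """
    referenced: Set[str] = set()
    for blobs in snapshots.values():
        referenced |= blobs  # union of every blob still in use
    return repository_blobs - referenced


# Hypothetical repository: "blob-a" belonged to a snapshot that was deleted,
# but the blob itself was left behind in the repository.
repo = {"blob-a", "blob-b", "blob-c"}
remaining_snapshots = {"snap-2": {"blob-b", "blob-c"}}
print(sorted(find_unreferenced_blobs(repo, remaining_snapshots)))  # ['blob-a']
```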
Request Syntax
The request syntax for the snapshot repository cleanup API is:

POST /_snapshot/<repository>/_cleanup

If Elasticsearch security features are enabled, you need the manage cluster privilege to use this API.
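As a minimal sketch, the request can be built with Python's standard urllib.request module. It assumes a cluster reachable at http://localhost:9200 and HTTP basic authentication for a user holding the manage cluster privilege; the host, user name, and password are hypothetical placeholders, and the helper name is our own. The request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_cleanup_request(base_url: str, repository: str,
                          username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST /_snapshot/<repository>/_cleanup request."""
    url = f"{base_url.rstrip('/')}/_snapshot/{repository}/_cleanup"
    # Basic auth header; assumes security is enabled and the user has the
    # "manage" cluster privilege.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


req = build_cleanup_request("http://localhost:9200", "sample_repo",
                            "hypothetical_user", "hypothetical_password")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/_snapshot/sample_repo/_cleanup
```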
Path Parameters
The request supports the following path parameters:
- <repository> – specifies the name of the repository on which the cleanup operation is carried out. This is a required parameter.
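Because <repository> is required and becomes part of the URL path, a client should reject an empty name and percent-encode the value. The hypothetical helper below uses urllib.parse.quote with safe="" so that a character such as / cannot split the name into extra path segments.

```python
from urllib.parse import quote


def cleanup_path(repository: str) -> str:
    """Return the cleanup endpoint path for a repository (hypothetical helper)."""
    if not repository:
        # <repository> is a required path parameter.
        raise ValueError("repository name is required")
    # safe="" also encodes "/", so the name stays a single path segment.
    return f"/_snapshot/{quote(repository, safe='')}/_cleanup"


print(cleanup_path("sample_repo"))  # /_snapshot/sample_repo/_cleanup
```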
Query Parameters
The request also accepts the following optional query parameters:
- master_timeout – the period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30 seconds (30s).
- timeout – the period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30 seconds (30s).
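Both query parameters take Elasticsearch time units such as 30s or 1m. A minimal sketch, using a hypothetical helper name, that appends only the parameters you actually set, so unset ones fall back to the server-side defaults:

```python
from typing import Optional
from urllib.parse import urlencode


def cleanup_query(master_timeout: Optional[str] = None,
                  timeout: Optional[str] = None) -> str:
    """Build the optional query string; omitted parameters keep their defaults."""
    params = {}
    if master_timeout is not None:
        params["master_timeout"] = master_timeout
    if timeout is not None:
        params["timeout"] = timeout
    return f"?{urlencode(params)}" if params else ""


print("/_snapshot/sample_repo/_cleanup" + cleanup_query(master_timeout="1m", timeout="2m"))
```

With both values set, this prints /_snapshot/sample_repo/_cleanup?master_timeout=1m&timeout=2m; with neither set, the path is left unchanged.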
Response Body
The response body contains the following property:
- results – an object containing statistics about the cleanup operation:
a. deleted_bytes – number of bytes removed by the cleanup API.
b. deleted_blobs – number of binary large objects deleted from the repository.
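A client can map the results object onto a small typed structure. The CleanupResult class and parse_cleanup_response function are hypothetical names used for illustration; only the deleted_bytes and deleted_blobs field names come from the response described above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanupResult:
    deleted_bytes: int  # bytes removed by the cleanup
    deleted_blobs: int  # blobs removed from the repository


def parse_cleanup_response(body: str) -> CleanupResult:
    """Parse the JSON response body of a repository cleanup request."""
    results = json.loads(body)["results"]
    return CleanupResult(
        deleted_bytes=int(results["deleted_bytes"]),
        deleted_blobs=int(results["deleted_blobs"]),
    )


result = parse_cleanup_response('{"results": {"deleted_bytes": 100, "deleted_blobs": 25}}')
print(result)  # CleanupResult(deleted_bytes=100, deleted_blobs=25)
```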
Example
The following example runs a cleanup operation on the snapshot repository named “sample_repo”:

POST /_snapshot/sample_repo/_cleanup
Output
{
  "results": {
    "deleted_bytes": 100,
    "deleted_blobs": 25
  }
}
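To send the same cleanup request on sample_repo from a Python script, the following minimal sketch uses only the standard library. It assumes an unsecured cluster at http://localhost:9200 (add an Authorization header if security is enabled); the function name is hypothetical, and the opener parameter exists only so the function can be exercised without a live cluster.

```python
import json
import urllib.request
from typing import Callable, Dict


def cleanup_repository(base_url: str, repository: str,
                       opener: Callable = urllib.request.urlopen) -> Dict[str, int]:
    """POST /_snapshot/<repository>/_cleanup and return the "results" object."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/_snapshot/{repository}/_cleanup",
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, e.g. an unknown repository.
    with opener(req) as resp:
        return json.loads(resp.read())["results"]
```

Against a running cluster, calling cleanup_repository("http://localhost:9200", "sample_repo") returns a dictionary with the deleted_bytes and deleted_blobs counts.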
You can also run a snapshot repository cleanup from Kibana.
Navigate to Management -> Stack Management -> Snapshot and Restore -> Repositories.
Open the target repository and select Clean up repository.
After the cleanup completes, Kibana displays the cleanup results, including the deleted_bytes and deleted_blobs statistics.
Conclusion
In this tutorial, we discussed how to clean up a snapshot repository using both the Elasticsearch API and Kibana. Refer to the Elasticsearch documentation for more information.
Thanks for reading!