
Elasticsearch Show Disk Space Usage

As with any other data storage system or database, you will eventually need to determine the disk usage of your Elasticsearch cluster or of a single index. This information helps you plan your cluster layout and node capacity.

In this tutorial, you will learn various methods and techniques for determining the disk usage for your cluster or Elasticsearch index.

Let’s dive in.

Method 1 – Per Shard Disk Stats

Using the cat shards API, you can view the disk usage of each shard in the cluster. The API also returns other details about each shard, such as the node it is allocated to and the number of documents it holds.

We can use this API to show the disk usage per shard, as shown in the request below.

curl -XGET "http://localhost:9200/_cat/shards?v=true&human=true" -H "kbn-xsrf: reporting"

The request above returns one row per shard. You will find the disk usage of each shard in the store column.

The output shows the disk usage of each shard in a human-readable format.
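If you want to process these numbers in a script, the same endpoint can return JSON with sizes in plain bytes. The following is a minimal sketch, not part of the original commands: it assumes the cat API's format=json and bytes=b parameters and a cluster at http://localhost:9200, and the helper names fetch_shards and store_bytes_per_index are hypothetical.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# Assumed endpoint; adjust the host and port for your cluster.
SHARDS_URL = "http://localhost:9200/_cat/shards?format=json&bytes=b"


def fetch_shards(url=SHARDS_URL):
    """Fetch the cat shards rows as a list of dicts (needs a running cluster)."""
    with urlopen(url) as response:
        return json.load(response)


def store_bytes_per_index(rows):
    """Sum the store column (in bytes) of every shard copy, grouped by index."""
    totals = defaultdict(int)
    for row in rows:
        store = row.get("store")
        if store is None:  # shards that are not allocated report no store size
            continue
        totals[row["index"]] += int(store)
    return dict(totals)


if __name__ == "__main__":
    # Hand-written sample rows for illustration only, not real cluster output.
    sample = [
        {"index": "earthquake", "shard": "0", "prirep": "p", "store": "1048576"},
        {"index": "earthquake", "shard": "0", "prirep": "r", "store": None},
        {"index": "logs", "shard": "0", "prirep": "p", "store": "2048"},
    ]
    print(store_bytes_per_index(sample))
```

Against a live cluster, you would pass the result of fetch_shards() to store_bytes_per_index() instead of the sample rows. Note that the totals include replica copies, since each copy occupies disk space of its own.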

Method 2 – Per Node Disk Usage

We can also retrieve disk usage information per node using the cat allocation API. An example command is as shown:

curl -XGET "http://localhost:9200/_cat/allocation?v=true&human=true" -H "kbn-xsrf: reporting"

The command returns details for each node, such as the number of shards, disk used, disk available, and total disk. Using the human parameter produces the disk usage in a human-readable format.
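As a rough illustration of how the allocation figures can be used, the sketch below computes the percentage of disk used on each node. It assumes the response was requested with format=json and bytes=b, so that rows carry the disk.used and disk.total columns as byte counts; the function name disk_percent_by_node and the sample rows are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint; adjust the host and port for your cluster.
ALLOCATION_URL = "http://localhost:9200/_cat/allocation?format=json&bytes=b"


def fetch_allocation(url=ALLOCATION_URL):
    """Fetch the cat allocation rows as a list of dicts (needs a running cluster)."""
    with urlopen(url) as response:
        return json.load(response)


def disk_percent_by_node(rows):
    """Return {node name: percent of disk used}, skipping rows without disk data."""
    usage = {}
    for row in rows:
        used, total = row.get("disk.used"), row.get("disk.total")
        if used is None or total is None:  # e.g. the UNASSIGNED row
            continue
        total = int(total)
        if total == 0:
            continue
        usage[row["node"]] = round(100 * int(used) / total, 1)
    return usage


if __name__ == "__main__":
    # Hand-written sample rows for illustration only, not real cluster output.
    sample = [
        {"node": "node-1", "shards": "12", "disk.used": "50", "disk.total": "200"},
        {"node": "UNASSIGNED", "shards": "3", "disk.used": None, "disk.total": None},
    ]
    print(disk_percent_by_node(sample))
```

A result like this could feed a simple alert, for example by flagging any node above a threshold you choose.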

You can also use the nodes stats API. An example command is as shown:

curl -XGET "http://localhost:9200/_nodes/stats/fs?human=true" -H "kbn-xsrf: reporting"

The command returns information about each node, including its disk usage in the fs section of the response.
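The nodes stats response is nested JSON, so a small helper can make it easier to read. The sketch below is one possible way to extract the totals, assuming each node entry carries a name and an fs.total object with total_in_bytes and available_in_bytes; the function name fs_summary and the sample data are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint; adjust the host and port for your cluster.
NODES_FS_URL = "http://localhost:9200/_nodes/stats/fs"


def fetch_nodes_fs(url=NODES_FS_URL):
    """Fetch the nodes stats response as a dict (needs a running cluster)."""
    with urlopen(url) as response:
        return json.load(response)


def fs_summary(stats):
    """Map each node name to its total and available disk space in bytes."""
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        total = node.get("fs", {}).get("total", {})
        # Fall back to the node ID if the entry has no name.
        summary[node.get("name", node_id)] = {
            "total_bytes": total.get("total_in_bytes"),
            "available_bytes": total.get("available_in_bytes"),
        }
    return summary


if __name__ == "__main__":
    # Hand-written sample response for illustration only, not real cluster output.
    sample = {
        "nodes": {
            "abc123": {
                "name": "node-1",
                "fs": {"total": {"total_in_bytes": 1000, "available_in_bytes": 400}},
            }
        }
    }
    print(fs_summary(sample))
```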

Method 3 – Disk Usage Information in Index (Experimental)

As of writing this tutorial, Elasticsearch has an experimental disk usage API. You can use this API to get the disk usage information of a specific index.

The syntax is as shown:

POST /<index_name>/_disk_usage?run_expensive_tasks=true

The query above requires the run_expensive_tasks parameter to be set to true. This is because the disk usage API is regarded as a resource-intensive operation.

Otherwise, the request fails with an error. For example, the following request omits the parameter:

curl -XPOST "http://localhost:9200/earthquake/_disk_usage?human=true" -H "kbn-xsrf: reporting"

With the parameter set, we can get the disk usage information of an index called earthquake:

curl -XPOST "http://localhost:9200/earthquake/_disk_usage?run_expensive_tasks=true&human=true" -H "kbn-xsrf: reporting"

The query returns the disk usage of the specified index. Note that the response also lists each field and its corresponding size.
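Because the response lists every field, it can be long for wide mappings. The sketch below shows one way to rank fields by size. It assumes the response keeps per-field sizes under a fields object keyed by the index name, each with a total_in_bytes value, as described in the Elasticsearch documentation for this API; the function names and sample data are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumed host; the index name matches the earthquake example above.
BASE_URL = "http://localhost:9200"


def fetch_disk_usage(index, base_url=BASE_URL):
    """POST to the disk usage API for one index (needs a running cluster)."""
    url = f"{base_url}/{index}/_disk_usage?run_expensive_tasks=true"
    with urlopen(Request(url, method="POST")) as response:
        return json.load(response)


def largest_fields(response, index, top=5):
    """Return the `top` (field name, size in bytes) pairs, largest first."""
    fields = response[index]["fields"]
    sizes = [(name, info["total_in_bytes"]) for name, info in fields.items()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Hand-written sample response for illustration only, not real cluster output.
    sample = {
        "_shards": {"total": 1, "successful": 1, "failed": 0},
        "earthquake": {
            "store_size_in_bytes": 100,
            "fields": {
                "_id": {"total_in_bytes": 10},
                "place": {"total_in_bytes": 70},
                "mag": {"total_in_bytes": 20},
            },
        },
    }
    for name, size in largest_fields(sample, "earthquake", top=2):
        print(name, size)
```

Since the API is experimental, the response shape may change between versions, so treat the key names here as assumptions to verify against your own cluster.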

Closing

In this tutorial, you learned various methods and techniques for fetching disk usage information in the Elasticsearch cluster.

Thanks for reading!!

About the author

John Otieno

My name is John, and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.