How Do You Scroll in Elasticsearch?

In Elasticsearch, a search query can return anything from a single document to a large, complex result set containing millions of records.

This concise guide will teach you to scroll through the documents returned from a search query using the scroll API.

Note that scrolling through documents with the scroll API is not recommended for real-time requests. It is mainly helpful for processing large collections of documents.

Basic Usage

In this example, we will use the kibana_sample_data_flights index. You can find the sample data on the Kibana Get started page.

Suppose we want to find the flights where the ticket price was at least 500 and at most 1000. We can run a query such as:

GET /kibana_sample_data_flights/_search
{
 "query": {
   "range": {
     "AvgTicketPrice": {
       "gte": 500,
       "lte": 1000,
       "boost": 2
     }
   }
 }
}

Once we run the above request, Elasticsearch matches every document whose ticket price falls within the specified range. By default, the response body includes only the first 10 hits.

The hits.total value in the response shows that this single query matches over 7,800 documents.
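
Reading the match count programmatically could look like the following Python sketch. The names build_range_query and total_hits are hypothetical helpers, not part of any Elasticsearch client, and the sample response is a hand-written, trimmed stand-in rather than real cluster output. Only its hits.total shape follows the Elasticsearch 7+ response format.

```python
def build_range_query(field: str, gte: float, lte: float) -> dict:
    """Build the body of a range query (both bounds are inclusive)."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


def total_hits(response: dict) -> int:
    """Read the match count from a search response."""
    total = response["hits"]["total"]
    # Elasticsearch 7+ returns {"value": ..., "relation": ...};
    # earlier versions returned a plain integer.
    return total["value"] if isinstance(total, dict) else total


body = build_range_query("AvgTicketPrice", 500, 1000)

# Hand-written, trimmed stand-in for a response (not real cluster output).
sample_response = {"hits": {"total": {"value": 7844, "relation": "eq"}, "hits": []}}
print(total_hits(sample_response))
```

With the default settings, the count is exact only up to 10,000 matches. Beyond that, relation is "gte" and value is a lower bound.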

Suppose we only want to view one record at a time instead of all 7,844. We can do this with the from and size parameters, as shown in the query below:

GET /kibana_sample_data_flights/_search
{
 "from": 0,
 "size": 1,
 "query": {
   "range": {
     "AvgTicketPrice": {
       "gte": 500,
       "lte": 1000,
       "boost": 2
     }
   }
 }
}

In the above example, the from parameter defines the offset of the first record to fetch. Offsets in Elasticsearch are zero-based, so we set from to 0 to start at the first result.

The size parameter sets the maximum number of records to show per page.

The response now contains a single document out of the 7,844 that match the query.

To move to the next document, we set from to 1 instead of 0:

GET /kibana_sample_data_flights/_search
{
 "from": 1,
 "size": 1,
 "query": {
   "range": {
     "AvgTicketPrice": {
       "gte": 500,
       "lte": 1000,
       "boost": 2
     }
   }
 }
}

This retrieves the next document in the search results.

When using the from and size parameters, Elasticsearch limits you to the first 10,000 results by default: from + size cannot exceed the index.max_result_window setting.
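
The from and size arithmetic, together with the 10,000-result window, can be sketched in Python as follows. The helpers page_params and paged_search_body are hypothetical names introduced for illustration. The constant assumes the default value of index.max_result_window, which an administrator may have changed on a given index.

```python
MAX_RESULT_WINDOW = 10_000  # assumed default of index.max_result_window


def page_params(page: int, size: int, max_window: int = MAX_RESULT_WINDOW) -> dict:
    """Return the from/size pair for a zero-based page number."""
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size must be >= 1")
    offset = page * size
    # Elasticsearch rejects requests where from + size exceeds the window.
    if offset + size > max_window:
        raise ValueError(
            f"from + size = {offset + size} exceeds the {max_window}-result "
            "window; use the scroll API or search_after instead"
        )
    return {"from": offset, "size": size}


def paged_search_body(page: int, size: int, query: dict) -> dict:
    """Combine the pagination parameters with a query into a request body."""
    return {**page_params(page, size), "query": query}


price_range = {"range": {"AvgTicketPrice": {"gte": 500, "lte": 1000}}}
first = paged_search_body(0, 1, price_range)   # first document
second = paged_search_body(1, 1, price_range)  # second document
```

Checking the window on the client side is a design choice in this sketch. Elasticsearch performs the same check and returns an error if the request goes past the limit.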

The Scroll API

The scroll API comes in handy at this point. We can use it to retrieve a large collection of documents from a single search by fetching the results in batches.

The scroll API requires a scroll ID, which the initial search response returns in its _scroll_id field when you specify the scroll parameter in the request.

The scroll parameter specifies how long Elasticsearch keeps the search context alive.

Let us see how to use it in an example.

The first step is to fetch the scroll ID, which we can do by passing the scroll query parameter set to the duration of the search context.

POST /kibana_sample_data_flights/_search?scroll=10m
{
 "size": 100,
 "query": {
   "range": {
     "AvgTicketPrice": {
       "gte": 500,
       "lte": 1000,
       "boost": 2
     }
   }
 }
}

In the example request above, we set the scroll parameter to keep the search context alive for 10 minutes. We then specify the number of records to retrieve per page and the query to match.

The response to the request above should include a _scroll_id, which we can use with the scroll API, along with the first 100 documents matching the specified query.

To get the next batch of 100 records, we call the scroll API with the scroll ID from the above response.

GET /_search/scroll
{
 "scroll": "10m",
 "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFko5WGQ3VTBOUzVlW"
}

In the request above, we call the scroll endpoint and pass the scroll parameter again. This tells Elasticsearch to renew the search context and keep it alive for another 10 minutes.

We also pass the scroll ID from the previous response to retrieve the next 100 documents. Repeating this request until it returns no hits walks through the entire result set.
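
The two requests above can be combined into a loop. The following Python sketch uses only the standard library. The names make_http_send and scroll_hits are hypothetical, and the default base URL assumes an unsecured local cluster with no authentication. The final DELETE request uses the clear scroll endpoint, which is not covered in this guide, to free the search context as soon as it is no longer needed.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

# A transport is any callable (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def make_http_send(base_url: str = "http://localhost:9200") -> Send:
    """Build a transport with urllib. The base URL is an assumption:
    an unsecured local cluster with no authentication."""

    def send(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return send


def scroll_hits(send: Send, index: str, query: dict,
                size: int = 100, keep_alive: str = "10m") -> Iterator[dict]:
    """Yield every hit matching `query`, one scroll page at a time."""
    # First request: opens the search context and returns the first page.
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": query})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Each follow-up request renews the context for `keep_alive`.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # The ID can change between pages, so keep the latest one.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Free the search context when done, or if the caller stops early.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Against a running cluster, a call could look like scroll_hits(make_http_send(), "kibana_sample_data_flights", {"range": {"AvgTicketPrice": {"gte": 500, "lte": 1000}}}). Passing the transport in as an argument keeps the loop independent of the HTTP library, so it can be exercised without a cluster.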

Final Thoughts

The scroll API comes in handy when you need to retrieve more than 10,000 documents. Despite its usefulness, the scroll API has some drawbacks that other pagination methods, such as search_after, address.

Consider our tutorial on Elasticsearch pagination to learn more.

About the author

John Otieno