How to handle pagination of search results in Elasticsearch using Python?

Published on Aug. 22, 2023, 12:19 p.m.

To handle pagination of search results in Elasticsearch using Python, you can use the from and size parameters in the search query: from is the number of hits to skip, and size is the maximum number of hits to return. Here’s an example code snippet:

from elasticsearch import Elasticsearch

# create an Elasticsearch client instance
es = Elasticsearch(['http://localhost:9200'])

# define the search query with pagination
query = {
    "query": {
        "match": {
            "description": "python"
        }
    },
    "from": 0,                  # start with the first hit
    "size": 10                  # return up to 10 hits per page
}

# perform the search and get the first page of results
results = es.search(index="my_index", body=query)

# get the total number of hits (by default counted exactly only up to 10,000)
total_hits = results['hits']['total']['value']

# print the current page, then fetch the next one until no hits remain
while len(results['hits']['hits']) > 0:
    for result in results['hits']['hits']:
        print(result['_source'])
    query['from'] += query['size']    # move to the next page
    if query['from'] >= total_hits:
        break                         # past the last hit: skip the extra (empty or rejected) request
    results = es.search(index="my_index", body=query)

In this example, we’re searching the my_index index for documents containing the word “python” in their “description” field, and retrieving the results in batches of up to 10 hits per page. We start by retrieving the first page of hits using from=0 and size=10, and then use a while loop to iterate through the remaining pages of hits.
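If you serve results page by page (for example, page 3 of a results list) instead of looping through all of them, the from value follows directly from the page number: from = (page - 1) * size. The sketch below wraps that arithmetic in two hypothetical helpers, page_params and total_pages (names chosen for illustration, not part of the Elasticsearch client). It assumes the default index.max_result_window of 10,000; if your index uses a different setting, adjust MAX_RESULT_WINDOW accordingly.

```python
import math

# Assumption: the index uses Elasticsearch's default index.max_result_window,
# which rejects requests where from + size exceeds 10,000.
MAX_RESULT_WINDOW = 10_000


def page_params(page: int, page_size: int = 10) -> dict:
    """Hypothetical helper: "from"/"size" values for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size  # number of hits to skip
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default max_result_window (10,000)")
    return {"from": offset, "size": page_size}


def total_pages(total_hits: int, page_size: int = 10) -> int:
    """Hypothetical helper: number of pages needed to show total_hits results."""
    return math.ceil(total_hits / page_size)
```

For example, query = {"query": {"match": {"description": "python"}}, **page_params(3)} requests hits 21 to 30 (from=20, size=10). Checking the window on the client side simply turns an out-of-range request into a clear local error instead of a failed search.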

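Note that the total number of hits is obtained from the total field returned by the initial search request (results['hits']['total']['value']). By default, Elasticsearch counts hits exactly only up to 10,000; beyond that, value is 10000 and results['hits']['total']['relation'] is "gte" (add "track_total_hits": True to the query if you need an exact count). Because from + size also cannot exceed the index.max_result_window setting (10,000 by default), larger result sets need a different approach, such as the Scroll API or, as the Elasticsearch documentation now recommends for deep pagination, search_after with a point in time (PIT).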
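As a rough sketch of the Scroll API approach, the generator below opens a scroll context with an initial search, keeps calling es.scroll until a batch comes back empty, and clears the scroll context when it finishes. It assumes es is an elasticsearch-py client (the search, scroll, and clear_scroll methods are used with keyword arguments accepted by the 7.x and 8.x clients); the function name scroll_all_hits and its batch_size and keep_alive defaults are hypothetical choices for illustration. Sorting by _doc is the order Elasticsearch handles most efficiently for scrolling when the order of hits does not matter.

```python
from typing import Any, Dict, Iterator


def scroll_all_hits(
    es: Any,
    index: str,
    query: Dict[str, Any],
    batch_size: int = 1000,
    keep_alive: str = "2m",
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield every hit matching `query` via the Scroll API.

    Assumes `es` is an elasticsearch-py client (7.x or 8.x).
    """
    # The initial search opens a scroll context and returns the first batch.
    body = {"query": query, "size": batch_size, "sort": ["_doc"]}
    resp = es.search(index=index, body=body, scroll=keep_alive)
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Fetch the next batch; the scroll id may change between calls.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the server-side scroll context, even if iteration stops early.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)
```

With the client from the example above, you could iterate over every match with: for hit in scroll_all_hits(es, "my_index", {"match": {"description": "python"}}): print(hit['_source']). The official client also ships a ready-made elasticsearch.helpers.scan function that wraps the same scroll loop.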