Re-Index Billions of Documents in Elasticsearch from a Different Cluster

saurav omar
Aug 6, 2020
  • We can use the Reindex API to migrate index data from one cluster to another.
  • Elasticsearch provides backward compatibility that lets indices created in the previous major version be upgraded to the current major version.

Whitelist the old cluster node:

  • Set up a new cluster.
  • Add the property below to elasticsearch.yml on one of the new cluster's coordinating nodes:
reindex.remote.whitelist: old_cluster_node:9200
  • This property whitelists the old cluster node's host and port so that the new cluster is allowed to pull data from it.
  • Restart the node after adding the property.
  • The property does not need to be set on every node in the cluster, only on the node that will receive the Reindex API request.
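The configuration step above could be scripted roughly as follows. This is only a sketch: the function name `add_reindex_whitelist` is hypothetical, the path to elasticsearch.yml depends on how Elasticsearch was installed, and the node still has to be restarted afterwards.

```shell
# Hypothetical helper: append a reindex.remote.whitelist entry to an
# elasticsearch.yml file, unless the file already defines one.
# $1 = path to elasticsearch.yml (assumed to exist)
# $2 = host:port of the old cluster node, e.g. old_cluster_node:9200
add_reindex_whitelist() {
  local config_file="$1"
  local remote="$2"

  # Leave an existing whitelist untouched so it is never duplicated.
  if grep -q '^reindex\.remote\.whitelist:' "$config_file"; then
    echo "reindex.remote.whitelist already set in $config_file" >&2
    return 0
  fi

  # Make sure the new property starts on its own line.
  if [ -n "$(tail -c 1 "$config_file")" ]; then
    echo >> "$config_file"
  fi

  printf 'reindex.remote.whitelist: "%s"\n' "$remote" >> "$config_file"
}
```

Calling it twice with the same file leaves a single whitelist line, so it is safe to re-run from a provisioning script.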

Re-Index Data:

  • Create the destination index with the appropriate mappings and settings.
  • Set refresh_interval to -1 and number_of_replicas to 0 for faster reindexing.
  • Setting refresh_interval to -1 disables periodic refreshes, so Elasticsearch does not spend time making new documents searchable during the load. Setting number_of_replicas to 0 avoids replicating every document and allocating replica shards during the load, which otherwise takes a long time.
  • For example:
curl --location --request PUT "${NEW_CLUSTER_NODE_IP}/${INDEX_NAME}/_settings" \
--header 'Content-Type: application/json' \
--data-raw '{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}'
  • Now use the Reindex API to start reindexing.
  • If you run the reindex job in the background by setting wait_for_completion to false, the request returns a task ID that you can use to monitor the job's progress with the task API: GET _tasks/TASK_ID.
curl --location --request POST "${NEW_CLUSTER_NODE_IP}/_reindex?wait_for_completion=false" \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "remote": {
      "host": "'"${PRIMARY_SERVER_URL}"'",
      "socket_timeout": "5m",
      "connect_timeout": "5m"
    },
    "index": "'"${OLD_INDEX_NAME}"'"
  },
  "dest": {
    "index": "'"${NEW_INDEX_NAME}"'"
  }
}'

Parameter Explanations:

host: The REST endpoint of the remote (old) cluster, including scheme, host, and port, for example http://old_cluster_node:9200.

socket_timeout (time unit): The wait time for socket reads (default 30s).

connect_timeout (time unit): The wait time for establishing the remote connection (default 30s).
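Monitoring the background job can be sketched as below. The function names are hypothetical, and the text matching assumes the usual response shapes: a `"task"` string in the reindex response and a `"completed"` boolean in the task API response. A JSON tool such as jq would be more robust if it is available.

```shell
# Pull the task ID out of the JSON returned by
# POST _reindex?wait_for_completion=false, e.g. {"task":"node_id:12345"}.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed if a GET _tasks/<task_id> response on stdin reports completion.
task_is_completed() {
  grep -q '"completed"[[:space:]]*:[[:space:]]*true'
}

# Poll the task API until the reindex job finishes.
# $1 = base URL of the new cluster, $2 = task ID,
# $3 = seconds between polls (defaults to 30).
wait_for_reindex() {
  local base_url="$1"
  local task_id="$2"
  local interval="${3:-30}"
  until curl --silent "${base_url}/_tasks/${task_id}" | task_is_completed; do
    sleep "$interval"
  done
}
```

A typical use would be to pipe the output of the reindex curl call into `extract_task_id`, store the result in a variable, and pass it to `wait_for_reindex`. The loop keeps polling if the cluster is unreachable, so a real script would likely add a maximum number of attempts.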

  • After reindexing has completed:

Set refresh_interval to 10s and number_of_replicas to 2:

curl --location --request PUT "${NEW_CLUSTER_NODE_IP}/${INDEX_NAME}/_settings" \
--header 'Content-Type: application/json' \
--data-raw '{
  "index": {
    "refresh_interval": "10s",
    "number_of_replicas": 2
  }
}'
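The settings call before the reindex and the one after it differ only in their two values, so one possible approach is to share a small helper between them. The function names here are hypothetical, and the values passed in are whatever suits your cluster.

```shell
# Build the JSON body for PUT <index>/_settings.
# $1 = refresh_interval (e.g. -1 or 10s), $2 = number_of_replicas.
index_settings_body() {
  printf '{"index":{"refresh_interval":"%s","number_of_replicas":%d}}\n' "$1" "$2"
}

# Send the settings to the cluster.
# $1 = base URL of the new cluster, $2 = index name,
# $3 = refresh_interval, $4 = number_of_replicas.
apply_index_settings() {
  curl --silent --request PUT "${1}/${2}/_settings" \
    --header 'Content-Type: application/json' \
    --data-raw "$(index_settings_body "$3" "$4")"
}
```

For instance, `apply_index_settings "$NEW_CLUSTER_NODE_IP" "$INDEX_NAME" -1 0` would prepare the index for bulk loading, and the same call with the production values would restore it once the reindex task has finished.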

Happy Learning!!

Thanks
