Elasticsearch Backup and Restore (Snapshot and Restore)

saurav omar
Jun 6, 2020

As your cluster and its indices grow, backing up or archiving data becomes a necessity. If several nodes suddenly go down and data is lost, a backup is how you bring the cluster back.

Elasticsearch has a snapshot and restore module that helps you back up and restore data in the cluster.

What are Snapshots?

  • Snapshots are not instantaneous. They take time to complete and do not represent a perfect point-in-time view of the cluster.
  • While a snapshot is in progress, you can still index documents and make other requests to the cluster, but new documents (and updates and deletes to existing documents) are not included in it.
  • Snapshots include only primary shards.
  • Snapshots are incremental: they store only the data that has changed since the last successful snapshot.
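
To make the incremental idea concrete, here is a minimal toy sketch in Python. It is not how Elasticsearch is implemented internally; it only assumes that a shard's data is a set of immutable files and that a snapshot copies the files the repository does not hold yet. The function name and the file names (seg_1, seg_2, seg_3) are made up for illustration.

```python
def files_to_upload(current_files, repository_files):
    """Return the files a new snapshot would have to copy, in sorted order.

    Toy model: anything already in the repository is skipped.
    """
    return sorted(set(current_files) - set(repository_files))


repository = set()  # hypothetical repository contents, empty at first

# First snapshot: nothing is stored yet, so every file is copied.
first = files_to_upload({"seg_1", "seg_2"}, repository)
repository.update(first)

# Second snapshot: only the file added since the first snapshot is copied.
second = files_to_upload({"seg_1", "seg_2", "seg_3"}, repository)
repository.update(second)

print(first)   # ['seg_1', 'seg_2']
print(second)  # ['seg_3']
```

In this model the second snapshot is much cheaper than the first, which is the practical benefit of incremental snapshots.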

Register repository

Elasticsearch was designed to be run in different environments, and it works extremely well in a cloud environment. The snapshot/restore module also supports various cloud repositories such as:

  • Amazon S3
  • Hadoop Distributed File System (HDFS)
  • Azure Storage

Before taking a snapshot, we have to register a snapshot repository. A repository is simply the storage location where the snapshots will be stored.

For File System (the location has to be registered under path.repo in elasticsearch.yml on the cluster's nodes):

curl -XPUT localhost:9200/_snapshot/my-repository -H 'Content-Type: application/json' -d '{
"type": "fs",
"settings": {
"location": "my/snapshot/directory",
"compress": true,
"chunk_size": "10m"
}
}'

For S3:

curl -XPUT localhost:9200/_snapshot/my-repository -H 'Content-Type: application/json' -d '{
"type": "s3",
"settings": {
"bucket": "my-s3-bucket",
"base_path": "my/snapshot/directory"
}
}'
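
If you prefer to script the registration, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper name build_repository_request and the repository name my-repository are assumptions made for illustration, and the request is only built, not sent.

```python
import json
from urllib.request import Request


def build_repository_request(base_url, repository, repo_type, settings):
    """Build (but do not send) the PUT request that registers a repository."""
    body = json.dumps({"type": repo_type, "settings": settings}).encode("utf-8")
    return Request(
        url=f"{base_url}/_snapshot/{repository}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values mirroring the file system example.
fs_request = build_repository_request(
    "http://localhost:9200",
    "my-repository",
    "fs",
    {"location": "my/snapshot/directory", "compress": True, "chunk_size": "10m"},
)

print(fs_request.get_method(), fs_request.full_url)
print(fs_request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it. That step is left out here so the sketch runs without a cluster.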

Take snapshots or backup:

You specify two pieces of information when you create a snapshot:

  • Name of your snapshot repository
  • Name for the snapshot

curl -XPUT localhost:9200/_snapshot/my-repository/my-snapshot -H 'Content-Type: application/json' -d '{
"indices": "index-1*,index-2,...",
"ignore_unavailable": true,
"include_global_state": false,
"partial": false
}'

Description:

indices: The indices that you want to include in the snapshot. You can use , to create a list of indices, * to specify an index pattern, and - to exclude certain indices. Default is all indices.

ignore_unavailable: If an index from the indices list doesn’t exist, whether to ignore it rather than fail the snapshot. The default is false.

include_global_state: Whether to include cluster state in the snapshot. The default is true.

partial: Whether to allow partial snapshots. The default is false, which fails the entire snapshot if one or more shards fail to be stored.
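
The options above can be assembled programmatically. The following Python sketch is only an illustration under stated assumptions: the helper names indices_expression and snapshot_body, and the index name index-1-old, are hypothetical. The defaults in the function signature mirror the defaults described above.

```python
def indices_expression(include, exclude=()):
    """Join index names or patterns with ',' and prefix exclusions with '-'."""
    return ",".join(list(include) + [f"-{name}" for name in exclude])


def snapshot_body(include=(), exclude=(), ignore_unavailable=False,
                  include_global_state=True, partial=False):
    """Build the JSON body for a create-snapshot request as a dict."""
    body = {
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
        "partial": partial,
    }
    # Leaving "indices" out means "all indices", which is the default.
    if include:
        body["indices"] = indices_expression(include, exclude)
    return body


body = snapshot_body(
    include=["index-1*", "index-2"],
    exclude=["index-1-old"],
    ignore_unavailable=True,
    include_global_state=False,
)
print(body["indices"])  # index-1*,index-2,-index-1-old
```

Serializing the returned dict with json.dumps gives a body you can send with the PUT request for the snapshot.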

Restore snapshots:

The first step in restoring a snapshot is retrieving the existing snapshots. If you are restoring in the same cluster, list all snapshot repositories with:

curl -XGET localhost:9200/_snapshot/_all

Note: If you are restoring into a different cluster, first register the repository there as described under Register repository above.

  • The following command displays all the snapshots in the repository. Take the latest one and validate it.
curl -XGET localhost:9200/_snapshot/my-repository/_all

Note: If you are restoring into a different cluster and the repository is not registered there yet, you can only inspect the files at the S3 or file system location, and that does not tell you whether a snapshot was partial or full.

  • If any failures are present, the response contains JSON like the following, which means the backup is not complete. In that case, check the other snapshots.
"failures": [
{
"index": "${Index name}",
"index_uuid": "${Index UUId}",
"shard_id": ${Shard ID},
"reason": "node shutdown/Repository exception",
"node_id": "node_123",
"status": "INTERNAL_SERVER_ERROR"
}
],
"shards": {
"total": 100,
"failed": 1,
"successful": 99
}
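
This validation can be automated. The Python sketch below assumes the response has already been parsed into a dict with a "snapshots" list whose entries carry "snapshot", "failures" and "shards" keys. The helper names and the sample snapshot names are hypothetical.

```python
def is_full_backup(snapshot):
    """Return True when a snapshot entry reports no failed shards."""
    failures = snapshot.get("failures", [])
    shards = snapshot.get("shards", {})
    total = shards.get("total")
    return (
        not failures
        and total is not None
        and shards.get("failed", 0) == 0
        and shards.get("successful") == total
    )


def full_backups(response):
    """Return the names of all snapshots that look complete."""
    return [s["snapshot"] for s in response.get("snapshots", []) if is_full_backup(s)]


# Hypothetical parsed response with one complete and one partial snapshot.
response = {
    "snapshots": [
        {
            "snapshot": "snapshot-1",
            "failures": [],
            "shards": {"total": 100, "failed": 0, "successful": 100},
        },
        {
            "snapshot": "snapshot-2",
            "failures": [{"index": "index-2", "reason": "node shutdown"}],
            "shards": {"total": 100, "failed": 1, "successful": 99},
        },
    ]
}

print(full_backups(response))  # ['snapshot-1']
```

Only snapshots that pass this check are good candidates for a full restore.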
  • Now restore the snapshot:
curl -XPOST localhost:9200/_snapshot/my-repository/my-snapshot/_restore -H 'Content-Type: application/json' -d '{
"indices": "index-1*,index-2,..",
"ignore_unavailable": true,
"include_global_state": false,
"include_aliases": false,
"partial": false,
"rename_pattern": "index-1(.+)",
"rename_replacement": "new-index$1",
"index_settings": {
"index.blocks.read_only": false
},
"ignore_index_settings": [
"index.refresh_interval"
]
}'

Description:

indices: The indices that you want to restore. The syntax is the same as when taking a snapshot.

ignore_unavailable: Same as for a snapshot. The default is false.

include_global_state: Whether to restore the cluster state. The default is false.

include_aliases: Whether to restore aliases alongside their associated indices. The default is true.

partial: Whether to allow restoring partial snapshots. The default is false.

rename_pattern: If you want to rename indices as you restore them, use this option to specify a regular expression that matches all indices you want to restore. Use capture groups (()) to reuse portions of the index name.

rename_replacement: If you want to rename indices as you restore them, use this option to specify the replacement pattern. Use $0 to include the entire matching index name, $1 to include the content of the first capture group, etc.

index_settings: If you want to change index settings on restore, specify them here.

ignore_index_settings: Rather than explicitly specifying new settings with index_settings, you can ignore certain index settings in the snapshot and use the cluster defaults on restore.
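
To preview what rename_pattern and rename_replacement would do before running a restore, you can approximate the renaming locally. This Python sketch is an approximation under stated assumptions: Elasticsearch evaluates the pattern with Java regular expressions, while this uses Python's re module, so only simple patterns are guaranteed to behave the same. The helper name preview_rename and the index names are hypothetical.

```python
import re


def preview_rename(index_names, rename_pattern, rename_replacement):
    """Map each index name to the name it would get after the rename."""
    # Translate $0, $1, ... into Python's \g<0>, \g<1>, ... group syntax.
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return {
        name: re.sub(rename_pattern, python_replacement, name)
        for name in index_names
    }


names = ["index-1-jan", "index-1-feb", "index-2"]

# Reusing the capture group keeps the restored names distinct.
print(preview_rename(names, "index-1(.+)", "new-index$1"))

# Without a capture group, every match collapses to the same name.
print(preview_rename(names, "index-1(.+)", "new-index"))
```

The second call shows why the replacement usually needs a capture group: two different indices mapped to the same name cannot both be restored.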

That’s it!!

Happy Learning.
