The #1 method KELSIEM uses to maximise Elasticsearch performance

Written by Zak Siddiqui | Jan 21, 2019 1:03:33 AM

INTRODUCTION:

A common misunderstanding is that Elasticsearch works like any other NoSQL database. This couldn't be further from the truth: one of the reasons it performs so well is how it manages updates and deletions. Understanding the pros and cons of that approach will help you tune Elasticsearch for your purposes.

To see why, we have to go beyond Elasticsearch (ES) and look at Lucene, the library ES is built on.

When an Elasticsearch document is updated through the Update API, or re-indexed by indexing a document with the same doc ID, Elasticsearch does not modify it in place. Internally it marks the old document as deleted and creates a new document with the same doc ID (REF [1]).

The old document is only physically removed when Lucene segments are merged. (Note: ES Shard == Lucene Index.) A Lucene index is composed of multiple sub-indexes, or segments. Each segment is a fully independent index, which can be searched separately. An index changes in only two ways (REF [3]):
1. Creating new segments for newly added documents.
2. Merging existing segments.
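
To make the update behaviour concrete, here is a minimal toy model in Python. It is not how Lucene is implemented: the Segment and ToyShard classes are hypothetical names invented for this sketch, and each write creates its own segment purely to keep the example small (real Lucene buffers many documents per segment). It only illustrates that an update marks the old copy as deleted and writes the new copy into a new segment.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an immutable Lucene segment (hypothetical)."""
    docs: dict                                  # doc ID -> document body
    deleted: set = field(default_factory=set)   # doc IDs marked as deleted


class ToyShard:
    """Toy stand-in for one ES shard, i.e. one Lucene index (hypothetical)."""

    def __init__(self):
        self.segments = []

    def index(self, doc_id, body):
        # Mark any existing live copy as deleted; it is not removed yet.
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                seg.deleted.add(doc_id)
        # New and updated documents always land in a new segment.
        self.segments.append(Segment(docs={doc_id: body}))

    def get(self, doc_id):
        # Only the live (non-deleted) copy is visible to readers.
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                return seg.docs[doc_id]
        return None


shard = ToyShard()
shard.index("1", {"status": "new"})
shard.index("1", {"status": "updated"})   # same doc ID, so this is an update

print(len(shard.segments))         # 2
print(shard.segments[0].deleted)   # {'1'}
print(shard.get("1"))              # {'status': 'updated'}
```

After the update the shard holds two segments, and the first one still carries the old copy of the document, flagged as deleted.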

A search against an ES index typically spans multiple ES shards, each of which holds multiple segments, and every one of those segments has to be searched. The goal is therefore to keep the number of segments down through segment merging; otherwise searches queue up and response times grow.

Segment merging occurs automatically, but in most cases there is a strong possibility that index changes occur faster than segment merging can keep up. Segment merging is important because each segment is a separate sub-index that must be searched on its own, so it is easy to see why searches are faster once segments are merged. I've done years of testing on this to maximise search performance.
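
One way to check whether merging is keeping up is to count segments and deleted documents per shard. Elasticsearch exposes this through the cat segments API (GET /_cat/segments/<index>?format=json). The Python sketch below only summarises such a response; the sample data, the index name my-index, and the summarise function are all made up for illustration, and the code casts values with int() on the assumption that the cat API returns numbers as strings.

```python
import json
from collections import defaultdict

# Hypothetical sample, shaped like GET /_cat/segments/my-index?format=json.
# The index name and all numbers are invented for illustration.
SAMPLE = """
[
  {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_0", "docs.count": "120", "docs.deleted": "30"},
  {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_1", "docs.count": "40", "docs.deleted": "0"},
  {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_2", "docs.count": "5", "docs.deleted": "5"},
  {"index": "my-index", "shard": "1", "prirep": "p", "segment": "_0", "docs.count": "200", "docs.deleted": "0"}
]
"""


def summarise(rows):
    """Count segments, live docs and deleted docs per shard copy."""
    stats = defaultdict(lambda: {"segments": 0, "live": 0, "deleted": 0})
    for row in rows:
        key = (row["index"], row["shard"], row["prirep"])
        stats[key]["segments"] += 1
        stats[key]["live"] += int(row["docs.count"])       # non-deleted docs
        stats[key]["deleted"] += int(row["docs.deleted"])  # awaiting a merge
    return dict(stats)


summary = summarise(json.loads(SAMPLE))
for key, s in sorted(summary.items()):
    print(key, s)
# ('my-index', '0', 'p') {'segments': 3, 'live': 165, 'deleted': 35}
# ('my-index', '1', 'p') {'segments': 1, 'live': 200, 'deleted': 0}
```

A shard whose segment count or deleted-document count keeps climbing between checks is a hint that writes are outpacing automatic merging.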

(Aside: Segment merging doesn't actually delete the document; it simply takes all the non-deleted documents from one segment and copies them to another segment, before deleting the originating segment.) REF [2]
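
The aside above can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's merge code: segments are modelled as plain dicts, and merge_segments is a hypothetical helper written for this example.

```python
def merge_segments(segments):
    """Merge toy segments by copying only live docs into one new segment.

    Each segment is a dict with "docs" (doc ID -> body) and
    "deleted" (set of doc IDs marked as deleted).
    """
    merged = {"docs": {}, "deleted": set()}
    for seg in segments:
        for doc_id, body in seg["docs"].items():
            if doc_id not in seg["deleted"]:
                merged["docs"][doc_id] = body   # copy live documents only
    # The caller then drops the source segments; the deleted copies
    # disappear with them rather than being removed one by one.
    return merged


old_segments = [
    {"docs": {"1": "v1", "2": "a"}, "deleted": {"1"}},   # "1" was updated later
    {"docs": {"1": "v2", "3": "b"}, "deleted": set()},
]
new_segment = merge_segments(old_segments)

print(new_segment["docs"])   # {'2': 'a', '1': 'v2', '3': 'b'}
```

Two segments become one, and the stale copy of document "1" is simply never copied across.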

References
[1] https://www.elastic.co/guide/en/elasticsearch/guide/current/update-doc.html
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn’t disappear immediately, although you won’t be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.

[2] https://www.manning.com/books/lucene-in-action-second-edition
Lucene In Action, Michael McCandless, Erik Hatcher, and Otis Gospodnetić, 2010

[3] https://lucene.apache.org/core/2_9_4/fileformats.html#Segments
Lucene Segments