The #1 method KELSIEM uses to maximise Elasticsearch performance


A common misunderstanding is that Elasticsearch works like any other NoSQL database. This couldn't be further from the truth: one of the reasons it performs so well is the way it manages updates and deletions. Understanding the pros and cons of that approach will help you tune Elasticsearch for your purposes.

To understand why, we have to go beneath Elasticsearch and look at Lucene, the search library ES is built on.

When an Elasticsearch document is updated using the Update API, or re-indexed by sending a document with the same doc ID to the Index API, Elasticsearch does not modify it in place. Internally, it marks the old version as deleted and creates a new document with the same doc ID (REF [1]).
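To make the delete-then-add behaviour concrete, here is a toy Python model of a single shard. It is not Lucene's or Elasticsearch's API: `ToyShard`, `Segment`, `index()` and `refresh()` are hypothetical names, and the model assumes that indexing a document whose ID already exists flags every older copy as deleted before buffering the new version.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Documents are written once and never edited; only the deleted set grows.
    docs: tuple                                   # (doc_id, source) pairs
    deleted: set = field(default_factory=set)     # positions flagged as deleted


class ToyShard:
    """Hypothetical toy model of one shard; not a real Elasticsearch/Lucene API."""

    def __init__(self):
        self.segments = []
        self.buffer = {}   # doc_id -> source, not yet written to a segment

    def index(self, doc_id, source):
        # Any older copy already in a segment is only *flagged* as deleted...
        for seg in self.segments:
            for pos, (existing_id, _) in enumerate(seg.docs):
                if existing_id == doc_id:
                    seg.deleted.add(pos)
        # ...and the new version is added as an entirely new document.
        self.buffer[doc_id] = source

    def refresh(self):
        # Buffered documents are written out as a brand-new segment.
        if self.buffer:
            self.segments.append(Segment(docs=tuple(self.buffer.items())))
            self.buffer = {}

    def stats(self):
        stored = sum(len(s.docs) for s in self.segments)
        deleted = sum(len(s.deleted) for s in self.segments)
        return {"live": stored - deleted, "deleted": deleted}


shard = ToyShard()
shard.index("user-1", {"name": "Alice"})
shard.refresh()
shard.index("user-1", {"name": "Alice Smith"})   # "update" = delete + add
shard.refresh()
print(shard.stats())   # {'live': 1, 'deleted': 1}
```

The old copy still occupies space in the first segment; in this model nothing is physically removed until some later step rewrites that segment.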

Elasticsearch only physically removes the old version when Lucene segments are merged. (Note: ES Shard == Lucene Index.) A Lucene index is composed of multiple sub-indexes, or segments. Each segment is a fully independent index that can be searched separately. The only ways segments change are by:
1. Creating new segments for newly added documents.
2. Merging existing segments.
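The two ways a segment list can change can be sketched with a toy Python model. The names `Segment`, `write_new_segment()` and `merge_segments()` are hypothetical illustrations, not Lucene classes, and deletions are ignored here for brevity. A frozen dataclass stands in for the write-once nature of a segment.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Segment:
    """Toy immutable segment (hypothetical model, not Lucene's API)."""
    docs: tuple


segments = []


def write_new_segment(new_docs):
    # Change #1: newly added documents always land in a brand-new segment.
    segments.append(Segment(docs=tuple(new_docs)))


def merge_segments():
    # Change #2: existing segments are replaced by a single combined segment.
    combined = tuple(doc for seg in segments for doc in seg.docs)
    segments[:] = [Segment(docs=combined)]


for batch in (["a", "b"], ["c"], ["d", "e"]):
    write_new_segment(batch)
print(len(segments))   # 3

try:
    segments[0].docs = ("z",)   # editing a segment in place is not allowed
except FrozenInstanceError:
    print("segments are immutable")

merge_segments()
print(len(segments), segments[0].docs)   # 1 ('a', 'b', 'c', 'd', 'e')
```

Because existing segments can never be edited, every batch of new documents adds to the segment count until a merge brings it back down.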

A search on an ES index typically involves multiple ES shards, each of which has multiple segments, and every one of those segments has to be searched. The goal is therefore to keep reducing the number of segments through segment merging; otherwise each query does more work, searches become queued, and response times grow longer.
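The cost of many small segments can be illustrated with a toy model in which an index is a list of shards, each shard is a list of segments, and each segment is a list of document strings. This layout and the `search()` and `merge_shard()` helpers are hypothetical; the sketch assumes every segment visit carries some fixed overhead and simply counts visits rather than measuring real query latency.

```python
def search(index, term):
    """Search every segment of every shard; return (hits, segments_visited).

    `index` is a hypothetical toy layout: a list of shards, each shard a list
    of segments, each segment a list of document strings.
    """
    hits, visited = [], 0
    for shard in index:
        for segment in shard:
            visited += 1   # every segment must be opened and searched
            hits.extend(doc for doc in segment if term in doc)
    return hits, visited


def merge_shard(shard):
    # Collapse all of a shard's segments into one (ignores deletes for brevity).
    return [[doc for segment in shard for doc in segment]]


index = [
    [["error disk full"], ["login ok"], ["error timeout"]],   # shard 0: 3 segments
    [["login failed"], ["error dns"], ["logout"]],            # shard 1: 3 segments
]
print(search(index, "error"))
# (['error disk full', 'error timeout', 'error dns'], 6)

merged = [merge_shard(shard) for shard in index]
print(search(merged, "error"))
# (['error disk full', 'error timeout', 'error dns'], 2)
```

The hits are identical before and after merging; only the number of segments each query has to visit drops, from six to two.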

Segment merging occurs automatically, but in many cases index changes occur faster than segment merging can keep up. Merging matters because each segment is a separate sub-index that must be searched on its own, so it's easy to see why searches are faster once segments have been merged. I've spent years testing this to maximise search performance.

(Aside: segment merging doesn't actually delete the document. It takes all the non-deleted documents from the segments being merged and copies them into a new segment, before deleting the originating segments.) REF [2]
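The aside above can be sketched in Python as follows. `Segment` and `merge()` are hypothetical names for a toy model, not Lucene's API, and the sketch merges every segment it is given, whereas real merge policies decide which segments to combine.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    docs: list                                   # (doc_id, source) pairs
    deleted: set = field(default_factory=set)    # positions flagged as deleted


def merge(segments):
    """Copy every live (non-deleted) doc into one new segment.

    Toy sketch only: deleted docs are never copied, so they vanish along with
    the originating segments, which the caller simply discards.
    """
    live = [
        doc
        for seg in segments
        for pos, doc in enumerate(seg.docs)
        if pos not in seg.deleted
    ]
    return Segment(docs=live)


old = [
    Segment(docs=[("1", "v1"), ("2", "v1")], deleted={0}),   # doc 1 was updated
    Segment(docs=[("1", "v2"), ("3", "v1")]),
]
new = merge(old)
print(new.docs)   # [('2', 'v1'), ('1', 'v2'), ('3', 'v1')]
```

The stale `("1", "v1")` copy is not deleted by any explicit step; it disappears only because the merge never copies it into the new segment.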

Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn’t disappear immediately, although you won’t be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.

Lucene in Action, Michael McCandless, Erik Hatcher, and Otis Gospodnetić, 2010

Lucene Segments


Written by Zak Siddiqui

Zak Siddiqui is the Founder at KELSIEM based in Sydney, Australia. He spends his time helping companies define and achieve their security goals using next-generation technologies. Unsatisfied by existing SIEM products, Zak embarked on a project to come up with something better, faster, and cheaper. As Co-Founder and Chief Software Architect of KELSIEM, he helped build and launch KELSIEM REALTIME SECURITY, a managed cloud SIEM service. Zak enjoys tinkering with and exploring new technologies, embracing the future, breaking existing paradigms, and sharing his journey with others.