When improving digital search engines, firms need to invest equally in product discovery features and in search performance, so that search stays fast and costs stay under control. Each new product discovery feature brings new types of data, new business signals and more sophisticated queries for the search engine to handle. Over time, these changes can degrade response times, lengthen indexing and increase the staleness of search data. The accumulated problems slow search down, frustrate customers and drive up search infrastructure costs. Companies therefore need to manage the rollout of new features and ensure that each one is properly supported by search performance engineering.

Grid Dynamics started in 2006 with the goal of making mission-critical systems scalable. Since then, performance and scalability engineering have been at the core of our company's DNA. As Solr and Lucene contributors, we have a deep understanding of search engine internals and years of experience tuning the performance of search systems. For over five years, the search solutions that we have built for our retail customers have survived Black Friday traffic storms with flying colors, without a single outage or SLA breach. We have helped numerous customers solve their search performance issues, improving response times, online conversion rates from search and indexing speeds.

Taking a top-down approach

When it comes to performance engineering, we prefer a top-down methodology: we start our analysis with high-level system metrics, then drill down to the true root cause of each issue.

We ask ourselves three guiding questions: 

  1. Which computational resource is the bottleneck that prevents the application from going faster? We use deep system-level monitoring to find the true culprit.
  2. Where exactly does the bottleneck occur? We use sampling and profiling to identify hotspots and resource congestion down to the individual subsystems of the search engine.
  3. How can the bottleneck be removed? This is where our deep understanding of the internal workings of the search engine helps. It is often possible to improve performance dramatically by slightly changing the index structure, rewriting a query or tuning caches.
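
To illustrate the second question, here is a minimal Python sketch of how sampling can point to a hotspot: it counts the share of stack samples in which each frame appears, which is the idea behind flame graphs. The function name `hotspot_shares` and the subsystem names in the samples are hypothetical and chosen only for illustration; real samples would come from a profiler.

```python
from collections import Counter


def hotspot_shares(samples):
    """Return the fraction of stack samples in which each frame appears."""
    if not samples:
        return {}
    counts = Counter()
    for stack in samples:
        # Count each frame once per sample, even if it appears twice (recursion).
        counts.update(set(stack))
    total = len(samples)
    return {frame: n / total for frame, n in counts.items()}


# Hypothetical samples: each tuple is one call stack, outermost frame first.
samples = [
    ("search", "query", "scoring"),
    ("search", "query", "faceting"),
    ("search", "query", "faceting"),
    ("search", "fetch"),
]
shares = hotspot_shares(samples)
for frame, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{frame}: {share:.0%}")
```

In this made-up data, faceting appears in half of all samples, so it would be the first subsystem to examine more closely.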

Search performance engineering methodology overview

Search index structure

We define an effective index structure for entities and their relationships. This includes document nesting, embedding, roll-ups and roll-downs, as well as other special indexing techniques. We also optimize field mappings and indexing options for performance.
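
As an illustration of a roll-up, the Python sketch below aggregates SKU-level attributes into a single product-level document, so that a query can match and filter products without joining against their SKUs at search time. This is one common interpretation of the term rather than a description of a specific implementation, and the field names (`colors`, `min_price`, `in_stock`) are hypothetical.

```python
def roll_up(product, skus):
    """Build one product-level document from its SKU-level records."""
    prices = [sku["price"] for sku in skus]
    return {
        "id": product["id"],
        "title": product["title"],
        # Union of SKU attribute values, sorted for stable output.
        "colors": sorted({sku["color"] for sku in skus}),
        # Price range across all SKUs; None when the product has no SKUs.
        "min_price": min(prices, default=None),
        "max_price": max(prices, default=None),
        # The product is available if at least one SKU is.
        "in_stock": any(sku["in_stock"] for sku in skus),
    }


product = {"id": "p1", "title": "Rain jacket"}
skus = [
    {"color": "red", "price": 24.99, "in_stock": False},
    {"color": "blue", "price": 19.99, "in_stock": True},
    {"color": "red", "price": 29.99, "in_stock": True},
]
doc = roll_up(product, skus)
print(doc)
```

The trade-off is that the product document must be rebuilt whenever one of its SKUs changes, which is where fast partial updates become important.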

Indexing performance

We implement high-throughput bulk indexing with document streaming. The indexing workload is optimized with change buffering, deduplication and coalescing, as well as with fast partial document updates. We also tune the index refresh rate and the segment merge policy.
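
The idea behind change buffering and coalescing can be sketched in a few lines of Python. The `ChangeBuffer` class below is a hypothetical, simplified example that handles partial field updates only: it merges all pending changes for the same document ID, so repeated or superseded changes reach the search engine as a single update per document.

```python
class ChangeBuffer:
    """Buffers partial updates and coalesces them per document ID."""

    def __init__(self):
        self._pending = {}  # doc_id -> merged field changes

    def add(self, doc_id, fields):
        # Merge into any pending update for the same document.
        # For a repeated field, the latest value wins.
        self._pending.setdefault(doc_id, {}).update(fields)

    def flush(self):
        """Return one merged update per document and empty the buffer."""
        batch = list(self._pending.items())
        self._pending = {}
        return batch


buf = ChangeBuffer()
buf.add("sku-1", {"price": 10})
buf.add("sku-1", {"price": 9})   # supersedes the earlier price
buf.add("sku-1", {"stock": 3})   # merged into the same update
buf.add("sku-2", {"stock": 0})
buf.add("sku-2", {"stock": 0})   # duplicate, adds no extra work
batch = buf.flush()
print(batch)
```

Here five incoming changes collapse into two updates. A real pipeline would flush on a timer or size threshold and would also need to handle deletes and full re-indexing, which this sketch leaves out.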

Query performance

We analyze query patterns in general, and perform deep profiling on slow query patterns. This helps us identify bottlenecks and hotspots, and shows where queries should be rewritten and where scoring and boosting can be optimized. Additionally, we optimize faceting and aggregations, and fine-tune all search engine caches.
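
A first pass at finding slow query patterns can be as simple as grouping a query log by pattern and ranking the patterns by tail latency. The Python sketch below assumes a hypothetical log of `(pattern, latency in milliseconds)` pairs and uses the nearest-rank method for the 95th percentile; the pattern labels are made up for illustration.

```python
import math
from collections import defaultdict


def p95(values):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def slowest_patterns(log, top=3):
    """Rank query patterns by their p95 latency, slowest first."""
    by_pattern = defaultdict(list)
    for pattern, millis in log:
        by_pattern[pattern].append(millis)
    ranked = sorted(
        ((p95(latencies), pattern) for pattern, latencies in by_pattern.items()),
        reverse=True,
    )
    return [(pattern, latency) for latency, pattern in ranked[:top]]


# Hypothetical query log entries.
log = [
    ("keyword", 20),
    ("keyword", 25),
    ("keyword+facets", 120),
    ("keyword+facets", 300),
    ("category", 15),
]
worst = slowest_patterns(log, top=2)
print(worst)
```

The patterns at the top of this ranking are the candidates for deep profiling.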

Production environment tuning

We handle capacity planning and cluster topology optimization, which covers sharding, shard allocation, replicas, node roles, cluster state, discovery and fault detection. We also tune thread pools, garbage collection and other JVM settings, as well as OS settings for RAM buffers, index memory mapping and swappiness.
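
As a rough illustration of the arithmetic behind capacity planning, the Python sketch below derives a shard and node count from an index size. The function `plan_shards` and every input value are hypothetical assumptions; real targets depend on the search engine, the hardware and the query load, and should be validated with load tests.

```python
import math


def plan_shards(index_gb, target_shard_gb, replicas, shards_per_node):
    """Estimate primary shards, total shards and nodes for one index."""
    primaries = max(1, math.ceil(index_gb / target_shard_gb))
    total_shards = primaries * (1 + replicas)
    nodes = math.ceil(total_shards / shards_per_node)
    # Each copy of a shard should sit on a different node,
    # so at least (1 + replicas) nodes are needed.
    nodes = max(nodes, 1 + replicas)
    return {"primaries": primaries, "total_shards": total_shards, "nodes": nodes}


# Assumed inputs: a 120 GB index, 30 GB per shard, one replica, two shards per node.
plan = plan_shards(index_gb=120, target_shard_gb=30, replicas=1, shards_per_node=2)
print(plan)
```

With these assumed inputs, the estimate is four primary shards, eight shards in total and four nodes.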
Core search library
Most popular open source search toolkit
Popular open source search engine
FlameScope visualizes where your application spends its time with flame graphs
JDK Mission Control allows for deep introspection into JVM metrics at runtime
GCViewer helps with visualization and deep analysis of JVM GC logs
Luke is a tool for in-depth introspection of Lucene index structure
YourKit is a general-purpose JVM profiler for drilling down into the trickiest performance bottlenecks

Schedule a free workshop with one of our senior architects to learn more
