Home Insights High-performance auto parts search: Moving from Endeca to Elasticsearch

High-performance auto parts search: Moving from Endeca to Elasticsearch

Eugene Steinberg

Ivan Mamontov

Jun 23, 2020 • 7 min read

Moving from Endeca to Elasticsearch: High-performance auto parts search

Table of Contents

Modeling auto parts catalog
Implementing auto-parts search with Elasticsearch
Conclusion

According to Hedges & Co. report, US online sales of auto parts and accessories amounted to $12B and, while new car sales projection are looking bleak, the future of the automotive aftermarket is quite positive. Both specialized auto parts retailers and online commerce powerhouses like Amazon and eBay are competing for the share of this market.

*Amazon is a vivid retailer of auto parts*

When you are looking for auto parts online, it is crucially important to ensure that part you are ordering will fit your vehicle. That is why all online retailers selling auto-parts first try to establish what vehicle you have in mind when searching for auto-part.

*Vehicle selection on online auto parts retailer website*

Vehicle information is used to filter parts catalog and surface only the parts which will fit the target vehicle. This kind of filtering should work seamlessly with both keyword search and category browse.

Turns out, it is pretty hard to achieve high quality search with legacy search platforms like Oracle Endeca. Many specialized retailers struggle with performance and scalability issues which are preventing them from rolling out new high-value features to successfully compete with tech giants like Amazon on auto parts market.

In this blog post, we will describe how to overcome those issues with modern open source search platforms.

Modeling auto parts catalog

Auto parts catalogs are unique because they significantly depend on part fitment information. Fitment, also referred to as compatibility orapplication, refers to the vehicle year, make, model, and engine that your parts and accessories fit. Often this information expands to trim/submodel, motor, and so on.

Products in auto parts catalog may have any number of fitments, in the range from zero, which means that product fits any vehicle, to hundreds of thousands for some widely applicable products. Typically, products have about a hundred of fitments.

*About 50% of products have no fitments, while some products have over 10,000 fitments*

Essentially, we are dealing with parent-child relationship here, where product (parent) has multiple children (fitness) entities. Our core use case is to filter products by attributes of their fitnesses, e.g. by a full or partial specification of the vehicle.

Parent-child relationship requires special modeling in search engines due to the nature of the very data structure which gives them their speed – inverted index. Core abstraction for search engine is a document which is a collection of fields, where each field can have multiple values. Each document is completely independent and doesn’t “know” about any other document in the index.

So, in order to support parent-child relationship with inverted index, we have to apply special techniques. Performing join and filtering in query time is technically possible, yet it is a known performance killer. So, most search implementations go for variety of tricks to move join work to the index time.

Endeca goes for all-in denormalization, which leads to the structure of index which looks like this:

*Denormalization approach to indexing parent-child relationships*

With denormalization, or attribute roll-down approach, parent attributes are copied to child documents and only child documents are indexed in the search engine. At the query time, child documents are searched, retrieved and grouped by parentId to return only parent documents. Here is how it works in auto parts search use case:

This is a straightforward approach and is widely used in many search applications. However, it has important drawbacks which significantly affect search performance:

Deep parent-child relationships, where majority of attributes are located on the parent level, cause massive data duplication. Problem only proliferates with additional levels of document nesting.
Many duplicate values on the children document defeats the key performance advantage of inverted index – ability to skip a whole blocks of documents, jumping to the next document containing the unique value you are looking for. Search engine index sweep has to stop frequently on every matching repeating values with limited opportunities to jump far ahead in the index.
Group by id operation at query time is quite expensive and can often become a bottleneck

Those issues may lead to response times in several hundreds of milliseconds or even seconds and strongly impact customer journey, creating perception of sluggish search experience. Customer patience is short, and options are plenty, so we should strive for immediate response from the system.

Is there a better way to model a parent-child relationships? Sure there is! One of the neat techniques available in Apache Lucene is index-time join (a.k.a. block-join) of parents and children, which essentially encodes those relationships as the order of documents in the index. This functionality is nicely exposed by both Solr and Elasticsearch search engines. Here is how it works:

Parents and children are arranged in the blocks where child documents are followed by their immediate parents. This means that in order to find a parent document by attribute of the child, all you need to do is to find a matching child and jump ahead in the index to the closest parent, which is a very natural and fast operation for inverted index. Similar approach works for finding children by their parents.

This approach has multiple advantages over all-in denormalization:

High performance search for parents by children and children by parents
No data duplication whatsoever
Nested document structure can support arbitrary parent-child relationship depth with modest impact on performance

Here is how the same search for “filter for ford mustang” will work in Lucene-based block-join index

Of course, as it usually happens, there is no free lunch, and this data modeling approach has some drawbacks as well:

Update requires reindexing of the whole document’s block
Whole document block should be collocated within the index shard

In majority of applications, this trade-off is perfectly acceptable.

One additional important consideration is the scalability of search solution. With large catalog, it is increasingly important to be able to parallelize query processing over multiple nodes to be able to keep response time under control. This is achieved with index sharding and map-reduce query processing. This is another key scalability feature not available in Endeca.

Implementing auto-parts search with Elasticsearch

Original Endeca implementation had typical response time of 600 ms, with frequent multi-second “spikes” for high-recall queries. Let’s see if we can do better than that with a modern search engine.

We implemented a sample auto parts search application based on Elasticsearch. Please note that a very similar system can be built with SolrCloud, as both systems rely on core functionality available in the Apache Lucene search library.

We used a catalog of auto-parts with about a 1M parts. We used just a single cloud node and deployed a 4-shard Elasticsearch cluster.

We emulated the Endeca’s denormalization indexing approach and compared it with block-join (nested) index structure with respect to both index size, indexing time and query response time.

As you can see, nested index is 4x more compact compared to flat index due to absence of data duplication. Indexing time is also 3x better for nested index.

Lets look at the query time statistics:

*Query time performance and scalability vs number of shards*

As of query time, we can see that nested query shows consistent high performance, while flat index queries are 3-5x time slower. even though they can be brute-forced with additional shard parallelism, at the expense of overall search cluster capacity.

If we look at the performance of some selected queries, we will see a typical picture of search performance highly depending on number of matching documents.

*Latency of some selected queries with different recall*

We can see that typical response time stays below 50 ms. It becomes slower for queries which match big portion of the catalog, like “bulb” or “engine”, but still stays under 100 ms. Also, there is no direct correlation between latency and number of fitments in retrieved products.

Conclusion

In this blog post, we discussed a practical approach to solving the main auto parts search use cases with modern open source search engines. Turns out, even on the single node we can improve Endeca response time ~10x. On top of that, migration to Elasticsearch or SolrCloud unlocks horizontal scalability and allows to support catalogs of arbitrary size without response time impact.

If you are dealing with auto parts search and experiencing performance or search quality issues, we would be happy to help.

Happy searching!

Tags

AI-driven search and experiences

Artificial intelligence

Customer experience

Digital engagement

Retail

Isometric visualization of AI-powered data flows connecting enterprise product catalog systems

Article

Six reasons your product catalog needs a makeover in 2026—and how to get it right

Retail

Article Six reasons your product catalog needs a makeover in 2026—and how to get it right

Once upon a time, your enterprise product catalog was a backend concern. A necessary system of record. Something teams updated quietly while the real “experience” work happened elsewhere. Today, that separation no longer exists. Research shows that 87% of shoppers rate product data as “extremely...

Retail

Distributed computing infrastructure with interconnected blocks and data streams in red, green, and amber, representing the hybrid deep learning architecture connecting cloud-based Azure Databricks with on-premises NVIDIA DGX systems for deep learning workloads.

Article

Hybrid deep learning with Azure Databricks and on-prem NVIDIA DGX

Financial services

Article Hybrid deep learning with Azure Databricks and on-prem NVIDIA DGX

Modern enterprises increasingly rely on deep learning to power mission-critical workflows such as global demand forecasting, inventory optimization, supply chain prediction, video-based defect detection, and financial risk modeling. These workloads demonstrate rapidly increasing GPU requirements, g...

Financial services

AI demand forecasting model comparison visualization showing pixelated human figures with data blocks representing Time Series Foundation Models and predictive analytics

Article

Time-series foundation models: AI demand forecasting comparison

Manufacturing

Article Time-series foundation models: AI demand forecasting comparison

Predictive analytics is undergoing a major transformation. This AI demand forecasting model comparison reveals significant performance gaps between traditional and modern approaches. Demand forecasting has long guided decisions in retail and manufacturing, but today’s data volumes and volatility ar...

Manufacturing

Stylized shoppers and digital devices illustrating agentic payments.

Article

What the ACP vs AP2 agentic payments comparison means for you

Retail

Article What the ACP vs AP2 agentic payments comparison means for you

Agentic commerce is in the midst of a defining moment. Instead of a customer navigating a checkout flow, AI shopping agents can now autonomously purchase goods, renew subscriptions, or restock supplies, executing payments entirely on the customer’s behalf through agentic payments protocols. It’s...

Retail

Inventory management system featuring a central storefront surrounded by delivery vans, shopping carts, stacked packages, and digital screens. The scene depicts the integration of online and physical retail, logistics, and automated inventory processes, all connected within a seamless, technology-driven supply chain

Article

Beyond multichannel: The competitive edge of omnichannel order management

Retail

Article Beyond multichannel: The competitive edge of omnichannel order management

You know the feeling: you walk into a store only to find out that the product you saw online is out of stock! This is one of the most common and problematic experiences for customers who shop multichannel retail. The problem for you? Disconnected sales channels, lost income, frustrated custom...

Retail

A shopping cart surrounded by silhouetted people in a vibrant, digital marketplace with hexagonal icons floating above, representing B2B composable commerce.

Article

Composable commerce for B2B: Overkill or delivers big?

High-tech

Article Composable commerce for B2B: Overkill or delivers big?

The buzzword “composable commerce” has dominated digital strategy conversations since Gartner popularized the term in 2020. But behind the marketing hype lies a longstanding, proven practice of integrating specialized, best-of-breed technology components into a flexible and scalable ecosystem....

High-tech

Multicolor whisps of smoke on a black background

Article

Headless CMS for the AI era with Grid Dynamics, Contentstack, and Google Cloud

Cross-industry

Article Headless CMS for the AI era with Grid Dynamics, Contentstack, and Google Cloud

For many businesses, moving away from familiar but inherently unadaptable legacy suites is challenging. However, eliminating this technical debt one step at a time can bolster your confidence. The best starting point is transitioning from a monolithic CMS to a headless CMS. This shift to a modern c...

Cross-industry

High-performance auto parts search: Moving from Endeca to Elasticsearch

Modeling auto parts catalog

Implementing auto-parts search with Elasticsearch

Conclusion

Tags

You might also like

Let's talk

Thank you!

Thank you for reaching out!

Something went wrong...

CONTACTS

SECTIONS

FOLLOW US

High-performance auto parts search: Moving from Endeca to Elasticsearch

Modeling auto parts catalog

Implementing auto-parts search with Elasticsearch

Conclusion

Tags

You might also like

Subscribe to Grid Dynamics insights now

Let's talk

Thank you!

Thank you for reaching out!

Something went wrong...

Subscribe to Grid Dynamics
insights now