Advanced Solr/Lucene topics: High-performance nested search for e-commerce applications

Solr/Lucene has emerged over the last few years as a leading open-source search platform for large-scale e-commerce search engines. Systems based on Solr power major sites including Macy’s, Kohl’s, Walmart, Etsy, and many others. An increasing number of tier-1 digital retailers are building their next-generation search and catalog navigation platforms on the Solr technology stack, often replacing commercial engines such as Oracle Endeca, FAST, or Mercado.

Grid Dynamics has been one of the early adopters of Solr for large-scale, complex e-commerce catalogs with millions of SKUs, providing a highly optimized, omni-channel, “Black Friday-ready” experience for some of the world’s largest retailers.

Our engineers have made numerous contributions to Solr and Lucene, spoken at technical conferences, and published many technical blog posts that help Solr developers. One area of extensive research and innovation for our team is nested document structures, which are very useful when modeling complex e-commerce catalogs. The Nested Search and Faceting approach solves many of the performance and scalability challenges encountered when building large-scale, contextualized, omnichannel catalogs. It is especially useful for efficient and scalable processing of relationships (such as “chair is a part of furniture set”) and their attributes in a search system that is otherwise optimized for documents that are completely independent of one another.
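To illustrate why relationships matter here, the following minimal Python sketch contrasts a flattened catalog model with a nested one, using the furniture set example. It is an illustration only and does not use Solr or Lucene; the class names, field names, and sample data are hypothetical. It shows how merging child attributes into the parent can produce a false match that a nested model avoids.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A child record, e.g. one piece of a furniture set (hypothetical fields)."""
    kind: str
    material: str


@dataclass
class FurnitureSet:
    """A parent record that owns several child items."""
    name: str
    items: List[Item] = field(default_factory=list)


def matches_flattened(fset: FurnitureSet, kind: str, material: str) -> bool:
    """Flattened model: child attributes are merged into multi-valued
    parent fields, so the link between kind and material is lost."""
    kinds = {item.kind for item in fset.items}
    materials = {item.material for item in fset.items}
    return kind in kinds and material in materials


def matches_nested(fset: FurnitureSet, kind: str, material: str) -> bool:
    """Nested model: both conditions must hold on the same child item."""
    return any(item.kind == kind and item.material == material
               for item in fset.items)


dining_set = FurnitureSet(
    "dining set",
    [Item("chair", "oak"), Item("table", "glass")],
)

# Searching for a glass chair: the set has a chair and has glass, but no glass chair.
print(matches_flattened(dining_set, "chair", "glass"))  # True (false positive)
print(matches_nested(dining_set, "chair", "glass"))     # False
```

The flattened check passes because "chair" and "glass" both appear somewhere in the set, even though they belong to different items. Keeping children as separate nested records preserves that association.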

  1. We were early adopters of, and active advocates for, the Block Join (also known as index-time join) approach to implementing nested search. This approach is supported in Lucene with Block Join Query (BJQ) and in Solr with the Block Join Facet component, which we contributed to the Solr/Lucene codebase.
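As a rough illustration of the index-time join idea, the sketch below simulates a block-structured index in plain Python. It assumes the layout Lucene uses for block joins, where each parent's children are stored contiguously and the parent is the last document of its block, so a matching child can be resolved to the next parent document ID. The function names, field names, and sample data are hypothetical; this is not the Lucene or Solr API.

```python
from bisect import bisect_left


def index_blocks(blocks):
    """Flatten (parent, children) pairs into one document list.

    Children are written first and the parent last, so every block
    occupies a contiguous range of document IDs ending at its parent.
    """
    docs, parent_ids = [], []
    for parent, children in blocks:
        docs.extend(children)
        docs.append(parent)
        parent_ids.append(len(docs) - 1)  # ascending by construction
    return docs, parent_ids


def to_parent_join(docs, parent_ids, child_predicate):
    """Return parents that have at least one child matching the predicate."""
    parent_set = set(parent_ids)
    hits = []
    for doc_id, doc in enumerate(docs):
        if doc_id in parent_set or not child_predicate(doc):
            continue
        # The owning parent is the first parent ID after the child ID.
        parent_id = parent_ids[bisect_left(parent_ids, doc_id)]
        if not hits or hits[-1] != parent_id:  # skip duplicate parents
            hits.append(parent_id)
    return [docs[i] for i in hits]


docs, parent_ids = index_blocks([
    ({"type": "set", "name": "dining set"},
     [{"type": "item", "kind": "chair", "material": "oak"},
      {"type": "item", "kind": "table", "material": "glass"}]),
    ({"type": "set", "name": "patio set"},
     [{"type": "item", "kind": "chair", "material": "steel"}]),
])

steel_chair = lambda d: d.get("kind") == "chair" and d.get("material") == "steel"
print([p["name"] for p in to_parent_join(docs, parent_ids, steel_chair)])
# ['patio set']
```

Because the relationship is encoded in document order at index time, no lookup by key is needed at query time. In Solr, a comparable query is typically expressed with the block join parent query parser, for example `{!parent which="type:set"}kind:chair`, where the field names here are again hypothetical.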

    For a more complete introduction to this topic, please see the joint talk at Lucene Revolution 2015 given by Eugene Steinberg of Grid Dynamics and Peter Gazaryan of Macys.com. Both slides and a video are online.

    In this series of blog posts on Nested Search for E-commerce, we are republishing a collection of revised and updated posts written over the last two years. These posts are aimed at search engine developers, so the information is presented in a rather technical form and assumes good familiarity with the Solr/Lucene framework. Specifically, the posts cover the following aspects of BJQ design:

    Post 1: Introduction to Block Join Faceting
    Post 2: High-Performance Join in Solr with BlockJoinQuery
    Post 3: How to Implement Block Join Faceting in Solr/Lucene
    Post 4: Using Block Join to Improve Search Efficiency with Nested Documents in Solr
    Post 5: The Segmented Filter Cache and Block Join Query Parser in Solr
    Post 6: Searching Grandchildren and Siblings with Solr Block Join
    Post 7: A Frustrating Personal Experience with Unfaceted Search

    If you have questions about any of the topics covered in this series of posts or more generally related to the design and tuning of search engines, please drop us a line and one of our Search Architects will follow up promptly.
