How to use block join to improve search efficiency with nested documents in Solr

Mikhail Khludnev

Aug 10, 2016 • 3 min read

Table of Contents

Indexing
Searching
Caveat
Further directions

Faster responses make customers happy. Lower hardware requirements make budget people happy. Block join can help accomplish both these goals, which is why we strongly suggest using it for nested document searches in Solr. But that’s enough about why we advocate using block join for nested and faceted searches in Solr. Now we’ll talk about how to do it.

Indexing

SolrInputDocument has two methods, getChildDocuments() and addChildDocument(), for nesting child documents into a parent document. The XML and Javabin formats are now able to transfer them. JSON support is ongoing.

Start by indexing a few t-shirts, as a sample product-SKU hierarchy using post.jar.
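As a rough illustration of what such an update file might contain, here is a minimal Python sketch that builds a nested XML update message with the standard library. The field names follow the queries used later in this post. The specific IDs, brand, and SKU values are assumptions made for the example, not the original sample data.

```python
import xml.etree.ElementTree as ET


def make_doc(fields, children=()):
    """Build a Solr <doc> element; child documents become nested <doc> elements."""
    doc = ET.Element("doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    for child in children:
        doc.append(child)
    return doc


# Assumed sample data: one product (parent) with two SKUs (children).
skus = [
    make_doc({"id": 11, "COLOR_s": "Red", "SIZE_s": "XL"}),
    make_doc({"id": 12, "COLOR_s": "Blue", "SIZE_s": "XL"}),
]
product = make_doc({"id": 10, "type_s": "parent", "BRAND_s": "Nike"}, skus)

add = ET.Element("add")
add.append(product)
update_xml = ET.tostring(add, encoding="unicode")
print(update_xml)
```

The printed text could be saved to a file and posted with post.jar.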

To check how blocks are laid out, run a match-all query with CSV output. You will see that the parent document is placed right after its children.

Be aware of the implicit _root_ field, which works as a block identifier: all child documents obtain their _root_ value from the parent’s uniqueKey field. This is used for overwriting whole blocks on update.

Searching

Let’s assume we have a query matching our Red-XL child documents (SKUs AKA UPCs):

q=+COLOR_s:Red +SIZE_s:XL. It returns children with IDs 11 and 31.

Now let’s join from children to parent by calling a special “parent” query parser:

q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL, which returns parents 10 and 30, as expected.

The local parameter “which” provides a filter that distinguishes parent documents from child. Keep in mind two important things about it:

It should not match any child documents
It should always match all parent documents

Note that block join avoids the cross-match problem: it doesn’t capture parent 20, which is a candidate for a false positive match because it has Red SKUs and XL SKUs, but no single SKU that is both Red and XL.
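To see why the block layout rules out cross-matches, consider a toy model in Python. This is only a sketch of the semantics, not how Lucene implements ToParentBlockJoinQuery. The document list is assumed data that mirrors the IDs mentioned in the text.

```python
# Toy block index: children come first, and their parent closes the block.
INDEX = [
    {"id": 11, "COLOR_s": "Red", "SIZE_s": "XL"},
    {"id": 12, "COLOR_s": "Blue", "SIZE_s": "L"},
    {"id": 10, "type_s": "parent"},
    {"id": 21, "COLOR_s": "Red", "SIZE_s": "M"},
    {"id": 22, "COLOR_s": "Blue", "SIZE_s": "XL"},
    {"id": 20, "type_s": "parent"},
    {"id": 31, "COLOR_s": "Red", "SIZE_s": "XL"},
    {"id": 30, "type_s": "parent"},
]


def is_parent(doc):
    """Plays the role of the 'which' filter."""
    return doc.get("type_s") == "parent"


def to_parent_join(index, child_matches):
    """Return IDs of parents whose block contains at least one matching child."""
    hits, block_has_match = [], False
    for doc in index:
        if is_parent(doc):
            if block_has_match:
                hits.append(doc["id"])
            block_has_match = False  # the next document starts a new block
        elif child_matches(doc):
            block_has_match = True
    return hits


def red_and_xl(doc):
    """Both conditions must hold on the same child document."""
    return doc.get("COLOR_s") == "Red" and doc.get("SIZE_s") == "XL"


print(to_parent_join(INDEX, red_and_xl))  # [10, 30]
```

Parent 20 is skipped because the child predicate is evaluated one SKU at a time, so a Red SKU and a separate XL SKU never combine into a match.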

This {!parent} query can be combined with any other query and filter. For example, we can constrain results by brand:

q=+BRAND_s:Nike +_query_:"{!parent which=type_s:parent}+COLOR_s:Red +SIZE_s:XL"

The same can be achieved by employing a filter query:

q={!parent which=type_s:parent}+COLOR_s:Red +SIZE_s:XL&fq=BRAND_s:Puma
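When these queries are sent over HTTP, the plus signs, spaces, and braces need URL encoding. The following Python sketch shows one way to assemble such a request URL. The endpoint address and the helper name block_join_url are assumptions for illustration only.

```python
from urllib.parse import parse_qs, urlencode

# Assumed endpoint; adjust host, port, and core name to your installation.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def block_join_url(child_query, parent_filter="type_s:parent", fq=None):
    """Wrap a child query in {!parent} and add an optional filter on parents."""
    params = [("q", f"{{!parent which={parent_filter}}}{child_query}")]
    if fq:
        params.append(("fq", fq))
    return f"{SELECT_URL}?{urlencode(params)}"


url = block_join_url("+COLOR_s:Red +SIZE_s:XL", fq="BRAND_s:Puma")
print(url)

# Round trip: decoding the query string recovers the original parameters.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```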

Don’t try to constrain children with filter queries; it doesn’t work, because a filter query applies to the result of the {!parent} query, that is, to parent documents.

There is a “reverse” query parser for searching child documents by parent filter:

{!child of=type_s:parent}BRAND_s:Puma returns SKUs that belong to the single Puma product.

Note that even as the local parameter name changes, it keeps the same meaning by supplying a parent filter.
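The reverse direction can be pictured with the same kind of toy model. Again, this is only a sketch of the semantics over assumed data, not the actual implementation behind {!child}.

```python
# Toy block index (assumed data): children first, then the parent that owns them.
INDEX = [
    {"id": 11, "COLOR_s": "Red"},
    {"id": 10, "type_s": "parent", "BRAND_s": "Nike"},
    {"id": 31, "COLOR_s": "Red"},
    {"id": 32, "COLOR_s": "Blue"},
    {"id": 30, "type_s": "parent", "BRAND_s": "Puma"},
]


def to_child_join(index, parent_matches):
    """Return IDs of children whose parent satisfies parent_matches."""
    hits, block = [], []
    for doc in index:
        if doc.get("type_s") == "parent":  # plays the role of the 'of' filter
            if parent_matches(doc):
                hits.extend(block)
            block = []  # start collecting the next block
        else:
            block.append(doc["id"])
    return hits


print(to_child_join(INDEX, lambda d: d.get("BRAND_s") == "Puma"))  # [31, 32]
```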

If you are not familiar with nested queries and local parameters, check this short intro.

Last but not least: it works for distributed search, too.

Caveat

You need to be careful when updating blocks: they always need to be updated as a whole. To show an unlucky example, let’s remove the parent and leave the children in the index:

<update><delete><query>id:10</query></delete><commit/></update>

At first, it seems like everything still works. Children 11 and 12 are left in the index, but ToParentBlockJoinQuery somehow detects this, and q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL correctly returns only parent 30. However, after <optimize/> is executed, the deleted parent document is purged from the index, and all of a sudden children 11 and 12 look like they belong to parent 20. The same query now returns 20 and 30, which is wrong! I’m afraid there are a few other similar cases of wrong behavior, too. As a reliable workaround, I suggest sending explicit deletes by query against the implicit _root_ field.
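A minimal Python sketch of the suggested workaround follows. It builds a delete-by-query message on _root_ so that the whole block goes away together. It assumes the parent document carries the same _root_ value as its children, and the helper name delete_block_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def delete_block_xml(root_id):
    """Build an update message that deletes every document in one block."""
    update = ET.Element("update")
    delete = ET.SubElement(update, "delete")
    query = ET.SubElement(delete, "query")
    query.text = f"_root_:{root_id}"  # matches all documents sharing this block ID
    ET.SubElement(update, "commit")
    return ET.tostring(update, encoding="unicode")


print(delete_block_xml(10))
```

Compared with deleting by id:10, this removes children 11 and 12 along with their parent, so no orphans are left to attach themselves to the next block.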

Further directions

Here are a few further desirable features in random order:

Faceting:

The Facet component for block indexes is quite useful in e-commerce. The trickiest thing it does is count SKU field values and aggregate them into product counts, as we described in an earlier post. Solr has had this capability since patch SOLR-5743.

Schema:

An application should be aware of relationships between documents while it indexes and searches. However, it might be more convenient if our search engine provides a “flat” navigation model to the front end, so the front end just refines search results by color, and the search engine figures out on its own which documents to filter and which ones to join.

Scoring Mode:

ToParentBlockJoinQuery supports several score calculation modes, but the {!parent} parser has the None mode hardcoded.

Group Collecting:

Use the [child] doc transformer, a feature added in the SOLR-5285 patch.

Many things to think about

Implementing Block Join in Solr takes more than a little work and thought, and possibly a bit of research along the way. Is it worth the effort? We think so, because more efficient search improves the customer experience, which leads to more sales in the long run. And that’s what it’s all about, isn’t it?

Notes: Block join support has been available in Solr since 4.5 (SOLR-3076), which was when Solr caught up with ElasticSearch in handling nested documents.

We also recommend reading these two articles on the subject: 2010’s Proposal for nested document support in Lucene and Searching relational content with Lucene’s BlockJoinQuery from 2012. You might also want to check this benchmark test we wrote about in our earlier blog post, High-Performance Join in Solr with BlockJoinQuery.

If you have a question about Block Join in Solr, please post a comment below or contact us via email for a prompt response.
