The segmented filter cache and block join query parser in Solr

Mikhail Khludnev

Aug 16, 2016 • 3 min read

Table of Contents

NRT-filters
OR filters
Filters 2.0
NRT-facets

The “law of unintended consequences” applies to using the block join query parser in Solr, just as it does to many other things in life (and software). Leave out certain query strings in Solr, and it seems to make no difference. But this action can actually have positive effects, especially when working with Solr in a Near Real Time (NRT) environment. There are a number of other steps you can take to make Solr more NRT-capable, too.

Let’s look at an accidental finding about the block join query parser in Solr. What query do you think this parser yields if you omit the query string, as in q={!parent which='type_s:parent'}? It might not seem obvious, but it yields the parent filter itself (type_s:parent), taken from the perSegFilter cache: the same filter it uses when you do include a query string. The initial intention for this code branch was to expose the parent bitset to users who wanted to reuse it as a Solr filter query. It turns out that it can also solve a filter cache regeneration issue and, therefore, make Solr more NRT-friendly.

We can start with the basics of caching. If you specify fq=SIZE:XL in the request params, Solr creates an on-heap bitset that spans all segments and uses it as a very efficient filter. However, when you perform a commit (no matter whether it’s hard or soft), this bitset is discarded, and you face a slowdown either at commit time, when filter bitsets are regenerated, or at query time, when unlucky ‘cold’ requests have to regenerate them. Such pauses make Solr not really NRT-friendly. If you are dealing with such commit pauses and/or have to commit frequently, read on. Otherwise, you can consider this an unusual Solr use case.

NRT-filters

To get rid of these pauses, try rewriting fq=SIZE:XL as fq={!parent which='SIZE:XL'}. Also, make sure that the perSegFilter cache has a proper size and has NoOpRegenerator specified. Now filters should slow down neither searches after a commit nor the commits themselves. To make sure this works as expected, look at the cache entries by enabling cache introspection. This is what you should see in the perSegFilter dropdown in SolrAdmin:

item_SIZE:XL: FixedBitSetCachingWrapperFilter(QueryWrapperFilter(SIZE:XL))
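The rewrite of a plain fq into its block join form is mechanical, so it can be done on the client side. The following is a minimal sketch, assuming a client that assembles request parameters itself; the helper names per_segment_fq and select_params are hypothetical and not part of any Solr client library.

```python
from urllib.parse import urlencode


def per_segment_fq(filter_query):
    """Wrap a plain filter as {!parent which='...'} with no child query string."""
    if "'" in filter_query:
        # Assumption: this sketch does not attempt to escape quotes.
        raise ValueError("single quotes are not handled by this sketch")
    return "{!parent which='" + filter_query + "'}"


def select_params(q, filters):
    """Build a URL-encoded parameter string with every filter rewritten."""
    params = [("q", q)]
    params.extend(("fq", per_segment_fq(f)) for f in filters)
    return urlencode(params)


rewritten = per_segment_fq("SIZE:XL")
query_string = select_params("*:*", ["SIZE:XL", "COLOR:red"])
print(rewritten)
print(query_string)
```

The field COLOR is only an illustration; any filter that would normally go into fq can be wrapped the same way.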

Make sure that there is no hit in filterCache while you experiment with these filters.
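Why a cache keyed by segment survives commits can be shown with a toy model. This is a sketch under loud assumptions: plain dicts stand in for the caches, a Python set stands in for a bitset, deletions are ignored, and none of the class names below exist in Solr.

```python
def build_bitset(segment_docs, predicate):
    """Simulate the expensive step: scan one segment and collect matching doc ids."""
    return {doc_id for doc_id, doc in segment_docs.items() if predicate(doc)}


class TopLevelFilterCache:
    """Toy stand-in for a cache holding one bitset across all segments."""

    def __init__(self):
        self.entries = {}
        self.rebuilt_segments = 0

    def get(self, key, segments, predicate):
        if key not in self.entries:
            bits = set()
            for seg_docs in segments.values():
                bits |= build_bitset(seg_docs, predicate)
                self.rebuilt_segments += 1
            self.entries[key] = bits
        return self.entries[key]

    def on_commit(self):
        self.entries.clear()  # the whole bitset is thrown away


class PerSegmentFilterCache:
    """Toy stand-in for a cache holding one bitset per (filter, segment)."""

    def __init__(self):
        self.entries = {}
        self.rebuilt_segments = 0

    def get(self, key, segments, predicate):
        result = set()
        for name, seg_docs in segments.items():
            if (key, name) not in self.entries:
                self.entries[(key, name)] = build_bitset(seg_docs, predicate)
                self.rebuilt_segments += 1
            result |= self.entries[(key, name)]
        return result

    def on_commit(self):
        pass  # entries for unchanged segments stay valid


def is_xl(doc):
    return doc["SIZE"] == "XL"


segments = {
    "seg_0": {0: {"SIZE": "XL"}, 1: {"SIZE": "M"}},
    "seg_1": {2: {"SIZE": "XL"}, 3: {"SIZE": "L"}},
}
top, per_seg = TopLevelFilterCache(), PerSegmentFilterCache()
top.get("SIZE:XL", segments, is_xl)
per_seg.get("SIZE:XL", segments, is_xl)

# A commit adds one small new segment.
segments["seg_2"] = {4: {"SIZE": "XL"}}
top.on_commit()
per_seg.on_commit()

top_result = top.get("SIZE:XL", segments, is_xl)
per_seg_result = per_seg.get("SIZE:XL", segments, is_xl)
print(top.rebuilt_segments, per_seg.rebuilt_segments)
```

In this model the top-level cache rescans every segment after the commit, while the per-segment cache only scans the new one.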

It’s worth mentioning that intersecting such filters (when you specify several fqs) is not as efficient as intersecting plain Solr fqs, which are combined with a bitwise AND over eight-byte words.
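To illustrate the difference, the sketch below intersects two small filters in two ways: word by word with a bitwise AND, and document by document. The doc-at-a-time merge is only an assumed stand-in for iterator-style intersection; it is not a description of what Solr or Lucene actually executes for block join filters.

```python
WORD_BITS = 64  # one eight-byte word covers 64 documents


def to_words(doc_ids, max_doc):
    """Pack doc ids into a list of 64-bit words."""
    words = [0] * ((max_doc + WORD_BITS - 1) // WORD_BITS)
    for doc_id in doc_ids:
        words[doc_id // WORD_BITS] |= 1 << (doc_id % WORD_BITS)
    return words


def and_words(a, b):
    """Intersect two packed bitsets: one AND per word, 64 docs at a time."""
    return [x & y for x, y in zip(a, b)]


def to_doc_ids(words):
    """Unpack words back into a sorted list of doc ids."""
    return [
        i * WORD_BITS + bit
        for i, word in enumerate(words)
        for bit in range(WORD_BITS)
        if (word >> bit) & 1
    ]


def intersect_doc_at_a_time(a, b):
    """Intersect two sorted doc id lists by advancing one doc at a time."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


max_doc = 130
size_xl = [1, 64, 65, 129]
in_stock = [1, 65, 100, 129]

word_result = to_doc_ids(and_words(to_words(size_xl, max_doc), to_words(in_stock, max_doc)))
iter_result = intersect_doc_at_a_time(size_xl, in_stock)
print(word_result, iter_result)
```

Both approaches give the same answer; the word-wise version needs only three AND operations for 130 documents, regardless of how many documents match.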

Another drawback of this hack is that it uses memory-hungry plain bitsets (like Solr fq), rather than a more compact representation.
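A back-of-the-envelope calculation shows where the waste comes from. The numbers are hypothetical, object overhead is ignored, and the sorted array of 4-byte doc ids is just one assumed example of a more compact representation.

```python
def fixed_bitset_bytes(max_doc):
    """A plain bitset needs one bit per document, rounded up to 64-bit words."""
    words = (max_doc + 63) // 64
    return words * 8


def sorted_int_array_bytes(match_count):
    """Assumed compact alternative: one 4-byte int per matching document."""
    return match_count * 4


max_doc = 10_000_000   # hypothetical index size
matches = 5_000        # hypothetical number of documents matching the filter

bitset_bytes = fixed_bitset_bytes(max_doc)
array_bytes = sorted_int_array_bytes(matches)
print(bitset_bytes, array_bytes)
```

The bitset costs the same whether the filter matches five documents or five million, which is why sparse filters are the wasteful case.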

OR filters

One of the questions that regularly hits the mailing lists is about the disjunction of cached filters: if fq=SIZE:L and fq=SIZE:M are cached as two separate cache entries, can we reuse these bitsets in the disjunction filter fq=SIZE:L OR SIZE:M and avoid caching the disjunction as yet another entry? Yes, we can: fq={!cache=false}{!parent which='SIZE:L'} OR {!parent which='SIZE:M'}.

In addition to the cache introspection mentioned in the previous section, you can check that you are doing it right by placing this string in the q= param and requesting debugQuery=true. You should see something like this:

{!cache=false}{!cache=false}ConstantScore(FixedBitSetCachingWrapperFilter(QueryWrapperFilter(SIZE:L))) {!cache=false}{!cache=false}ConstantScore(FixedBitSetCachingWrapperFilter(QueryWrapperFilter(SIZE:M)))

Here you can see the non-cached disjunction of two filters cached in perSegFilter. The last two notes from the previous section (about inefficient combining and storing) are applicable here as well.
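The idea of an uncached disjunction over cached parts can be sketched as a toy model. Here a dict stands in for the cache and a Python int stands in for a bitset; the function names are hypothetical and nothing below reflects Solr internals.

```python
filter_cache = {}   # toy stand-in for the cache: term -> bitset stored as an int
builds = []         # records every time a bitset had to be computed


def cached_bits(term, docs):
    """Return the bitset for 'FIELD:value', computing it only on a cache miss."""
    if term not in filter_cache:
        field, value = term.split(":")
        bits = 0
        for doc_id, doc in enumerate(docs):
            if doc.get(field) == value:
                bits |= 1 << doc_id
        filter_cache[term] = bits
        builds.append(term)
    return filter_cache[term]


def uncached_disjunction(terms, docs):
    """OR the cached parts together; the result is deliberately not stored."""
    result = 0
    for term in terms:
        result |= cached_bits(term, docs)
    return result


docs = [{"SIZE": "L"}, {"SIZE": "M"}, {"SIZE": "XL"}, {"SIZE": "L"}]

first = uncached_disjunction(["SIZE:L", "SIZE:M"], docs)
second = uncached_disjunction(["SIZE:L", "SIZE:M"], docs)
print(bin(first), builds, sorted(filter_cache))
```

The second call computes nothing new, and the cache still holds only the two building blocks, which is the effect {!cache=false} is meant to have on the combined filter.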

Filters 2.0

Note that all this dancing around filters is about using the heap to cache postings lists. Given that postings list files are usually mmapped, following this great advice, how much sense does that make? The reason for caching is the on-disk postings format, which is CPU-intensive to decode when read. This format also stores data that is needed only for scoring, such as tf, and is useless for filtering. In addition, Solr’s filters use bitwise operations for intersection, which usually yields some gain. Thus, we can think about a specialized bitset codec as a feature of filters. There is a modest patch that should help this approach.

NRT-facets

What else makes Solr unfriendly to NRT? UnInvertedFields! What can you do about them? If you count facets on single-valued fields, you can use Lucene’s FieldCache via facet.method=fcs. If you deal with multivalued fields, you can specify docValues for them, which triggers an alternative faceting engine. DocValues facets use an on-heap data structure (OrdinalMap), which leads to pauses similar to those caused by UnInvertedField. However, they should be much shorter.
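For the single-valued case, the faceting method can be set per field in the request. The sketch below only assembles the parameters; the helper name facet_params and the field names are hypothetical, and enabling docValues for multivalued fields is a schema change that is not shown here.

```python
from urllib.parse import urlencode


def facet_params(q, single_valued_fields):
    """Build facet request params that ask for facet.method=fcs on each field."""
    params = [("q", q), ("rows", "0"), ("facet", "true")]
    for field in single_valued_fields:
        params.append(("facet.field", field))
        # Per-field override, so other facet fields keep their own method.
        params.append(("f.%s.facet.method" % field, "fcs"))
    return urlencode(params)


facet_query = facet_params("*:*", ["SIZE", "BRAND"])
print(facet_query)
```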

One last note: NRT doesn’t mean better throughput in general; it just means more predictable latency — which doesn’t necessarily mean decreasing average latency.
