
STAMP session-based recommender

Alexander Kuzmenko

Konstantin Speransky

Mar 10, 2022

Table of Contents

The challenge
The solution
The training pipeline
The serving application
What to do about data sparsity?
Real-time and non-real-time rules
The results

In an era where customer experience and product discovery innovations are a top priority for businesses, we are witnessing a shift from traditional recommendation systems towards session-based recommendation systems in the digital commerce sector. Why? Because they produce personalized recommendations by taking into account a user’s most recent and real-time interactions with a brand, as opposed to historical behavior only.

Traditional personalized recommender systems tend to use all user-to-item interactions to learn the preferences of each user. However, not all historical events are equally important to the current recommendation scenario. User intent may vary significantly between sessions, and even long-term preferences usually shift over time, making recommendations obsolete. Session-based recommender systems take the products in a user session as input and generate recommendations that reflect the current user intent. They also sidestep the problem of sparse per-user data, which makes it very challenging to build reliable user profiles.

The challenge

Our forward-looking Fortune 500 customer engaged Grid Dynamics to enhance the quality of their personalized recommendations with a session-based recommender. The customer was eager to try novel neural network approaches and to decide whether the monetary gains outweigh the complications of running them in a production environment. They required a system that works with implicit feedback consisting of positive-only events like views and purchases, rather than explicit user preferences like star ratings. The customer had recently moved to Google Cloud infrastructure, so the new solution also had to be cloud-native.

Our customer's ecommerce website has many recommendation zones, and they require different types of recommendations. For example, it was observed that users prefer recommendations on the product details page that are heavily biased towards the anchor product. However, on the user homepage we don’t have an anchor product, so recommendations are based only on the user history. Initially, our customer wanted to update recommendations in the most impactful zone on the product details page, so we concentrated on this scenario.

It is extremely important to keep the right balance between the different metrics that the new recommender system will try to optimize. On the one hand, our customer wanted to increase the click-through rate for zones with recommendations. On the other hand, it was necessary to keep the conversion rate in check so that revenue per customer grows. The recommendations should also be diverse, explore the long tail of products, and to some extent surprise the users.

Another challenge was that our customer's full catalog contains hundreds of thousands of products, and it’s extremely complicated for a single model to cover the entire catalog because the user data is sparse. Further, the new recommender system needed a model fast enough for real-time inference, and it would need to run in the cloud with a latency of no more than several hundred milliseconds. Within this time budget, it would be necessary to get the recent user events, run model inference with these events as input, and apply egress business rules.

The solution

The first step we took was to review the literature and decide on the base model. We were on the hunt for a session-based neural network model that was not very computationally complex, so that we could run the training pipeline daily and achieve fast real-time inference. The initial purpose was to deploy the new recommender system on the product details page, so we were looking for a model that pays special attention to the anchor product. After carefully reviewing the approaches, we selected STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation as the model to try. This model has an attention layer and accepts the input sequence of products while having a separate path for the anchor product. STAMP was successfully applied by Home Depot and Zalando, as reflected in the corresponding papers published by these companies.

At its core, the STAMP model (see Fig. 1) has an attention layer that learns how to transform the trainable product embeddings that constitute a user session into a compressed session representation \(m_a\). An anchor product \(x_t\) is routed through a separate path in the model; as a result, recommended products are biased towards it, which is beneficial in many scenarios, such as recommendations on the product details page. The goal of model training is to learn which product out of all candidate products \(V\) will be the next event, given a user session and an anchor product.
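To make the data flow concrete, here is a minimal sketch of a STAMP-style forward pass in plain Python. It assumes the formulation in the published STAMP paper as we read it (an averaged session vector \(m_s\), attention over session items, two small tanh layers, and a trilinear score against each candidate); the weights are random and untrained, the parameter names (`W1`, `w0`, `Ws`, and so on) are illustrative, and the production model may differ.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vecs):
    return [sum(vals) for vals in zip(*vecs)]

def random_params(d, seed=0):
    """Untrained parameters, for illustration only."""
    rng = random.Random(seed)
    mat = lambda: [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]
    vec = lambda: [rng.gauss(0, 0.1) for _ in range(d)]
    return {"W1": mat(), "W2": mat(), "W3": mat(), "b_a": vec(), "w0": vec(),
            "Ws": mat(), "b_s": vec(), "Wt": mat(), "b_t": vec()}

def stamp_scores(session, anchor, candidates, p):
    """session: embeddings of session products; anchor: x_t; candidates: V."""
    d = len(anchor)
    # m_s: average of the session embeddings (assumed general-interest vector)
    m_s = [sum(x[k] for x in session) / len(session) for k in range(d)]
    # Attention weight for each session item, conditioned on x_t and m_s
    alphas = []
    for x_i in session:
        hidden = add(matvec(p["W1"], x_i), matvec(p["W2"], anchor),
                     matvec(p["W3"], m_s), p["b_a"])
        alphas.append(sum(w * sigmoid(h) for w, h in zip(p["w0"], hidden)))
    # m_a: attention-weighted session representation
    m_a = [sum(a * x[k] for a, x in zip(alphas, session)) for k in range(d)]
    # Separate paths for the session (h_s) and the anchor product (h_t)
    h_s = [math.tanh(v) for v in add(matvec(p["Ws"], m_a), p["b_s"])]
    h_t = [math.tanh(v) for v in add(matvec(p["Wt"], anchor), p["b_t"])]
    # Trilinear score per candidate, then softmax over all candidates
    z = [sigmoid(sum(s * t * c for s, t, c in zip(h_s, h_t, cand)))
         for cand in candidates]
    exps = [math.exp(v - max(z)) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Toy usage: 6 products with 4-dimensional embeddings, a 3-item session,
# and the last session item acting as the anchor product.
rng = random.Random(1)
catalog = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
params = random_params(4)
probs = stamp_scores(catalog[:3], catalog[2], catalog, params)
```

The result `probs` is a probability distribution over the candidate products; during training, the model would be fitted so that the true next product receives high probability.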

The system we built consists of two parts implemented using the Google Cloud Platform: the training pipeline and the serving application.

The training pipeline

The training pipeline runs in Kubeflow and consists of the following standard steps:

raw data validation;
data preparation;
model training;
model validation;
uploading the model to the model store; and
updating the serving application.

Because our customer generates hundreds of gigabytes of clickstream data monthly, we sped up the pipeline by offloading most of the data preparation logic to Google BigQuery and using GPUs for model training.
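The article does not describe the data preparation logic itself. As an illustration only, the sketch below shows one common way to turn raw clickstream events into next-event training examples. The event layout `(session_id, timestamp, product_id)`, the `max_len` cutoff, and the choice of the last viewed product as the anchor are all assumptions, and in the real pipeline this kind of logic would run in BigQuery rather than in Python.

```python
from collections import defaultdict

def build_examples(events, max_len=10):
    """Turn (session_id, timestamp, product_id) events into training examples.

    Each example pairs a session prefix and its last product (the anchor)
    with the product that was interacted with next (the target).
    """
    sessions = defaultdict(list)
    for session_id, _ts, product_id in sorted(events, key=lambda e: (e[0], e[1])):
        sessions[session_id].append(product_id)

    examples = []
    for products in sessions.values():
        for i in range(1, len(products)):
            prefix = products[max(0, i - max_len):i]
            examples.append(
                {"session": prefix, "anchor": prefix[-1], "target": products[i]}
            )
    return examples

# Toy usage with hypothetical product ids; single-event sessions yield nothing.
events = [("s1", 3, "C"), ("s1", 1, "A"), ("s1", 2, "B"), ("s2", 1, "X")]
examples = build_examples(events)
```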

The serving application

The serving application was implemented as a cloud-based application with auto-scaling, so new instances are created automatically to handle the load elastically. To produce personalized recommendations, STAMP needs the most recent user events. Therefore, all real-time user events are captured in Google Cloud Bigtable, which is shared by multiple recommenders. For each incoming request, the serving application issues a call to Bigtable and extracts the real-time events for inference.
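The request flow described above can be sketched as follows. This is a simplified illustration, not the production code: `InMemoryEventStore` is a hypothetical stand-in for the real-time event store, `toy_score_fn` replaces model inference, and the request fields and the exclusion of the anchor product from the results are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RecommendationRequest:
    user_id: str
    anchor_product: str
    max_events: int = 5  # how many recent events to feed the model
    top_k: int = 3       # how many recommendations to return

class InMemoryEventStore:
    """Hypothetical stand-in for the real-time event store."""

    def __init__(self):
        self._events = {}

    def append(self, user_id, product_id):
        self._events.setdefault(user_id, []).append(product_id)

    def recent(self, user_id, limit):
        if limit <= 0:
            return []
        return self._events.get(user_id, [])[-limit:]

def handle_request(request, store, score_fn):
    # 1. Fetch the most recent user events
    session = store.recent(request.user_id, request.max_events)
    # 2. Run inference: score_fn returns {product_id: score}
    scores = score_fn(session, request.anchor_product)
    # 3. Rank and drop the anchor itself (an assumed egress step)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [p for p in ranked if p != request.anchor_product][:request.top_k]

def toy_score_fn(session, anchor):
    """Placeholder for model inference: fixed scores, boosted by the session."""
    base = {"A": 0.1, "B": 0.9, "C": 0.5, "D": 0.3}
    return {p: s + (1.0 if p in session else 0.0) for p, s in base.items()}

store = InMemoryEventStore()
store.append("u1", "A")
store.append("u1", "D")
request = RecommendationRequest(user_id="u1", anchor_product="B", max_events=1, top_k=2)
recommendations = handle_request(request, store, toy_score_fn)
```

Passing `max_events` and `top_k` in the request means the same trained model can serve different zones with different settings.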

What to do about data sparsity?

To address the issue of data sparsity we clustered the product catalog and had a separate STAMP model in each cluster. The serving application loads multiple STAMP models and selects which one to use based on the anchor product in the request.
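A minimal sketch of this per-cluster model selection might look like the following. The class name, the product and cluster identifiers, and the optional fallback model for unknown anchor products are hypothetical; the article does not say how such cases are handled.

```python
class ModelRouter:
    """Selects a per-cluster model based on the anchor product."""

    def __init__(self, product_to_cluster, models, fallback=None):
        self.product_to_cluster = product_to_cluster  # product_id -> cluster_id
        self.models = models                          # cluster_id -> loaded model
        self.fallback = fallback                      # assumed: used for unknown anchors

    def select(self, anchor_product):
        cluster = self.product_to_cluster.get(anchor_product)
        model = self.models.get(cluster, self.fallback)
        if model is None:
            raise KeyError(f"no model available for anchor {anchor_product!r}")
        return model

# Toy usage: strings stand in for loaded STAMP models.
router = ModelRouter(
    product_to_cluster={"sofa-1": "furniture", "drill-7": "tools"},
    models={"furniture": "stamp-furniture", "tools": "stamp-tools"},
    fallback="stamp-default",
)
chosen = router.select("drill-7")
```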

The clustering logic was based on product attribute groups. We trained a single model on the full catalog and then applied hierarchical clustering, using a distance derived from the number of cross recommendations between attribute groups.

The clustering was implemented in several steps. First, a distance matrix was calculated from the number of cross recommendations between product attribute groups. Then the distance matrix was symmetrized, and only large product attribute groups were clustered using hierarchical clustering; restricting this step to large groups with many products gives the resulting clusters a cleaner structure. Finally, the large groups were frozen in their clusters and all remaining product attribute groups were clustered.
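The steps above can be sketched in plain Python. Several details here are assumptions, because the article does not specify them: the distance is taken as `1 / (1 + count)` of the symmetrized cross-recommendation counts, average linkage is used for the hierarchical clustering, and the remaining small groups are attached to the nearest frozen cluster. The group names and counts are made up.

```python
def symmetric_distance(counts, groups):
    """Symmetrize directed cross-recommendation counts into a distance."""
    dist = {}
    for a in groups:
        for b in groups:
            if a != b:
                total = counts.get((a, b), 0) + counts.get((b, a), 0)
                dist[(a, b)] = 1.0 / (1.0 + total)  # assumed transform
    return dist

def avg_distance(c1, c2, dist):
    return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

def agglomerate(groups, dist, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters."""
    clusters = [[g] for g in groups]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: avg_distance(
            clusters[ij[0]], clusters[ij[1]], dist))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def cluster_attribute_groups(counts, group_sizes, min_size, n_clusters):
    groups = sorted(group_sizes)
    dist = symmetric_distance(counts, groups)
    large = [g for g in groups if group_sizes[g] >= min_size]
    small = [g for g in groups if group_sizes[g] < min_size]
    frozen = agglomerate(large, dist, n_clusters)  # cluster large groups only
    result = [list(c) for c in frozen]
    for g in small:  # attach each small group to its nearest frozen cluster
        best = min(range(len(frozen)),
                   key=lambda k: avg_distance([g], frozen[k], dist))
        result[best].append(g)
    return result

# Toy data: number of products per group and directed cross-recommendation counts.
group_sizes = {"sofas": 500, "chairs": 400, "drills": 450, "saws": 300,
               "cushions": 20, "drill_bits": 15}
counts = {("sofas", "chairs"): 40, ("chairs", "sofas"): 35,
          ("drills", "saws"): 50, ("saws", "drills"): 30,
          ("sofas", "drills"): 1,
          ("cushions", "sofas"): 12, ("drill_bits", "drills"): 9}
clusters = cluster_attribute_groups(counts, group_sizes, min_size=100, n_clusters=2)
```

In this toy example the furniture-like groups and the tool-like groups end up in separate clusters, and each small group joins the cluster it exchanges the most recommendations with.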

Real-time and non-real-time rules

To tailor recommendations to different scenarios, we implemented a customizable set of real-time and non-real-time business rules. Non-real-time business rules, such as restricting recommendations to products that are currently available, are enacted in the training pipeline. Real-time business rules are applied on a per-request basis, so one set of rules can be applied in one zone and another set in another zone. An example of a real-time rule would be keeping only a single product from any given product collection in the list of recommendations.
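As a rough illustration of the two kinds of rules, the sketch below applies an availability filter once (as the training pipeline would) and then applies a per-zone set of real-time rules. The rule registry, the zone names, and the `context` dictionary are hypothetical.

```python
def filter_available(candidates, available):
    """Non-real-time rule: restrict the candidate set to available products."""
    return [p for p in candidates if p in available]

def one_per_collection(ranked, context):
    """Real-time rule: keep only the first product from each collection."""
    collection_of = context.get("collection_of", {})
    seen, kept = set(), []
    for product in ranked:
        collection = collection_of.get(product)
        if collection is not None:
            if collection in seen:
                continue
            seen.add(collection)
        kept.append(product)
    return kept

# Hypothetical registry and per-zone configuration
RULES = {"one_per_collection": one_per_collection}
ZONE_RULES = {"pdp_main": ["one_per_collection"], "homepage": []}

def apply_real_time_rules(ranked, rule_names, context):
    for name in rule_names:
        ranked = RULES[name](ranked, context)
    return ranked

# Toy usage with made-up product ids and collections.
candidates = filter_available(["p1", "p2", "p3", "p4", "p5"],
                              available={"p1", "p2", "p3", "p4"})
context = {"collection_of": {"p1": "bedroom_set", "p2": "bedroom_set",
                             "p4": "patio_set"}}
pdp = apply_real_time_rules(candidates, ZONE_RULES["pdp_main"], context)
home = apply_real_time_rules(candidates, ZONE_RULES["homepage"], context)
```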

The results

The STAMP recommender system was A/B tested on the product details page against the current production recommender system and showed the following lift in metrics:

>10% and >15% for the desktop and mobile web click-through rate respectively;
>3% and >1% for the desktop and mobile web revenue per visitor respectively.

Later, STAMP was tested against Google Recommendations AI and demonstrated better or comparable production metrics in different scenarios. STAMP showed a good balance between optimizing the click-through rate and the conversion rate, which resulted in increased revenue per customer.

The project was developed in three successive steps:

First, we developed a prototype for a single product category and tested it against the existing recommender system to make sure that the model was viable.
Then we prepared the code for training and serving several STAMP models, and ran an A/B test on several product categories.
Finally, we clustered all the products and implemented the recommender system that covers the full catalog.

Production A/B tests may take several weeks to complete, and it is very costly to test every model modification this way, so we developed a library for offline testing of personalized recommenders and streamlined the training pipeline:

All modifications were tested offline before staging the most promising variants for A/B testing.
Optimizations to the training pipeline allowed the daily training for the full catalog to be done in under 4 hours.
Models for all clusters are trained in parallel before being uploaded to the model store.
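The article does not say which metrics the offline testing library computes. As an assumption-laden sketch, an offline evaluation over held-out next-event examples could report recall@k and mean reciprocal rank, two metrics commonly used for session-based recommenders. The example layout and the `recommend_fn` signature are hypothetical.

```python
def recall_at_k(recommended, target, k):
    return 1.0 if target in recommended[:k] else 0.0

def reciprocal_rank(recommended, target, k):
    for rank, product in enumerate(recommended[:k], start=1):
        if product == target:
            return 1.0 / rank
    return 0.0

def evaluate(recommend_fn, examples, k=3):
    """Average recall@k and MRR@k of recommend_fn over held-out examples."""
    if not examples:
        raise ValueError("no examples to evaluate")
    recalls, ranks = [], []
    for ex in examples:
        recommended = recommend_fn(ex["session"], ex["anchor"])
        recalls.append(recall_at_k(recommended, ex["target"], k))
        ranks.append(reciprocal_rank(recommended, ex["target"], k))
    n = len(examples)
    return {"recall": sum(recalls) / n, "mrr": sum(ranks) / n}

# Toy usage: a constant recommender evaluated on three held-out events.
held_out = [
    {"session": ["A"], "anchor": "A", "target": "B"},
    {"session": ["A", "B"], "anchor": "B", "target": "C"},
    {"session": ["A"], "anchor": "A", "target": "Z"},
]
metrics = evaluate(lambda session, anchor: ["B", "C", "D"], held_out, k=3)
```

Comparing such metrics between model variants on the same held-out data is one way to shortlist candidates before committing to a multi-week A/B test.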

The serving application handles heavy load: the peak load served in production by the auto-scaled application exceeded 1,000 requests per second. Many parameters of the model inference, such as the number of user events to use and the business rules to apply, are passed as part of the request to the serving application. As a result, we can train once and then deploy the recommender system to various zones by issuing requests with different parameters.

The STAMP recommender system now serves 100% of traffic in one of the zones on the product details page. We are working on further enhancements to the model to address the product cold-start problem and to optimize it for specific business metrics.

Interested in learning more about session-based recommendations? Get in touch with us to start a discussion.

Tags

AI-driven search and experiences

Artificial intelligence

Customer experience

Digital engagement

Retail

