Modern serverless data ingestion solution on AWS

Introduction

A meal kit company that specializes in delivering healthy, organic, ready meal kits was about to start its journey to the cloud. The company had recently been acquired by a leader in the food and beverage industry that intended to merge the business under one umbrella, with the same brand, customer management, operations and technologies, all managed by the parent company. The client engaged Grid Dynamics to integrate data into its ecosystem by developing an effective data ingestion solution that provides data model reconciliation and data backfill.

The challenge

During the pandemic, the client grew substantially, expanded into new markets, and made several acquisitions, which called for a new approach to managing the business, running operations and maintaining technical solutions. Given this rapid growth, continuous operational improvements were required to stay competitive. Multiple IT operations, platform solutions, technical departments and integrations across acquisitions made the technical landscape difficult to manage.

Grid Dynamics focused specifically on integrating the acquired business into the parent company's technology ecosystem. The biggest challenge in any acquisition is merging businesses that have more differences than components in common. Unifying business processes for this client involved:

  1. Unification of customer audience;
  2. Unification of marketing strategies: building a marketing strategy for each brand complementary to other brands; and
  3. Technical architecture and solutions unification.

Further considerations for integrating acquisitions into the parent architecture included:

  1. An integration strategy for different technical stacks;
  2. Recommendations on how the parent architecture should be adapted in order to expose the integration API; and
  3. A data management strategy.

The remaining derived use cases, such as a unified customer 360 view, marketing campaigns, and customer acquisition and retention policies, were out of scope for the engagement.

For this case study, we’ll focus on the unification of the technical architecture, including the approaches we used, and the solutions we built on top of AWS. We also tackle the other major goal of the integration, which was to create a defined technical roadmap for future acquisition integrations.

With these defined requirements, Grid Dynamics developed a lightweight solution hosted on AWS. Below we explain why certain AWS services were beneficial for this particular integration use case.

Solution expectations

At the beginning of our engagement, the client was running an on-premises platform, with some infrastructure components migrated to AWS. Coming from an on-premises world, where supporting hardware, infrastructure, services and applications is a prerequisite, the client wanted to build a serverless platform that required close to zero infrastructure support.

Serverless considerations

Integration between the two businesses required transforming data and exposing it to the parent company. As we considered which serverless approach to take, AWS Glue, a serverless data integration service, stood out for features that make it easy to discover, prepare, and combine data for analytics, machine learning, and application development. Furthermore, AWS Glue provides all the capabilities needed for data integration out of the box, enabling greater speed to market.

There are three AWS Glue components:

  • AWS Glue Data Catalog – A central metadata repository that holds information in metadata tables, with each table pointing to a single data store. It acts as an index of your data schemas, locations, and runtime metrics, which are used to identify the sources and targets of your ETL (Extract, Transform, Load) jobs.
  • Job Scheduling System – A flexible scheduler that automates and chains your ETL pipelines through event-based triggers and job execution schedules.
  • ETL Engine – The component that handles ETL code generation. It generates the code automatically in Python or Scala and lets you customize it.
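
To make the scheduling component more concrete, here is a minimal sketch of how a scheduled trigger for a Glue job could be described. The trigger name, job name, cron expression and job argument are hypothetical and are not taken from the client's project. The dictionary follows the shape of the keyword arguments accepted by the boto3 Glue client's `create_trigger` call; the sketch only builds the request and does not contact AWS.

```python
def scheduled_trigger_request(name, job_name, cron_expression, job_arguments=None):
    """Build keyword arguments describing a scheduled AWS Glue trigger.

    All names passed in are illustrative placeholders.
    """
    action = {"JobName": job_name}
    if job_arguments:
        # Glue job arguments are passed as a string-to-string mapping.
        action["Arguments"] = dict(job_arguments)
    return {
        "Name": name,
        "Type": "SCHEDULED",
        # Glue schedules use the cron(...) wrapper around a six-field expression.
        "Schedule": f"cron({cron_expression})",
        "Actions": [action],
        "StartOnCreation": True,
    }


# Hypothetical nightly run at 02:00 UTC.
request = scheduled_trigger_request(
    name="nightly-ingest-trigger",
    job_name="mongodb-to-landing",
    cron_expression="0 2 * * ? *",
    job_arguments={"--target_zone": "landing"},
)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("glue").create_trigger(**request)
```

Keeping the request as plain data makes it easy to review and version alongside the rest of the pipeline definition before anything is created in an AWS account.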

Architectural solution

Grid Dynamics opted for a solution based on AWS serverless services to help the client reach their data integration goal quickly. Using scalable serverless services like AWS Glue and Amazon Redshift enabled us to optimize operating costs and development expenses.

The analytics platform that we built on AWS Glue capabilities ingested data from MongoDB into the data lake through an intermediate data lake in Amazon S3. Glue ETL jobs based on Apache Spark handled ingestion and transformation. Following best practices, the intermediate data lake was split into logical layers:

  • S3 Landing Zone – contains the source data as is, with no transformations
  • S3 Consumption Zone – contains landing data transformed into the corresponding data models, ready to use for analytics
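
The split between the two zones can be illustrated with a small sketch. It is written in plain Python for readability, whereas the actual pipelines ran as Spark-based Glue ETL jobs. The field names, the key layout and the `ingest_date` partition are assumptions made for illustration, not the client's data model.

```python
from datetime import datetime, timezone


def zone_key(zone, dataset, ingest_date, filename):
    """Build an object key for a zone of the intermediate data lake (hypothetical layout)."""
    return f"{zone}/{dataset}/ingest_date={ingest_date}/{filename}"


def to_consumption_record(landing_doc):
    """Map one raw landing-zone document onto a flat consumption-zone row."""
    customer = landing_doc.get("customer", {})
    # Normalize the email; an empty or missing value becomes None.
    email = (customer.get("email") or "").strip().lower() or None
    ordered_at = datetime.fromtimestamp(landing_doc["created_ts"], tz=timezone.utc)
    return {
        "order_id": str(landing_doc["_id"]),
        "customer_id": customer.get("id"),
        "customer_email": email,
        "total_amount": round(float(landing_doc.get("total", 0)), 2),
        "ordered_at": ordered_at.isoformat(),
    }


# A document as it might sit, untouched, in the landing zone.
raw = {
    "_id": 42,
    "customer": {"id": "c-1", "email": " Ann@Example.com "},
    "total": "19.989",
    "created_ts": 0,
}
row = to_consumption_record(raw)
landing_key = zone_key("landing", "orders", "2024-01-31", "part-0000.json")
consumption_key = zone_key("consumption", "orders", "2024-01-31", "part-0000.json")
```

The landing copy stays byte-for-byte identical to the source, so a transformation bug can be fixed and the consumption zone rebuilt without going back to MongoDB.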

The data ingestion process can be summarized as follows:

  • All data in the intermediate data lake is cataloged in the AWS Glue Data Catalog. If needed, the data is accessible through Amazon Athena.
  • The data ingestion pipeline writes the final data to Amazon S3, Amazon RDS for PostgreSQL and Amazon Redshift.
  • The data ingestion pipeline is orchestrated by AWS Glue workflows, which make it possible to create and visualize complex ETL activities involving multiple crawlers, jobs and triggers.
  • Finally, the needed credentials for services intercommunication are stored in AWS Secrets Manager.
Proposed Solution & Architecture
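
The ordering that a workflow enforces between crawlers and jobs can be sketched as a dependency graph. The step names below are hypothetical and only mirror the stages described above; in AWS Glue the same dependencies would be expressed with triggers inside a workflow rather than computed by hand.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it may start.
# Step names are illustrative, not the client's actual job names.
WORKFLOW = {
    "crawl_landing": {"ingest_mongodb_to_landing"},
    "transform_to_consumption": {"crawl_landing"},
    "crawl_consumption": {"transform_to_consumption"},
    "load_postgresql": {"transform_to_consumption"},
    "load_redshift": {"transform_to_consumption"},
}


def execution_order(dependencies):
    """Return one valid run order in which every step follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(WORKFLOW)
```

The three steps that depend only on `transform_to_consumption` have no ordering between them, which is what allows a workflow to run the catalog crawl and both loads in parallel.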

The results

The project timeline was aggressive: the integration needed to go live within three months, including production infrastructure, pipelines, data quality, monitoring and support runbooks. Grid Dynamics completed the project within the timeline, providing the client with:

  1. Fully automated infrastructure provisioning;
  2. CI/CD and version control;
  3. Data ingestion and transformation pipelines;
  4. Data quality checks and schema enforcement;
  5. Data catalog and self-service access to the data.
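
As an illustration of what data quality checks and schema enforcement can look like, the sketch below validates records against an expected schema and separates the rejects. The schema, field names and strict type matching are assumptions for the example; they do not describe the checks implemented in the project.

```python
# Hypothetical expected schema for a consumption-zone table.
EXPECTED_SCHEMA = {"order_id": str, "customer_id": str, "total_amount": float}


def schema_violations(record, schema=EXPECTED_SCHEMA):
    """List the ways a record deviates from the expected schema.

    Type matching is strict: an int is not accepted for a float field.
    """
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Report fields the schema does not know about, in a stable order.
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


def split_valid_invalid(records, schema=EXPECTED_SCHEMA):
    """Separate records that satisfy the schema from those that do not."""
    valid, rejected = [], []
    for record in records:
        problems = schema_violations(record, schema)
        if problems:
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected


batch = [
    {"order_id": "1", "customer_id": "c-1", "total_amount": 9.5},
    {"order_id": "2", "customer_id": "c-2", "total_amount": "9.5"},
    {"order_id": "3", "total_amount": 4.0, "coupon": "X"},
]
valid, rejected = split_valid_invalid(batch)
```

Routing rejected records to a separate location with their list of problems, instead of failing the whole batch, keeps the pipeline running while still making quality issues visible.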

The solution was built on top of serverless components of the AWS cloud, and since all data pipelines are batch in nature, there is no need to run infrastructure constantly – all services are provisioned on demand and released after pipeline completion. This approach resulted in drastic infrastructure cost reductions, removed the need for dedicated infrastructure support engineers, and provides greater scalability as the client grows.
