In-Stream Processing is a powerful new technology that can scan mind-boggling volumes of data coming from sensors, credit card swipes and web clicks, and find patterns of behavior that lead to actionable insights nearly instantaneously. Companies across all industries are exploring new ways of processing information in real time, and In-Stream Processing is emerging as the leading framework to enable a wide range of real-time applications.
Customers want to get started easily on developer-friendly, inexpensive infrastructure and then scale rapidly to massive production workloads as the system acquires more data sources, applications and customers, all on the same platform. We are constantly asked the same two questions:
1) What is the simplest, cheapest and fastest way to get my team up and running?
2) How do I design massively scalable and highly available production infrastructure?
The open source community has been at the forefront of innovation in the In-Stream Processing space, with dozens of companies and thousands of developers contributing to the ever-growing array of technologies. The use of open source components assures the lowest total cost of ownership, the widest access to the developer market and the least amount of vendor lock-in.
Leading cloud providers have been working hard to integrate In-Stream Processing technologies into their offerings. Customers want the cloud for two reasons: fast developer access to a small infrastructure footprint, and ease of scaling for production workloads.
As a part of our Blueprint program, Grid Dynamics provides a well-documented, tried-and-true reference architecture and reference implementation for an In-Stream Processing Service that is built with 100% open source components and runs on any cloud platform, absolutely free.
We’ve taken lessons learned, best practices and proven configurations from our experience in implementing large-scale In-Stream Processing systems for many customers and created a single reference architecture: a complete end-to-end blueprint for an In-Stream Processing Service. It consists of 100% open source components, runs on any public cloud and scales from developer sandboxes that can be spun up at the click of a button to always-on production configurations.
The use of our blueprint is completely free. The blueprint’s reference architecture is well-documented in a series of blog posts available at blog.griddynamics.com/topic/big-data. We are also in the process of releasing a reference implementation, soon to be available as an open source binding for deploying the complete blueprint on Amazon AWS at the push of a button.
The business opportunity: Integrate new data sources (web and browsing behavior) provided by the customer's partners. With proper integration, this data would allow the customer to put forward offers aligned with its users' interests.
The task: Incorporate the new data and identify patterns to improve understanding of the customer's users and increase click-through rate.
Outcome: Designed and implemented a Hadoop-based platform for storing billions of profiles. Conducted an analysis of search patterns in browsing histories to identify users with a high probability of converting. Built a facility for on-demand data analysis. Created a set of reports for downstream consumers.
Leading telecom provider
The business opportunity: Operational reporting was both time-consuming and prone to errors due to the high number of distributed data sources. The substantial employee effort, the high risk of error and the significant time lag between when the data arrived and when reports were produced negatively impacted the business.
The task: Create a timely and accurate reporting system to provide insight for improved business operations.
Outcome: Designed and implemented a Hadoop/Hive-based data warehouse for historical and ongoing call records. Cleaned up, enriched, and prepared the data for exploration and visualization.
Digital ad agency
The business opportunity: One of the biggest headaches for advertisers on the Internet is fake traffic. A robot "sees" an impression but definitely won't buy anything, wasting advertisers' money. Timely identification of these fraudulent impressions significantly increases ad efficiency.
The task: Architect an In-Stream Processing Engine in order to detect and eliminate false impressions in real time.
Outcome: Designed and deployed In-Stream Processing infrastructure, then implemented models designed by the customer's data science team at scale. The resulting solution handled millions of events per second.
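To make the idea of in-stream fraud detection concrete, here is a minimal sketch of a sliding-window rate check over an impression stream. It is an illustration only: the event shape, the function name flag_suspicious, and the window and threshold values are assumptions made for this example, and a simple rate rule stands in for the customer's actual models, which are not described here.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (timestamp_seconds, client_id).
Event = Tuple[float, str]


def flag_suspicious(
    events: Iterable[Event],
    window_seconds: float = 10.0,
    max_impressions: int = 5,
) -> Iterator[Event]:
    """Yield impressions from clients that exceed max_impressions per window.

    Assumes events arrive in non-decreasing timestamp order. The threshold
    rule is a stand-in, not a real fraud model.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for ts, client in events:
        window = recent[client]
        # Drop impressions that have fallen out of the sliding window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        window.append(ts)
        # Flag the impression as soon as the client's rate is too high.
        if len(window) > max_impressions:
            yield ts, client


if __name__ == "__main__":
    stream = sorted(
        [(0.1 * i, "bot-1") for i in range(8)]
        + [(1.0, "user-7"), (30.0, "user-7")]
    )
    for ts, client in flag_suspicious(stream):
        print(f"suspicious impression at t={ts:.1f}s from {client}")
```

Because each event is evaluated as it arrives and only a bounded window of state is kept per client, the same pattern can be partitioned by client across workers in a streaming engine, which is the kind of property that lets such checks run at high event rates.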
Data management company
The business opportunity: Collect online information from a broad partnership network, then correlate and analyze it to build user profiles that allow retailers, advertisers and other digital companies to deliver relevant customer experiences.
The task: Rearchitect a private data center for integration with the cloud, in order to enable integration and scalability and to future-proof the platform.
Outcome: The data processing pipeline was split into several phases. The first phase, responsible for the initial data collection from different sources and for integration, was moved to the cloud infrastructure. This project has cloud-enabled the future phases of the data processing pipeline.
Our goal is to provide engineering teams with pre-made, self-deployable cloud infrastructure for developing and testing real-time in-stream processing applications, while enabling operations teams to deploy, operate and grow enterprise-grade production infrastructure. Our design goals for the blueprint are as follows:
Grid Dynamics is here to help with architecture, design, implementation or operational support of In-Stream Processing platforms.
Our services cover the full lifecycle of In-Stream Processing platforms, including:
Our professional services also extend to production support for the open source components used within the Blueprint: Apache Kafka, Apache Spark Streaming, Redis and Apache Cassandra.