Reduce time to market and cost with a cloud-native stack
When choosing a data platform to implement, the open source Hadoop-based technology stack remains the best choice, but it can be difficult to deploy, configure and support. Using cloud-native deployments such as Amazon AWS, Google Cloud or Microsoft Azure can greatly reduce the complexity, cost and time to market associated with a Hadoop implementation.
In addition to Hadoop, all major cloud providers offer end-to-end technology stacks for data warehouses, streaming and messaging, machine learning platforms and productivity tools. Additionally, many of these offerings do not cause strong vendor lock-in, since they are based on familiar open source technologies. Overall, these benefits are prompting many companies to pursue cloud adoption at full speed, especially when building modern data platforms.
Pioneers in cloud-native data platforms
We started implementing Hadoop-based data lakes in cloud environments before hosted solutions became popular. As soon as Amazon released EMR, we used it to minimize implementation and maintenance costs for our clients, helping reduce time-to-market. This is especially important for companies beginning their journey towards building data lakes and analytical data platforms.
Since then, we've used managed offerings from Amazon, Google and Microsoft to implement reliable data lakes, data pipelines, data warehouses, machine learning platforms and end-to-end analytical data platforms.
Top cloud-native platform choices
Amazon AWS
The cornerstone of Amazon's data platform offering is EMR, which is based on the Hadoop stack and covers data lakes and data processing pipelines. Ingress can be handled with Amazon Kinesis for streaming and S3 storage for batch, while orchestration is implemented with AWS Glue. The top choices for analytics and EDW are Amazon Redshift and Amazon Athena. The machine learning platform is based on Amazon SageMaker, which provides a solid set of tools for the data science process.
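As an illustration, an EMR-based pipeline like the one above is often provisioned programmatically. The following sketch builds the request parameters for boto3's `run_job_flow` call; the cluster name, bucket and instance sizes are hypothetical placeholders, and the actual API call is shown commented out since it requires AWS credentials:

```python
# Sketch: parameters for a transient EMR cluster that logs to S3.
# All names and sizes below are hypothetical placeholders.
def emr_job_flow_params(name, log_bucket):
    """Build the request dict for boto3's emr.run_job_flow()."""
    return {
        "Name": name,
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "ReleaseLabel": "emr-6.15.0",
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": 2},
            ],
            # Transient cluster: shut down once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

params = emr_job_flow_params("demo-data-lake", "my-data-lake-bucket")
# With credentials configured, the cluster would be launched like this:
# import boto3
# boto3.client("emr", region_name="us-east-1").run_job_flow(**params)
```

Keeping the parameters in a plain function like this makes the cluster definition easy to review and reuse across environments.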
Google Cloud
The central component of the Google Cloud data platform is Dataproc, which is based on the Hadoop stack. Alternatively, Google provides Dataflow, which is based on Apache Beam and offers a convenient unified interface for both streaming and batch data processing pipelines. Ingress can be implemented with Cloud Pub/Sub for streaming and Cloud Storage for batch, while orchestration is handled by Cloud Composer, which is based on Apache Airflow. The top choice for analytics and EDW is the powerful BigQuery, which in some cases can replace the data lake as well. The machine learning platform is supported by a variety of tools, with Google ML Engine taking a central role.
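The appeal of Beam's unified model is that the same transform logic applies unchanged to a bounded (batch) source or an unbounded (streaming) one. A toy pure-Python sketch of that idea (this is a conceptual illustration, not the Beam SDK itself; the record format and threshold are hypothetical):

```python
# Toy illustration of the unified batch/streaming model behind Apache Beam:
# one transform chain runs over a bounded list (batch) or a generator
# standing in for an unbounded stream, without modification.
def parse(record):
    user, amount = record.split(",")
    return {"user": user, "amount": int(amount)}

def big_spenders(records, threshold=100):
    """One pipeline definition, applied to any iterable source."""
    return (r["user"] for r in map(parse, records) if r["amount"] > threshold)

batch_source = ["alice,150", "bob,30", "carol,220"]   # e.g. files in Cloud Storage

def stream_source():                                  # e.g. a Pub/Sub subscription
    yield from ["dave,500", "erin,40"]

print(list(big_spenders(batch_source)))      # ['alice', 'carol']
print(list(big_spenders(stream_source())))   # ['dave']
```

In Beam proper, the same role is played by a `Pipeline` whose transforms are agnostic to whether the source `PCollection` is bounded or unbounded.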
Microsoft Azure
The Microsoft cloud offers two primary choices for implementing a data platform. One is Azure Data Lake, consisting of storage, analytics and HDInsight data pipeline components. As an alternative to the HDInsight Hadoop-based stack, Microsoft also offers Spark-based Azure Databricks, which can serve as the cornerstone of an Azure-based data platform. Ingress can be implemented with Azure Event Hubs. The top choice for analytics and EDW is Azure SQL Data Warehouse, while the data science process is supported by the Azure Machine Learning platform.
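Event Hubs ingress is typically configured through a `;`-separated connection string of the documented form `Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=...`. A minimal sketch of parsing such a string into its components (the namespace, key and hub names below are hypothetical sample values):

```python
# Sketch: parse an Azure Event Hubs connection string into a dict.
# The sample values are hypothetical; only the string format is real.
def parse_event_hubs_conn(conn_str):
    """Split a ';'-separated connection string into key/value pairs."""
    parts = (p for p in conn_str.strip().split(";") if p)
    # split on the first '=' only, so key values containing '=' survive
    return dict(p.split("=", 1) for p in parts)

sample = ("Endpoint=sb://my-namespace.servicebus.windows.net/;"
          "SharedAccessKeyName=ingest;SharedAccessKey=abc123=;"
          "EntityPath=clickstream")
cfg = parse_event_hubs_conn(sample)
print(cfg["EntityPath"])  # clickstream
```

In practice the official `azure-eventhub` SDK accepts the connection string directly, but pulling out `Endpoint` and `EntityPath` like this is handy for validation and logging.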
Key features

Fast time-to-market

High scalability

Stable operations

Pluggable architecture

End-to-end coverage

Reliable and secure
How it works

Technology stack
Our engagement model
When helping clients build cloud-native data platforms, we first want to understand whether it's a new development or a migration of an existing platform to the cloud. The client may want to migrate an existing data platform based on open source technologies, migrate from a legacy platform based on proprietary products, implement a new cloud-native platform from scratch, or implement a specific capability on a cloud-native stack.
An engagement typically starts with an architecture and design phase, during which we analyze the current state of the platform, create a target architecture, and decide upon a detailed migration plan with work estimates. Alternatively, we may start with a PoC to implement a specific use case. After the initial phase is complete and the improvement plan is prepared, we proceed with the implementation phases, which we deliver in an Agile way.