
Creating predictive models and deploying them to production is an essential function for data science teams, but it is also a challenging one. Data scientists must control a wide range of resources to successfully build models and put them into production, and they work with various tools at different stages of development that must integrate and run in the same ecosystem. Additionally, DevOps teams must be able to access the data scientists' work in order to deploy models to production.

The complexity of managing all these tools and processes hampers the rapid creation of data pipelines, slowing production as a whole.

Since 2010, Grid Dynamics has worked with machine learning in an effort to help our customers with predictive models, recommendation engines and visual search. Our expertise in DevOps and building modern infrastructure platforms has given us the experience needed to piece together all the tools, processes and workflows of the data science process.

SRE practices help us build robust and resilient business-oriented services, reducing operational cost and increasing ROI. We have developed top talent in the field of machine learning for DevOps, and some of our employees have even written books on the subject.

Experience as DevOps experts and machine learning pioneers

Kubernetes-based platform

The machine learning platform deploys and reliably runs complex machine learning and deep learning workloads (TensorFlow, PyTorch, Keras, Scikit-learn, Chainer, etc.). It supports various data processing and machine learning tools natively or as a container orchestrator. This gives us a powerful, universal computation platform without the compromises of a classical data processing platform like Hadoop.
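
As a sketch of how such a workload could be launched on the platform, the following Kubernetes Job runs a containerized TensorFlow training script. The script path `/app/train.py` and the resource sizes are illustrative assumptions, not part of the platform itself:

```yaml
# Hypothetical batch training job; the official TensorFlow image is assumed
apiVersion: batch/v1
kind: Job
metadata:
  name: tf-train                            # illustrative job name
spec:
  backoffLimit: 2                           # retry a failed run up to twice
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: tensorflow/tensorflow:latest
        command: ["python", "/app/train.py"]  # train.py assumed baked into the image
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
```

Because the job is just a container, the same manifest runs unchanged whether the cluster sits on premises or in any cloud.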

Having real scalability without losing built-in resilience allows near-zero resource usage when the ML pipeline is not running. As a result, businesses pay for scalability only when they need it, without bearing the cost of constantly running HDFS nodes.

Easier data analysis

Data scientists can conveniently analyze data and prototype models using JupyterHub. Together with a plethora of supporting tools (Data Version Control, Featuretools, etc.), this creates the perfect sandbox for development and experiments.


Complex workflows and data pipelines

The platform utilizes native Kubernetes capabilities to run complex workflows and data pipelines based on the Argo framework, but it can also be easily integrated with schedulers and workflow managers such as Airflow or GCP Composer.
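
To make this concrete, here is a minimal sketch of an Argo Workflow expressing a two-step DAG (extract, then train). The container image and script names are assumptions for illustration:

```yaml
# Hypothetical two-step ML pipeline as an Argo Workflow Custom Resource
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ml-pipeline-
spec:
  entrypoint: pipeline
  templates:
  - name: pipeline
    dag:
      tasks:
      - name: extract
        template: step
        arguments: {parameters: [{name: cmd, value: "extract.py"}]}
      - name: train
        dependencies: [extract]          # runs only after extract succeeds
        template: step
        arguments: {parameters: [{name: cmd, value: "train.py"}]}
  - name: step                           # reusable containerized step
    inputs:
      parameters:
      - name: cmd
    container:
      image: my-registry/ml-steps:latest # hypothetical image with the scripts
      command: [python]
      args: ["{{inputs.parameters.cmd}}"]
```

Each task runs in its own pod, so steps scale and retry independently, and the whole pipeline is stored and versioned as a plain Kubernetes resource.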

Models are deployed into production and scaled with TensorFlow Serving, Kubeflow (Seldon) or Clipper. Alternatively, a dedicated serving service can be created, with all quality attributes (resilience, elasticity, high availability and security) provided out of the box.
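
A serving deployment along these lines might look like the following sketch, using the official TensorFlow Serving image; the model name, model path and volume source are placeholder assumptions:

```yaml
# Hypothetical TF Serving deployment plus a Service exposing its REST port
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 2                            # two serving replicas for availability
  selector:
    matchLabels: {app: model-server}
  template:
    metadata:
      labels: {app: model-server}
    spec:
      containers:
      - name: tf-serving
        image: tensorflow/serving:latest
        args: ["--model_name=my_model", "--model_base_path=/models/my_model"]
        ports:
        - containerPort: 8501            # TF Serving REST API port
        volumeMounts:
        - {name: models, mountPath: /models}
      volumes:
      - name: models
        emptyDir: {}                     # in practice: a PVC or a cloud bucket mount
---
apiVersion: v1
kind: Service
metadata:
  name: model-server
spec:
  selector: {app: model-server}
  ports:
  - port: 8501
    targetPort: 8501
```

The Service gives clients a stable endpoint while Kubernetes handles load balancing and replica replacement behind it.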

Cloud agnostic architecture

Kubernetes is a cloud agnostic, general computation platform that gives management capabilities out-of-the-box and works with many common technologies.
Containerized ML application packaging

Docker technologies and Docker images are used to stitch together data processing, machine learning platforms and engineers' sandbox silos.
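
As an illustration of such packaging, a training script and its dependencies can be frozen into a single image with a Dockerfile like this sketch (file names are assumptions):

```dockerfile
# Hypothetical image packaging a training script with pinned dependencies
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY train.py .
ENTRYPOINT ["python", "train.py"]
```

The same image then runs identically in a data scientist's sandbox, in an Argo pipeline step, and in production.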
Supports mainstream ML tools

The ML platform supports common tools like Scikit-learn, TensorFlow and XGBoost, along with many extras. It also enables integration with other tools like MLflow, Clipper, and more.
Onboarding for niche and emerging ML tools

Using an ML platform avoids tool lock-in: any application or service can be run as a Kubernetes extension (Custom Resource), or as a service on top of Kubernetes, using ksonnet and Helm with Docker images.
CI/CD integration for full life cycle of the model

CI/CD functionality is supported out of the box using Argo as a Kubernetes Custom Resource. This pairs well with services such as Spinnaker, so there is no need to integrate an external CI/CD system.
Kubernetes native workflows for data pipelines

Argo is primarily used to build and manage DAG-based workflows, such as ML pipelines, as Kubernetes Custom Resources. In addition to long-running services, Kubernetes can manage batch and CI workloads and replace failed containers.
Scalable model serving

With the ML platform, an application can be scaled up and down with a simple command, through a UI, or automatically based on CPU usage. This is natively supported by Kubernetes, along with all the services needed on top of it, such as service discovery and load balancing, storage orchestration, and automated rollouts and rollbacks.
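
The automatic case can be sketched as a HorizontalPodAutoscaler; the target deployment name and thresholds below are assumptions:

```yaml
# Hypothetical autoscaler keeping average CPU around 70% across 1-10 replicas
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server           # assumed name of the serving deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Manual scaling remains a one-liner, e.g. `kubectl scale deployment model-server --replicas=5`.
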
Efficient use of GPU resources

GPUs for ML and deep learning are expensive and often underutilized once the active phase of development ends. Kubernetes resource pools are used to create GPU-based nodes when they are required and release them when they are not. Resource pools can also be shared across several jobs, further increasing efficiency.
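
A pod requesting a GPU might be sketched as follows; the node-selector label is a GKE-style example and varies by provider, and the script path is an assumption:

```yaml
# Hypothetical GPU training pod; scheduling it triggers scale-up of the GPU node pool
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4   # example label, provider-specific
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu
    command: ["python", "/app/train.py"]                # assumed training script
    resources:
      limits:
        nvidia.com/gpu: 1       # requests one GPU via the NVIDIA device plugin
```

With cluster autoscaling enabled on the GPU pool, the expensive node exists only while such pods are pending or running.
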
Convenient monitoring

Kubernetes provides excellent integration with third-party monitoring systems. To improve observability, ML services are built with native Prometheus support or integrated with external monitoring systems.
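
One common pattern is annotation-based scraping; note these annotations are a widely used convention honored by typical Prometheus scrape configurations, not a built-in Kubernetes or Prometheus feature, and the image and port here are assumptions:

```yaml
# Hypothetical pod exposing a /metrics endpoint for Prometheus discovery
apiVersion: v1
kind: Pod
metadata:
  name: model-server
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"    # assumed metrics port
    prometheus.io/path: "/metrics"
spec:
  containers:
  - name: server
    image: my-registry/model-server:latest   # assumed image exposing Prometheus metrics
    ports:
    - containerPort: 9090
```
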
Business continuity and disaster recovery

For BCDR, the ML platform (Kubernetes) provides functionality such as automated rollouts and rollbacks, keeping the business running even in case of disasters. Self-healing functionality can be provided as well, but requires a cross-regional installation.
Solution reference architecture
Data analysis tool
Cluster computing framework
Machine learning library for Python that supports NLP
Library for Python that supports classification, regression and clustering
Software library for machine learning applications like neural networks
High-level neural networks API
Deep learning framework
Automating application deployment, development and scaling
Framework for monitoring status, availability and reliability of services
Workflow management system
Manages workflow orchestration
Container orchestration
Deploy and manage ML platform on Kubernetes
Makes ML models shareable and reproducible
Framework for automatic feature generation

Schedule a free workshop with one of our senior architects to learn more
