Senior DevOps Engineer (AWS)
We are seeking a visionary Senior Engineer to lead the design and evolution of our multi-GPU, cross-cloud platforms.
Our client strives to create the world’s leading, commercially scalable, safe, and advanced humanoid robots that seamlessly integrate into daily life and amplify human capacity.
In this high-impact role, you will sit directly at the intersection of DevOps, MLOps, and distributed systems. You will own the architecture, long-term technical direction, reliability, and scaling of the model training platforms that make real-world, cutting-edge AI possible.
If you are passionate about driving infrastructure-as-code at scale, optimizing GPU-heavy workloads, and defining the future of robotic intelligence, we want you on our team!
Essential functions
- Lead the design and evolution of scalable multi-GPU infrastructure across cloud environments (AWS, GCP, etc.)
- Own architecture and long-term technical direction of model training platforms
- Drive reliability, performance, and cost-efficiency at scale
- Define and implement best practices for infrastructure, DevOps, and MLOps across the organization
- Build and evolve infrastructure-as-code and automation for provisioning, orchestration, and lifecycle management
- Architect and improve CI/CD systems for both infrastructure and ML training workflows
- Optimize distributed training workloads (scheduling, resource utilization, observability)
- Partner with ML engineers and researchers to enable efficient experimentation and productionization
- Lead troubleshooting and resolution of complex system issues across distributed, GPU-heavy environments
- Mentor engineers and raise the bar for engineering quality and operational excellence
- Document architecture, systems, and key technical decisions
Qualifications
Production-grade, hands-on infrastructure-as-code experience (Terraform)
Kubernetes application packaging and release management (Helm)
Hands-on experience operating workloads on a major cloud provider (AWS)
Building and operating CI/CD pipelines, including self-hosted build runners (GitHub Actions)
Deep hands-on experience with monitoring and alerting stacks (Prometheus/Grafana)
Core SRE skill set: Linux administration, containerization, and container orchestration
Excellent automation and scripting skills (Python and Bash)
Flexibility to participate in an on-call rota for urgent issues outside regular business hours (paid extra)
Adaptable to fast-changing priorities
Fast learner who enjoys adopting cutting-edge technology
Bias to action
Strong team player with a sense of sole ownership and accountability
Strong communicator across teams of varied backgrounds and experience
Would be a plus
Cluster autoscaling and dynamic node provisioning (Karpenter)
Operating GPU-accelerated Kubernetes clusters (NVIDIA)
Workflow orchestration platforms (Prefect)
Gang scheduling for distributed workloads
Resource allocation and fair-sharing (queues and priorities)
High-performance and shared storage systems (FSx for Lustre, EFS)
Self-hosted agentic infrastructure
We offer
Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule & a hybrid working mode
Benefits package - medical insurance, sports
Corporate social events
Professional development & growth opportunities
Well-equipped office located downtown
About us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.
Don’t see the right opportunity?
Contact us anyway and let’s talk! To apply, send your resume and cover letter to jobs@griddynamics.com
Grid Dynamics is an equal opportunity employer. We are committed to creating an inclusive environment for all employees during their employment and for all candidates during the application process.
All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on, age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. All employment is decided on the basis of qualifications, merit, and business need.
