Site Reliability Engineer

Wroclaw, Warsaw, Poland

Applications for this job are no longer accepted. Please explore other open opportunities on our platform.

We are looking for a Site Reliability Engineer to join a high-stakes global tech ecosystem and drive the delivery of a critical enterprise platform migration to the cloud.

Your core mission will be to architect, build, and productionize the observability and cost intelligence (FinOps) layer for a massive, multi-year financial platform transformation. You will take end-to-end ownership of the cloud platform layer, giving internal stakeholders full visibility into platform behavior, performance, and infrastructure spend. Working alongside a nearshore team of senior engineers, you will solve highly complex architectural challenges in a production-grade, distributed system.

Essential functions

Responsibilities:

End-to-End Infrastructure & FinOps Ownership: Architect and implement a cloud usage and cost attribution dashboard, providing detailed per-pod and per-service cost breakdown using cloud billing APIs and internal FinOps hubs.
Advanced Observability & Tracing: Instrument end-to-end distributed tracing using OpenTelemetry, configuring collectors within Kubernetes environments, exporting traces to cloud monitoring systems, and tracking RED (rate, errors, duration) metrics.
Performance Engineering & Stress Testing: Write custom tooling from scratch to deliver database performance monitoring, load testing, and trend analysis for critical underlying storage layers.
Monitoring & Alerting Automation: Build and deploy scalable production monitoring, custom alerting policies, and SLO tracking for containerized and serverless services.
Infrastructure as Code: Independently manage, write, and apply infrastructure modifications using Terraform, working within established enterprise repository standards, modules, and environment state management.
Cross-Language Codebase Extension: Read, debug, and extend existing platform code across a diverse stack including Kotlin, Java, and Python to seamlessly integrate technical metrics without disrupting business logic.
Quality & Release Assurance: Implement rigorous unit testing with high code coverage for all newly developed monitoring tools to comply with strict enterprise quality gates and sign-offs.

Qualifications

Min requirements:

Experience: 4 to 6 years of professional software or DevOps engineering experience, including at least 2 years of hands-on cloud infrastructure management in production.
Advanced Cloud Infrastructure: Deep operational proficiency with Google Cloud Platform (GCP), specifically in managing workloads and configuring workload-level alerting on Google Kubernetes Engine (GKE) and Cloud Run.
Observability & OpenTelemetry: Proven track record of building observability solutions in distributed systems, using OpenTelemetry (both auto and manual instrumentation) alongside distributed tracing and profiling tools.
Strong Automation Scripting: Intermediate-to-advanced fluency in Python for writing custom test tooling, metrics integration scripts, and backend automation from scratch.
Solid Infrastructure as Code: Strong proficiency in Terraform, including experience with multi-environment setups, workspaces, and corporate module standards.
Polyglot & JVM Familiarity: Practical ability to read, understand, and modify existing backend codebases written in Kotlin and Java.
Crucial Non-Technical Skills: Extreme technical autonomy to resolve blockers independently, rapid onboarding skills into large unfamiliar codebases, and fluent written English for async alignment and pull requests.
Process Alignment: Ability to thrive in a highly regulated enterprise environment with strict peer reviews, robust documentation requirements, and formal deployment procedures.

Would be a plus

Domain Knowledge: Previous experience working within financial services, fintech, investment banking, or other highly regulated industries.
Enterprise Streaming Tools: Working knowledge of cloud messaging systems (such as Cloud Pub/Sub) utilized for inter-service communication.
Advanced Storage Engines: Familiarity with high-throughput distributed database architectures, such as Google Cloud Bigtable.
Systems Languages Awareness: Ability to read or debug foundational code written in low-level systems languages like Rust or C++ during multi-stack production deployments.

We offer

Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

Grid Dynamics is an equal opportunity employer. We are committed to creating an inclusive environment for all employees during their employment and for all candidates during the application process.

All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on, age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. All employment is decided on the basis of qualifications, merit, and business need.