Site Reliability Engineer

Hyderabad, India

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems that enable online ordering for thousands of restaurants across multiple brands. SRE ensures that Inspire Digital Platform (IDP) services have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance and perform regular capacity planning exercises. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.

Essential functions

· Review current workload patterns, understand the business case, and prioritize areas of weakness within the platform through log and metric investigation as well as application profiling.

· Work with senior engineering and testing team members to build tools and recommend testing strategies for problem prevention and detection.

· Employ deep troubleshooting skills to improve availability, performance, and security, ensuring services are designed for 24/7 availability, operational readiness, and rigor.

· Perform in-depth postmortems on production incidents to assess the effective business impact and help Engineering learn from them.

· Create dashboards and alerts for monitoring the IDP platform: define key metrics and service level indicators, and ensure relevant metric data is collected to create actionable alerts for SRE and the Network Operations Center.

· Participate in the 24/7 on-call rotation.

· Automate toil by building software and automation for seamless application deployment and third-party tool integration.

· Ensure the platform holds a high degree of reliability, at least three 9s (99.9% availability).

· Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.

· Own technically intricate issues that cross DevOps, databases, networking, code, infrastructure, and people; drive them to satisfactory completion.

· Provide recommendations and feedback in design reviews and review sessions.

Qualifications

AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, Service Bus, Azure Functions, New Relic, Splunk, Prometheus, Grafana, Java, TypeScript, Python.

Would be a plus

Requirements:

· Bachelor’s degree in computer science, a related field, or equivalent practical experience

· Minimum 5 years of experience as a Software Engineer, Platform Engineer, SRE, or DevOps Engineer supporting large-scale SaaS production B2C or B2B cloud platforms.

· Development skills in Java, TypeScript, and Python; OOP expertise is a must.

· Hands-on Azure Cloud experience, particularly with AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, Service Bus, and Azure Functions.

· Proficiency in monitoring, APM, and profiling tools: New Relic, Splunk, Prometheus, Grafana.

· Working experience with containers, Kubernetes and Helm.

· Functional knowledge of cloud networking, firewalls, ingress and egress controllers, and service mesh, plus experience with Auth0 Secret management and Cloudflare CDN, Load Balancer, Cache, Firewall, and Worker features.

· Experience with ArgoCD, GitLab, CI/CD, Terraform, and Infrastructure as Code.

· Strong communication skills and ability to explain technical concepts clearly and simply

· A willingness to dive into understanding, debugging, and improving any layer of the stack

We offer

Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

Grid Dynamics is an equal opportunity employer. We are committed to creating an inclusive environment for all employees during their employment and for all candidates during the application process.

All qualified applicants will receive consideration for employment without regard to, and will not be discriminated against based on, age, race, gender, color, religion, national origin, sexual orientation, gender identity, veteran status, disability or any other protected category. All employment is decided on the basis of qualifications, merit, and business need.
