01 Scope of Responsibilities
Key Responsibilities:
- Own and operate staging and production environments in Microsoft Azure
- Manage and support application deployments on OpenShift (on-premises and Azure)
- Support and optimize CI/CD pipelines and enable GitOps practices (e.g., ArgoCD)
- Ensure system reliability by defining and tracking SLIs and SLOs and continuously improving service health
- Design, implement, and maintain observability solutions (monitoring, logging, alerting) using tools such as Prometheus, Grafana, Azure Monitor, and ELK/EFK
- Troubleshoot issues across infrastructure, platform (Azure/OpenShift), applications, and deployments
- Lead incident management, including root cause analysis (RCA), MTTR reduction, and prevention of recurring issues
- Build and maintain Infrastructure as Code using Terraform and drive automation to reduce operational toil
- Improve deployment reliability, release processes, and overall system resilience
- Collaborate with development teams to embed reliability into design, delivery, and operational practices
- Maintain and improve operational documentation, including runbooks and procedures
- Ensure performance, scalability, cost efficiency, security, and compliance of cloud infrastructure
- Advocate for SRE best practices and a DevOps culture across engineering teams
