01 Responsibilities

Develop a standardized observability ecosystem and implement a deliberate telemetry model built on structured events, distributed tracing, and intelligent sampling strategies.
Act as a strategic partner to product engineering teams, providing the platform, standards, and data for service reliability ownership; use error budgets and alerting to balance feature velocity with stability.
Enhance detection capabilities with early-warning systems and AI/ML for automated anomaly detection and intelligent data analysis to strengthen system resilience.
Build internal automation and tooling that streamlines SRE workflows, automates routine tasks, and enhances efficiency across the technology stack.
Participate in an on-call rotation for incident management, ensuring rapid resolution, effective communication, and post-incident analysis for continuous improvement.

02 Requirements

12 must-have · 2 languages

Must-have

Prometheus · Advanced
Grafana · Advanced
Elastic Stack · Advanced
Ansible · Advanced
Kubernetes · Advanced
Python · Advanced
Azure Kubernetes Service · Advanced
ELK Stack · Advanced
Tempo · Advanced
Thanos · Advanced
Jaeger · Advanced
Incident Management · Advanced

Required languages

Polish · Expert
English · Advanced

03 Candidate profile

Key requirements:

At least 5 years of professional experience in SRE, Infrastructure, or DevOps roles managing high-scale, distributed environments.
Advanced programming skills in Python, focusing on scalable automation, internal tooling, and robust scripts.
Hands-on expertise in managing production-grade Kubernetes environments, configuration management with Ansible, and designing resilient infrastructure architectures within Azure Kubernetes Service and on-prem environments.
Deep proficiency in building standardized telemetry ecosystems with self-hosted open-source tools: Prometheus, Grafana, ELK Stack, Tempo, Thanos, Jaeger, and similar.
Ability to drive incident management, conduct post-incident analysis, and foster a culture of reliability and shared ownership.
Ability to leverage AI/ML techniques for SRE tasks such as AIOps, automated anomaly detection, log analysis, and optimizing reliability workflows.

Nice to have:

Experience with commercial observability and APM solutions (e.g., Datadog, Splunk, New Relic) or chaos engineering frameworks.

04 Benefits

Medical package

Insurance

Training budget

Conference trips

Language classes

05 About the company

XTB

1000 - 5000 · Warsaw

XTB is a Polish brokerage house with global reach, operating since 2005, that offers access to thousands of financial instruments such as CFDs on currencies, commodities, stock indices, and cryptocurrencies, as well as stocks and ETFs listed on the world's most popular exchanges. We hold a brokerage license issued by the Polish Financial Supervision Authority (Komisja Nadzoru Finansowego) and are one of the world's largest publicly listed FX and CFD brokers. We stand out for our innovative, multi-award-winning xStation platform, fast and professional customer service, and a rich educational package with online courses for investors at every level of experience.
