01 Scope of work
Engagement context
Takeover of a production AI mobile coaching platform. Runtimes: Node.js/NestJS, Python/FastAPI. Datastores: MongoDB, Postgres, Redis. Infra: AWS (ECS/EKS, RDS, ElastiCache, S3, VPC, IAM). CI: GitHub Actions. Observability: Datadog. Push: OneSignal. Crash reporting: Crashlytics. Deep links: Branch. Vendors: Auth0, ElevenLabs, OpenAI, Amplitude, Terra, Strava.
Role summary
Senior SRE/DevOps engineer. Owns: CI-to-production pipeline, IaC, deploy automation, observability, on-call, cost control, secrets, security baselines. Phase 1: measure and document. Phase 2: operate and transfer ownership.
First 90 days
- Audit CI/CD (GitHub Actions): duration, flakiness, failure modes, secrets handling
- Audit AWS: ECS/EKS topology, IAM posture, VPC layout, RDS, ElastiCache, S3
- Audit Datadog: dashboards, tracked metrics, SLO/SLI gaps
- Audit incidents (last 12 months): count, severity, MTTR, RCA patterns
- Vendor inventory (Auth0, OneSignal, ElevenLabs, OpenAI, Branch, Amplitude, Terra, Strava, Crashlytics): owners, billing, MFA, recovery plans
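The incident audit above reduces to a small aggregation once incident records are exported. A minimal sketch, assuming each record carries a severity label plus open and resolve timestamps; the `Incident` type, the `summarize` helper, the severity names, and the sample records are all hypothetical and not taken from the platform's actual tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    # Hypothetical record shape; real exports will differ.
    severity: str
    opened_at: datetime
    resolved_at: datetime


def summarize(incidents):
    """Return {severity: (count, mean time to resolve)}."""
    durations = defaultdict(list)
    for inc in incidents:
        durations[inc.severity].append(inc.resolved_at - inc.opened_at)
    return {
        sev: (len(ds), sum(ds, timedelta()) / len(ds))
        for sev, ds in durations.items()
    }


# Made-up sample records, for illustration only.
incidents = [
    Incident("sev1", datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 11, 30)),
    Incident("sev1", datetime(2024, 2, 1, 9, 0), datetime(2024, 2, 1, 9, 30)),
    Incident("sev2", datetime(2024, 3, 1, 0, 0), datetime(2024, 3, 1, 4, 0)),
]

for sev, (count, mttr) in sorted(summarize(incidents).items()):
    print(sev, count, mttr)
```

Grouping by severity before averaging keeps a handful of long low-severity incidents from masking the MTTR of the critical ones.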
Ongoing
- Own CI/CD across services
- Own AWS infra (Terraform/Pulumi where suitable)
- Cost control (OpenAI token spend, AWS rightsizing)
- Security baselines: least-privilege IAM, secrets rotation, dependency scanning
- Build onboarding for second SRE/DevOps hire
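For the OpenAI token spend item, one possible starting point is a per-service spend estimate checked against a budget. A minimal sketch, assuming token counts are already collected per service; the service names, per-1k-token rates, and budget figure are placeholders, not real pricing or real usage, and `spend_by_service` and `over_budget` are hypothetical helpers.

```python
from decimal import Decimal


def spend_by_service(usage, usd_per_1k_prompt, usd_per_1k_completion):
    """usage: {service: (prompt_tokens, completion_tokens)} -> {service: USD}."""
    return {
        service: (prompt * usd_per_1k_prompt
                  + completion * usd_per_1k_completion) / 1000
        for service, (prompt, completion) in usage.items()
    }


def over_budget(spend, budget_usd):
    """Return the services whose estimated spend exceeds the budget, sorted by name."""
    return sorted(s for s, usd in spend.items() if usd > budget_usd)


# Placeholder usage and rates; substitute real exports and current pricing.
usage = {
    "coach-chat": (2_000_000, 500_000),
    "voice-summary": (100_000, 20_000),
}
spend = spend_by_service(usage, Decimal("0.50"), Decimal("1.50"))
flagged = over_budget(spend, Decimal("100"))
print(spend, flagged)
```

`Decimal` is used instead of floats so that currency totals compare exactly against the budget threshold.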