Senior Platform Engineer (Focus on SRE & Observability)
Posted 2 weeks ago
As a Senior Platform Engineer, you will ensure the reliability, scalability, and security of production systems. You will manage and optimize AWS infrastructure, automate CI/CD and Terraform workflows, implement observability solutions, and contribute to incident response and operational excellence.
Key Responsibilities
- Improve overall production readiness
- Define and implement the observability strategy (monitoring, alerting, dashboards)
- Drive reliability enhancements and actively support incident response
- Support and optimize AWS infrastructure
- Harden and secure CI/CD pipelines
- Improve Terraform governance and automation processes
- Contribute to identity and security integrations (Auth0)
Technical Requirements
Cloud & Infrastructure:
- Strong AWS expertise (EKS/ECS/EC2, ALB/NLB, IAM, VPC, CloudWatch)
- Infrastructure as Code using Terraform (state management, modular design, remote backends, CI validation, best practices)
- CI/CD pipelines (GitHub Actions preferred): safe deployments, rollback strategies, automation
Observability & Reliability:
- Metrics, logs, and traces (CloudWatch, OpenTelemetry, SigNoz, Grafana)
- Alerting strategies, SLO/SLI definition, error budgets
- Designing and implementing production-grade monitoring from scratch
SRE & Operational Excellence:
- Incident management and structured root cause analysis (RCA)
- Reliability, scalability, and performance tuning
- Production hardening and high-availability design
Automation, Identity & Security:
- Python for operational tooling and automation
- Auth0 knowledge (tenant management, RBAC, integrations, security best practices)
- Security fundamentals (least-privilege IAM, secrets management, audit logging, compliance awareness)
Required Experience
- 5+ years of experience
- Hands-on support of production systems
- Active participation in incident response and postmortems
- Experience building or improving observability frameworks
- Exposure to cloud-native architectures
- Close collaboration with software engineers to improve deployments and system reliability
- Experience with high-availability, customer-facing systems is strongly preferred
