Senior Site Reliability Engineer
Duolingo is seeking a Senior Site Reliability Engineer to ensure its distributed systems are built and maintained with extraordinary quality. Requires 3+ years of experience in SRE/DevOps.
As a Senior Site Reliability Engineer, you will work closely with both product and platform engineering teams to ensure Duolingo’s sophisticated distributed systems and products are built and maintained with extraordinary quality, and operated in measurable and scalable ways.
You will...
- Collaborate with internal teams to identify sources of instability in distributed systems and drive operational excellence
- Own core infrastructure (i.e., understand, diagnose, and debug these systems in production)
- Provide system design consulting, develop software platforms/frameworks, and conduct launch reviews and root cause analysis
- Maintain and document sustainable postmortem/incident response practices
- Advocate for and implement changes that improve reliability, scalability, and velocity
- Reduce the burden of toil with iterative development of tooling and automation
- Collaborate with engineering teams to release new features and become an authority on our services
You have...
- 3+ years of experience in site reliability engineering/DevOps for a product with millions of users
- Experience identifying and solving issues in large-scale distributed systems
- Experience with Java, Kotlin, Python, or Go
- Proficiency in networking protocols such as TCP/IP, HTTP, SSL, and DNS
- An understanding of containerization toolsets and container orchestration technologies (Docker, Mesos, Kubernetes, Nomad, etc.)
Exceptional candidates will have...
- Experience improving automation and tooling to reduce service maintenance toil
- Proven experience driving improvements to incident response processes
- Experience assessing reliability and troubleshooting issues in MySQL and/or PostgreSQL databases
Salary Range:
$177,700–$300,000 USD