Senior Site Reliability Engineer
Senior Site Reliability Engineer needed to design, develop, and ship foundational services to enable Klaviyo engineering to move faster with confidence.
Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the Site Reliability Engineering team is to provide services, tooling, and guidance to Klaviyo's product engineers to make them more productive and ensure their services are sufficiently reliable, scalable, and secure.
The SRE team builds foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably. SREs are team players who work collaboratively amongst themselves and with engineers from product teams to build the platform Klaviyo relies on to power its products.
As a Senior Site Reliability Engineer, you will own multiple foundational Klaviyo services and make a big impact on the productivity of our product engineering teams.
How you will make a difference:
- Ship foundational services to enable Klaviyo engineering to move faster with confidence
- Design and develop systems and processes that enable highly available & scalable systems
- Design, build and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo’s services
- Achieve breakthroughs in systems throughput by identifying and eliminating bottlenecks
- Leverage technology such as Python, Go, Bash, Django, AWS, Kubernetes, Terraform, MySQL, Apache Pulsar, Redis, and ClickHouse to advance Klaviyo’s platform
- Champion best practices by actively collaborating with other teams in a culture that values technical design review
- Contribute to the company as a subject matter expert in multiple areas, constantly pushing yourself to be a better engineer and to level up your peers within your team and across Klaviyo
- Mentor and pair with other Klaviyo engineers to build better software by focusing on performance, self-healing systems, configuration as code, defensive programming, application security, etc.
- Participate in periodic on-call duties with a focus on solving issues when they are discovered, preventing recurrences, and minimizing alert fatigue
- Work hand-in-hand with product-facing engineers to ship impactful code
- Perform quantitative analysis to understand and scale Klaviyo systems and manage the cross-functional effort to resolve scalability issues
- Produce and advocate for preventative, upstream solutions with internal stakeholders and external vendors and dependencies
- Confidently make informed, data-driven decisions in a fast-paced environment with competing priorities
- Evangelize Site Reliability best practices across the engineering organization and community
Who You Are:
- BA or BS degree in Computer Science or a related field, or equivalent experience
- 5+ years of responsibility operating & scaling complex distributed systems
- Experience developing applications in Python, Ruby, Go, etc.
- Experience working on an engineering team building software
- Fundamental understanding of Linux (we run Ubuntu) and all layers of the networking stack; you should be confident administering and debugging production Linux systems
- Ability to stay composed and manage complex systems during outages, and to drive failures through root cause analysis to the prevention of future issues
Massachusetts Applicants:
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
The pay range for this role is listed below. Sales roles are also eligible for variable compensation, and hourly non-exempt roles are eligible for overtime in accordance with applicable law. This role is eligible for benefits, including medical, dental, and vision coverage, health savings accounts, flexible spending accounts, 401(k), flexible paid time off, company-paid holidays, and a culture of learning that includes a learning allowance and access to a professional coaching service for all employees.
Base Pay Range For US Locations:
$152,000–$228,000 USD