Distributed Systems Engineer, Security
Build large-scale distributed systems to secure OpenAI's fleet. Requires experience with security, software, and infrastructure.
OpenAI is pushing artificial intelligence to an unprecedented scale. We have a huge compute footprint and run some of the biggest GPU clusters in the world. As our scale has grown, so has our threat landscape – while advanced AI can benefit the world, in the wrong hands, it can also be used maliciously.
Your job will be to protect our work from those who seek to misuse it.
As a security-focused distributed systems engineer, you will build large-scale, production-grade distributed systems that secure OpenAI's massive fleet. This requires providing easy-to-use, introspectable systems that promote a fast debugging and development cycle, while also enabling that experience to scale to our newest supercomputers and maintaining stability and performance throughout.
You will work alongside a diverse team of engineers, developers, and security advisers to design, architect, and drive security improvements across OpenAI. We are a small company, and intend to stay small: as an early member of the security engineering team, the decisions you make will have a significant impact on the organization today and into the future.
We’re looking for an engineer with a broad knowledge of security, software, and infrastructure. This is a senior role, and we’re looking for someone who has experience with a wide variety of real-world issues.
In this role, you will:
Work across our Go, Python, and Rust stacks.
Build infrastructure and primitives to secure our bare metal and cloud infrastructure, using modern approaches and technologies, such as trusted computing.
Profile, optimize, and help design systems for scale, while keeping security as transparent as possible to engineers and researchers.
You might thrive in this role if you:
Have designed, implemented, and operated large-scale distributed systems.
Love figuring out how systems work and continuously come up with ideas for how to make them faster while minimizing complexity and maintenance burden.
Have strong software engineering skills and are proficient in Python, Go, and/or Rust.
Have a good understanding of tradeoffs between security, reliability, usability, and performance.