System Engineer (Token Factory)
Title: System Engineer (Token Factory)
Location: Amsterdam, Netherlands; Germany; Israel; Prague, Czech Republic; Remote - Europe; Remote - United States; United Kingdom
About the role:
Token Factory is a part of Nebius Cloud, one of the world’s largest GPU clouds, running tens of thousands of GPUs. We are building an inference platform that makes every kind of foundation model — text, vision, audio, and emerging multimodal architectures — fast, reliable, and effortless to deploy at massive scale.
Responsibilities:
- Develop and optimize low-level kernels and runtime components for AI inference
- Improve the performance of inference engines on GPU platforms
- Profile and debug system-level and hardware-level performance issues
- Integrate support for new hardware architectures (Hopper, Blackwell, Rubin)
- Collaborate with ML and backend teams to optimize end-to-end execution
Required Qualifications:
- Strong proficiency in C++, or expertise in GPU programming, with a focus on low-level, high-performance coding and memory management
- Experience in GPU programming or systems-level software development, e.g. operating system internals, kernel modules, or device drivers
- Hands-on experience with profiling and debugging tools to identify performance issues on both CPUs and GPUs, and the ability to optimize code based on those findings
- Solid understanding of CPU/GPU architecture and memory hierarchy
Preferred Qualifications:
- Experience with GPU programming frameworks and libraries: CUDA, ROCm, CUTLASS, CuTe, ThunderKittens, Triton, Pallas, Mosaic GPU
- Familiarity with ML inference runtimes (e.g. TensorRT, TVM)
- Knowledge of Linux internals, drivers, or compiler toolchains
- Experience with tools like perf, VTune, Nsight, or ROCm profiler
- Familiarity with popular inference engines (e.g. vLLM, SGLang, TGI)
We conduct coding interviews as part of the process.