Senior Software Engineer (Token Factory)
The role
This role is for Nebius AI R&D, a team focused on applied research and the development of AI-heavy products. Examples of applied research that we have recently published include:
- investigating how test-time guided search can be used to build more powerful agents;
- dramatically scaling task data collection to power reinforcement learning for SWE agents;
- maximizing efficiency of LLM training on agentic trajectories.
One example of an AI product that we are deeply involved in is Nebius Token Factory — an inference and fine-tuning platform for AI models.
This role requires expertise in distributed systems to build a large-scale LLM training platform.
Your responsibilities will include:
- Designing and developing an LLM training platform.
- Maintaining our ML infrastructure, ensuring optimal performance, scalability and reliability.
- Improving job scheduling strategies to minimize resource fragmentation.
We expect you to have:
- 5+ years of professional software development experience.
- Strong software engineering skills (we mostly use Python).
- Proficiency in contemporary software engineering approaches, including CI/CD, version control and unit testing.
- Experience developing web services.
- A commitment to maintaining extreme rigor in all job-related activities.
Nice to have:
- Previous experience working with language models or other similar NLP technologies.
- A track record of building and delivering products (not necessarily ML-related) in a dynamic startup-like environment.
- Experience developing large distributed systems or high-load web services.
- Open-source projects that showcase your engineering prowess.