GenAI Staff Machine Learning Engineer, Performance Optimization
Explore and analyze performance bottlenecks in ML training and inference, design and implement libraries, and build tools for performance profiling.
You will:
- Explore and analyze performance bottlenecks in ML training and inference
- Design, implement, and benchmark libraries and methods to overcome these bottlenecks
- Build tools for performance profiling, analysis, and estimation for ML training and inference
- Balance the tradeoff between performance and usability for our customers
- Support our community through documentation, talks, tutorials, and collaborations
- Collaborate with external researchers and leading AI companies on various efficiency methods
We look for:
- Hands-on experience with the internals of deep learning frameworks (e.g. PyTorch, TensorFlow) and deep learning models
- Experience with high-performance linear algebra libraries such as cuDNN, CUTLASS, Eigen, and MKL
- General experience with the training and deployment of ML models
- Experience with compiler technologies relevant to machine learning
- Experience with distributed systems development or distributed ML workloads
- Hands-on experience writing CUDA code and knowledge of GPU internals (Preferred)
- Publications in top-tier ML or systems conferences such as MLSys, ICML, ICLR, KDD, NeurIPS (Preferred)