We are seeking an experienced GPU Engineer with a strong background in Python and large-scale model training. In this role you will design and implement improvements to our large scale training infrastructure and directly help make technical decisions to optimise our models' training performance and efficiency.
Ideal Experience
Strong engineering skills with fluency in Python and PyTorch or other acceleration libraries.
Experience writing and debugging low-level GPU code (CUDA, Triton) and debugging hardware errors.
Experienced in scaling up GPU jobs via large-scale compute clusters using Slurm or Kubernetes.
Preferred: knowledge of advanced filesystems, particularly Ceph, and their integration and optimization in large systems.
Preferred: proficient in implementing robust monitoring systems for performance tracking and anomaly detection.
Why Reka?
Reka's mission is to build useful multimodal artificial intelligence and use it to empower organisations and businesses. We are a globally distributed foundation model startup, headquartered in the San Francisco Bay Area, California. Embracing a remote-first approach, our team brings together top talent from around the world. Our founding team, along with many of our team members, has contributed to many of the breakthroughs in AI over the past decade.
Opportunity to work with a collaborative mission-driven team on cutting-edge AI technology.
Open and inclusive work environment.
4 weeks paid leave.
Visa support (such as H1B and OPT transfer for US Employees).
Healthcare benefits, including vision and dental.