Join the team building software that will be used across the entire world of AI. Work with world-class software engineers to implement a large-scale toolset that tests deep learning models and frameworks on the most powerful computers. The ability to work in a multifaceted, fast-paced environment is required, as are strong social skills.
In this role you will interact with internal partners, users, and members of the open source community to implement solutions for building, testing, integrating, and releasing NVIDIA AI Services and Deep Learning Frameworks on the most powerful, enterprise-grade GPU clusters, capable of hundreds of petaFLOPS. This role spans multiple products such as PyTorch, TensorFlow, JAX, and PaddlePaddle. You will work with internal engineering teams to deploy and operationalize AI models and services at scale by driving adoption of end-to-end Machine Learning and Deep Learning solutions in the cloud and on premises.
We are seeking passionate and hardworking Python developers to help us scale our AI and deep learning services, platforms, models, and internal tools. You will be responsible for implementing and maintaining tools and infrastructure that enable our teams to productize the NVIDIA software stack, from DL frameworks (PyTorch, TensorFlow, JAX, PaddlePaddle) to DL models and AI services.
Are you ready for this challenge?
Automating and optimizing testing of Deep Learning models and AI Services from different data domains, with a focus on inference
Developing shared utilities for setting up systems, running tests, recording results, and visualizing them on dashboards
Configuring, maintaining, and building upon deployments of industry-standard tools (e.g. GitLab, Docker, Bash)
Leading best practices for building, testing, and releasing software, including AI Services and DL models
Identifying infrastructure needs and translating them into action
Building tools for automatic content generation that save dozens of engineering hours
BSc or MS degree in Computer Science, Computer Architecture or related technical field
3+ years of work experience in software development
Excellent Python programming skills, strong general coding skills, and a deep understanding of OOP concepts
Familiarity with DevOps concepts such as CI/CD, Docker, Jenkins and automation tools.
Experience in building both front-end (e.g. JS, React, Vue, Dash, Streamlit) and back-end (e.g. Flask, FastAPI, Django) services
Understanding of Deep Learning sufficient to benchmark DL models
Willingness to take action and strong analytical skills
Strong time-management and organization skills for coordinating multiple initiatives, priorities, and the integration of new technologies and products into very complex projects
Good communication and documentation habits
Solid understanding of Linux environments
Experience with containerization technologies such as Docker
Experience in building monitoring or dashboarding solutions to support CI/CD pipelines
Hands-on experience configuring complex CI pipelines
Experience with HPC-based compute clusters and scheduling solutions like Slurm
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most brilliant and forward-thinking people in the world working for us. If you're creative and autonomous, we want to hear from you!
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.