Senior Solution Engineer - AI and ML Storage Architecture

at NVIDIA

Full Time

As a Senior Solution Engineer specializing in AI/ML Storage Architecture, you will be an integral part of our dynamic team, contributing to the design, construction, and maintenance of innovative storage solutions tailored for Artificial Intelligence and Machine Learning workloads. This role spans various domains, including software and systems engineering practices, storage, data management, and services. Your responsibilities encompass ensuring reliable storage solutions, managing data efficiently, and providing related services to support the overall stability and performance of the production systems.

As a Solution Engineer specializing in AI/ML at NVIDIA, you ensure the reliability and uptime of both our internal and external GPU cloud services, in line with our commitments to users. At the same time, you empower developers to implement system changes through meticulous preparation and planning, with a keen focus on capacity, latency, and performance. This position embodies a specific attitude and a suite of engineering strategies aimed at enhancing the efficiency of production systems and implementing optimizations. A significant portion of our software development effort concentrates on automating tasks, fine-tuning performance, and enhancing overall production system efficiency. Because you are responsible for understanding how our systems interconnect, you will use a diverse range of tools and approaches to address a wide array of challenges. This role offers engaging and dynamic day-to-day work, emphasizing continual enhancement and ensuring the success of our AI/ML solutions.

The Solution Engineering team's culture of diversity, intellectual curiosity, problem-solving, and openness is important to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects while striving to build an environment that provides the support and mentorship needed to learn and grow.

What You Will Be Doing:

Solution Design: Collaborating with multi-functional teams to design and implement storage architectures optimized for AI/ML workloads, ensuring scalability, performance, and reliability.
Technology Expertise: Applying in-depth knowledge of storage technologies, including Lustre and cloud storage, to devise and implement solutions tailored to the distinctive requirements of AI/ML training and inference workloads. Staying current with industry trends and advancements in storage technologies to consistently elevate the company's capabilities.
Cloud Infrastructure Integration: Applying proficiency in GCP, AWS, and Azure to incorporate storage solutions with an emphasis on efficiency, reliability, and cost-effectiveness.
Building and deploying storage solutions hands-on, taking responsibility for the entire process and troubleshooting as needed.
Technical Consultation: Providing technical expertise and consultation to customers and internal teams on storage solutions, aligning with AI/ML standard methodologies.
Performance Optimization: Analyzing and optimizing storage systems for AI/ML applications to meet performance requirements and enhance overall system efficiency.
Integration: Integrating storage solutions seamlessly with AI/ML frameworks, ensuring compatibility and improving the utilization of storage resources.
Collaboration: Working closely with data scientists, engineers, and stakeholders to understand AI/ML storage requirements and proposing tailored solutions.
Documentation: Creating comprehensive technical documentation for AI/ML storage architectures, guidelines, and best practices.
Emerging Technologies: Staying abreast of industry trends and emerging technologies in AI/ML and storage to drive innovation and continuous improvement.

What We Need To See:

Proven experience in designing and implementing storage architectures for AI/ML workloads, with a focus on scalability and performance.
Strong technical expertise in storage technologies, AI/ML frameworks, and their integration. Proficiency in programming languages commonly used in AI/ML, such as Python, is desirable.
Excellent communication and interpersonal skills to effectively convey complex technical concepts to both technical and non-technical collaborators.
Confirmed ability to analyze and solve complex technical challenges related to AI/ML storage architectures.
A collaborative approach with the ability to work effectively in multi-functional teams and engage with clients to understand their specific AI/ML storage requirements.
Prior hands-on coding experience with storage systems.
Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
3+ years of relevant experience.

Ways to stand out from the crowd:

Certifications in relevant technologies, such as NVIDIA Deep Learning Institute (DLI) certifications.
Previous experience working with cloud-based AI/ML services and storage solutions.
Familiarity with container orchestration platforms like Kubernetes.
Flexibility in adapting to different working styles.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and dedicated people on the planet working for us. If you're creative and autonomous, we want to hear from you!

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and establish teams with the most thoughtful people in the world.

The base salary range is 148,000 USD - 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Salary: $148,000 - $276,000 per year

Location: Santa Clara, CA, US
