
NVIDIA

Principal Cloud Engineer, HPC

Posted 15 Days Ago
In-Office or Remote
3 Locations
272K-426K
Expert/Leader
Design and architect AI-oriented compute services, build distributed infrastructure for model training, and coordinate with multifunctional teams.
The summary above was generated by AI

NVIDIA is on the journey to build the best cloud offering for AI workloads and to bring our latest GPU technology to our clients as a set of managed services under the DGX Cloud umbrella. We want to innovate on behalf of our clients and provide an easy, no-hassle way to use the latest and greatest NVIDIA products through scalable, managed, self-service APIs.

We are looking for a Principal HPC / Slurm engineer to drive the technical design and development of a new set of high-performance cloud services for Artificial Intelligence and high-performance computing. This is a unique opportunity to be a founding member of a team building at the intersection of highly scalable, fault-tolerant cloud services and AI. We are looking for an engineer with a deep understanding of large-scale distributed cloud services, multi-tenant architectures, and serverless compute. HPC experience is a plus.

What you'll be doing:

  • Design and architect a set of new AI-oriented compute services for large training workloads

  • Build the distributed computing infrastructure and training services for large-scale distributed model training

  • Plan and coordinate across multi-functional teams, partners, and vendors to execute infrastructure build-outs

  • Work with engineering teams across all of NVIDIA to ensure their requirements are correctly translated into infrastructure needs

What we need to see:

  • Solid technical foundation in distributed computing and storage, including substantial experience with all of the following: server systems, storage, I/O, networking, and system software

  • Bachelor's degree or equivalent experience

  • 12+ years of system software engineering experience on large-scale production systems

  • 12+ years of architecting high performance computing infrastructure at scale

  • Proven experience in high performance computing, Deep Learning, and/or GPU accelerated computing domains

  • Ability to understand and communicate complex designs, distributed infrastructure, and requirements to peers, customers, and vendors

  • General knowledge of shared storage systems such as NFS, LustreFS, and GlusterFS

  • Familiarity with system-level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped I/O

Ways to stand out from the crowd:

  • Large-scale distributed systems, HPC, ML, and training experience with Slurm and Kubernetes

  • Deep knowledge of both software and hardware in HPC and ML infrastructure

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for great people like you to help us accelerate the next wave of artificial intelligence.

The base salary range is 272,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

AI
Cloud Services
Distributed Computing
GlusterFS
HPC
Kubernetes
LustreFS
NFS
Slurm
HQ

NVIDIA Seattle, Washington, USA Office

4545 Roosevelt Way NE 6th Floor, Seattle, Washington, United States, 98105

Similar Jobs

2 Hours Ago
Remote
USA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
As a Senior Machine Learning Platform Engineer, you'll mentor junior engineers, optimize ML pipelines, and enhance system performance for machine learning models.
Top Skills: Airflow, Databricks, DynamoDB, Go, Python, Ray, Snowflake, Spark, Tecton
2 Hours Ago
Easy Apply
Remote
3 Locations
98K-210K Annually
Mid level
Cloud • Security • Software • Cybersecurity • Automation
As a Developer Advocate, you will engage with the community, create technical content, support developers, and advocate for their needs while collaborating with cross-functional teams.
Top Skills: AI, DevSecOps, GitLab, Modern Development Workflows, Open Source, Security Tools
2 Hours Ago
Remote
Hybrid
United States
91K-169K Annually
Mid level
Artificial Intelligence • Cloud • Sales • Security • Software • Cybersecurity • Data Privacy
Develop and implement features for Non-employee Risk Management, write clean code, collaborate in agile processes, and document solutions.
Top Skills: Angular, AWS, Docker, EC2, ECR, Fargate, Go, MongoDB, MySQL, OAuth, OIDC, RDS, React, Ruby on Rails, S3, SAML

What you need to know about the Seattle Tech Scene

Home to tech titans like Microsoft and Amazon, Seattle punches far above its weight in innovation. But its surrounding mountains, sprinkled with world-famous hiking trails and climbing routes, make the city a destination for outdoorsy types as well. Established as a logging town before shifting to shipbuilding and logistics, the Emerald City is now known for its contributions to aerospace, software, biotech and cloud computing. And its status as a thriving tech ecosystem is attracting out-of-town companies looking to establish new tech and engineering hubs.

Key Facts About Seattle Tech

  • Number of Tech Workers: 287,000; 13% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Amazon, Microsoft, Meta, Google
  • Key Industries: Artificial intelligence, cloud computing, software, biotechnology, game development
  • Funding Landscape: $3.1 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Madrona, Fuse, Tola, Maveron
  • Research Centers and Universities: University of Washington, Seattle University, Seattle Pacific University, Allen Institute for Brain Science, Bill & Melinda Gates Foundation, Seattle Children’s Research Institute