AI and ML Performance Engineer

Sorry, this job was removed at 08:18 p.m. (PST) on Wednesday, Jun 11, 2025

In-Office

Redmond, WA

We are now looking for an AI/ML Performance Engineer! At NVIDIA, we are at the forefront of advancing the capabilities of artificial intelligence. We are seeking an ambitious and forward-thinking AI/ML Performance Engineer to contribute to the development of next-generation inference optimizations and deliver industry-leading performance. In this role, you will investigate and prototype scalable inference strategies—driving down per-token latency and maximizing system throughput by applying cross-stack optimizations that span algorithmic innovations (e.g., attention variants, speculative decoding, inference-time scaling), system-level techniques (e.g., model sharding, pipelining, communication overlap), and hardware-level enhancements.

As NVIDIA makes significant strides in AI datacenters, our team holds a central role in enhancing the efficiency of our exponentially growing inference deployment needs and establishing a data-driven approach to algorithmic improvements, hardware design, and system software development. We collaborate extensively with teams across deep learning research, framework development, compiler and systems engineering, and silicon architecture. Thriving in this high-impact, interdisciplinary environment demands not only technical proficiency but also a growth mindset and a pragmatic attitude—qualities that fuel our collective success in shaping the future of datacenter technology.

What You’ll Be Doing:

Develop high-fidelity performance models to prototype emerging algorithmic techniques in Generative AI to drive model-hardware co-design.
Design targeted optimizations for inference deployment that push out the Pareto frontier of accuracy, throughput, and interactivity.
Quantify the performance benefit of targeted optimizations to prioritize features and guide the future software and hardware roadmap.
Model the end-to-end performance impact of emerging GenAI workflows, such as agentic pipelines and inference-time compute scaling, to guide datacenter design and optimization.
This position requires you to keep up with the latest DL research and collaborate with diverse teams, including DL researchers, hardware architects, and software engineers.

What We Need to See:

A Master's degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field.
Strong background in computer architecture, roofline modeling, queuing theory and statistical performance analysis techniques.
Solid understanding of LLM internals (attention mechanisms, FFN structures), model parallelism and inference serving techniques.
3+ years of hands-on experience in system evaluation of AI/ML workloads or performance analysis, modeling and optimizations for AI.
Proficiency in Python (and optionally C++) for simulator design and data analysis.
A growth mindset and a pragmatic “measure, iterate, deliver” approach.

Ways to Stand Out from the Crowd:

Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.
Proven track record of working in cross-functional teams, spanning algorithms, software and hardware architecture.
Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.
Experience with GPU computing (CUDA).
Experience with deep learning frameworks and inference engines such as PyTorch, TRT-LLM, vLLM, and SGLang.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, autonomous and love a challenge, we want to hear from you!

The base salary range is 148,000 USD - 287,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

4545 Roosevelt Way NE 6th Floor, Seattle, Washington, United States, 98105

What you need to know about the Seattle Tech Scene

Home to tech titans like Microsoft and Amazon, Seattle punches far above its weight in innovation. But its surrounding mountains, sprinkled with world-famous hiking trails and climbing routes, make the city a destination for outdoorsy types as well. Established as a logging town before shifting to shipbuilding and logistics, the Emerald City is now known for its contributions to aerospace, software, biotech and cloud computing. And its status as a thriving tech ecosystem is attracting out-of-town companies looking to establish new tech and engineering hubs.

Key Facts About Seattle Tech

Number of Tech Workers: 287,000; 13% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Amazon, Microsoft, Meta, Google
Key Industries: Artificial intelligence, cloud computing, software, biotechnology, game development
Funding Landscape: $3.1 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Madrona, Fuse, Tola, Maveron
Research Centers and Universities: University of Washington, Seattle University, Seattle Pacific University, Allen Institute for Brain Science, Bill & Melinda Gates Foundation, Seattle Children’s Research Institute
