Snowflake

Senior Software Engineer — LLM Post-Training Platform

Posted 4 Days Ago
In-Office
Bellevue, WA, USA
200K-288K Annually
Senior level
Build and scale an LLM post-training platform: design public training APIs and SDKs, control plane and GPU data plane, implement multi-tenant scheduling and capacity-aware routing, optimize end-to-end performance and throughput, and productionize research components for reliable enterprise-scale training and inference.
The summary above was generated by AI

At Snowflake, we are powering the era of the agentic enterprise. To usher in this new era, we seek AI-native thinkers across every function who are energized by the opportunity to reinvent how they work. You don’t just use tools; you possess an innate curiosity, treating AI as a high-trust collaborator that is core to how you solve problems and accelerate your impact. We look for low-ego individuals who thrive in dynamic and fast-moving environments and move with an experimental mindset — who rapidly test emerging capabilities to discover simpler, more powerful ways to deliver results. At Snowflake, your role isn't just to execute a function, but to help redefine the future of how work gets done.

Senior Software Engineer — LLM Post-Training Platform

The Snowflake ML Platform team's mission is to let customers run their most demanding ML/AI workloads inside Snowflake. Cortex Training is our LLM post-training platform: it turns scarce, expensive GPU capacity into a simple, composable service, so customers can adapt open-weight foundation models to their own business problems while we handle the hard distributed-systems parts, including scheduling, orchestration, multi-node training and inference, fault tolerance, and throughput.

The platform already runs post-training at scale. Under the hood, it decouples GPU computation from the training loop and exposes it as primitive APIs that compose into everything from SFT to full RL workflows. You'll work alongside a team that ships fast and sweats reliability, together with the researchers behind DeepSpeed. We're looking for an engineer who thrives in the ML infrastructure layer and brings a solid understanding of LLMs and post-training to help us scale and grow the platform.

YOU WILL:
  • Design and build across the full stack — from the public training APIs and SDK through the control plane to the GPU data plane.

  • Scale the distributed systems that make GPU compute serverless — multi-tenant scheduling, placement, and capacity-aware routing across regional GPU pools, with fault tolerance built in.

  • Drive end-to-end performance at scale — keep the training, inference, and RL loops fast and the data plane responsive under heavy concurrent load, with GPUs kept saturated.

  • Productionize research building blocks — partner with Snowflake Research to turn state-of-the-art training and inference techniques into reliable, composable components customers can run at enterprise scale.

QUALIFICATIONS:
  • 5+ years building and shipping production ML systems.

  • Strong distributed systems and infrastructure foundation — designing scalable, fault-tolerant services and operating them on Kubernetes in production.

  • Familiarity with GPU and LLM infrastructure — e.g., PyTorch, DeepSpeed/FSDP, Ray, CUDA/NCCL, vLLM; able to debug across the data, infrastructure, and GPU layers.

  • Demonstrated ability to harden complex systems for reliability, throughput, and cost efficiency.

  • BS in Computer Science or a related field (MS/PhD a plus).

  • (Bonus) Hands-on LLM post-training / modeling experience — the strongest candidates pair deep infra skills with real post-training intuition.

Snowflake is growing fast, and we’re scaling our team to help enable and accelerate our growth. We are looking for people who share our values, challenge ordinary thinking, and push the pace of innovation while building a future for themselves and Snowflake.

How do you want to make your impact?

For jobs located in the United States, please visit the job posting on the Snowflake Careers Site for salary and benefits information: careers.snowflake.com

Snowflake Bellevue, Washington, USA Office

In the heart of Silicon Valley, you'll find our 4-story, 2-tower San Mateo hub, which actually emerged from the very spot Snowflake started in 2012 – it all began in one of our founder's humble San Mateo apartments.
