Featherless AI

Machine Learning Engineer — Inference Optimization

Posted 3 Days Ago
In-Office or Remote
Hiring Remotely in World Golf Village, FL
Mid level
Optimize inference latency and throughput for large-scale ML models, collaborate on performance tuning, and build inference-serving systems.
The summary above was generated by AI
About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do
  • Optimize inference latency, throughput, and cost for large-scale ML models in production

  • Profile GPU/CPU inference pipelines and remove bottlenecks (memory, kernels, batching, I/O)

  • Implement and tune techniques such as:

    • Quantization (fp16, bf16, int8, fp8)

    • KV-cache optimization & reuse

    • Speculative decoding, batching, and streaming

    • Model pruning or architectural simplifications for inference

  • Collaborate with research engineers to productionize new model architectures

  • Build and maintain inference-serving systems (e.g., Triton Inference Server, custom runtimes, or bespoke stacks)

  • Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups

  • Improve system reliability, observability, and cost efficiency under real workloads
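As a rough illustration of one technique named in the list above, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an assumption-laden toy, not a description of this team's stack: production systems typically use per-channel or per-group scales and run the math in framework or GPU kernels, and the helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Hypothetical helper: symmetric per-tensor int8 quantization.

    Maps floats onto integers in [-127, 127] using one shared scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Hypothetical helper: recover approximate floats from int8 values."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Storing `q` in one byte per value instead of two (fp16) or four (fp32) cuts memory traffic. Each in-range value is recovered to within half a quantization step (`scale / 2`), which is the accuracy trade-off that tuning work has to measure on real workloads.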

What We’re Looking For
  • Strong experience in ML inference optimization or high-performance ML systems

  • Solid understanding of deep learning internals (attention, memory layout, compute graphs)

  • Hands-on experience with PyTorch (or similar) and model deployment

  • Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)

  • Experience scaling inference for real users (not just research benchmarks)

  • Comfortable working in fast-moving startup environments with ownership and ambiguity

Nice to Have
  • Experience with LLM or long-context model inference

  • Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)

  • Experience optimizing across different hardware vendors

  • Open-source contributions in ML systems or inference tooling

  • Background in distributed systems or low-latency services

Why Join Us
  • Real ownership over performance-critical systems

  • Direct impact on product reliability and unit economics

  • Close collaboration with research, infra, and product

  • Competitive compensation and meaningful equity in a Series A company

  • A team that cares about engineering quality, not hype

Top Skills

CUDA
ML Inference Optimization
ONNX Runtime
PyTorch
TensorRT
Triton
