Featherless AI

Machine Learning Engineer — Inference Optimization

Posted 3 Days Ago

In-Office or Remote

Hiring Remotely in World Golf Village, FL

Mid level

Optimize inference latency and throughput for large-scale ML models, collaborate on performance tuning, and build inference-serving systems.

The summary above was generated by AI

About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do

Optimize inference latency, throughput, and cost for large-scale ML models in production
Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, I/O)
Implement and tune techniques such as:
- Quantization (fp16, bf16, int8, fp8)
- KV-cache optimization & reuse
- Speculative decoding, batching, and streaming
- Model pruning or architectural simplifications for inference
Collaborate with research engineers to productionize new model architectures
Build and maintain inference-serving systems (e.g., NVIDIA Triton Inference Server, custom runtimes, or fully bespoke stacks)
Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups
Improve system reliability, observability, and cost efficiency under real workloads

What We’re Looking For

Strong experience in ML inference optimization or high-performance ML systems
Solid understanding of deep learning internals (attention, memory layout, compute graphs)
Hands-on experience with PyTorch (or similar) and model deployment
Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
Experience scaling inference for real users (not just research benchmarks)
Comfortable working in fast-moving startup environments with ownership and ambiguity

Nice to Have

Experience with LLM or long-context model inference
Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
Experience optimizing across different hardware vendors
Open-source contributions in ML systems or inference tooling
Background in distributed systems or low-latency services

Why Join Us

Real ownership over performance-critical systems
Direct impact on product reliability and unit economics
Close collaboration with research, infra, and product
Competitive compensation + meaningful equity at Series A
A team that cares about engineering quality, not hype

Top Skills

CUDA

ML Inference Optimization

ONNX Runtime

PyTorch

TensorRT

Triton

Similar Jobs

ServiceNow

Administrative Assistant

17 Minutes Ago

Remote or Hybrid

West Palm Beach, FL, USA

Senior level

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation

This role involves managing calendars, coordinating meetings, handling travel arrangements, expense reporting, and onboarding new hires and vendors, while ensuring effective communication and confidentiality.

Top Skills: Box, Concur, Outlook, PowerPoint, Word, Zoom

ServiceNow

Senior Program Manager

18 Minutes Ago

Remote or Hybrid

West Palm Beach, FL, USA

146K-256K Annually

Senior level

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation

The Senior Program Manager will drive revenue growth through strategic insights, manage deal execution, and design scalable processes based on partnership analytics and spending data.

Top Skills: CRM Analytics

Pluralsight

Assessment Solution Strategist

25 Minutes Ago

Remote or Hybrid

USA

67K-93K Annually

Mid level

Edtech • Information Technology • Software

The Assessment Solution Strategist designs and manages assessments to ensure they meet educational goals, quality standards, and learner needs, using data analysis and project management skills.

Top Skills: Asana, Confluence, Excel, LMS, R, SaaS Software (e.g., Smartsheet), SPSS, Zendesk

What you need to know about the Seattle Tech Scene

Home to tech titans like Microsoft and Amazon, Seattle punches far above its weight in innovation. But its surrounding mountains, sprinkled with world-famous hiking trails and climbing routes, make the city a destination for outdoorsy types as well. Established as a logging town before shifting to shipbuilding and logistics, the Emerald City is now known for its contributions to aerospace, software, biotech and cloud computing. And its status as a thriving tech ecosystem is attracting out-of-town companies looking to establish new tech and engineering hubs.

Key Facts About Seattle Tech

Number of Tech Workers: 287,000; 13% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Amazon, Microsoft, Meta, Google
Key Industries: Artificial intelligence, cloud computing, software, biotechnology, game development
Funding Landscape: $3.1 billion in venture capital funding in 2024 (PitchBook)
Notable Investors: Madrona, Fuse, Tola, Maveron
Research Centers and Universities: University of Washington, Seattle University, Seattle Pacific University, Allen Institute for Brain Science, Bill & Melinda Gates Foundation, Seattle Children’s Research Institute