Wizard AI

AI Applied Scientist

Reposted 6 Days Ago
Remote
Hiring Remotely in USA
225K-280K Annually
Senior level
The Applied Scientist will measure and improve the accuracy of Wizard's AI agent through metrics, experiments, and data analysis, partnering with ML and AI engineering teams.
The summary above was generated by AI
About Wizard

Wizard is the top-performing AI Shopping Agent, delivering the best products from across the web with unmatched accuracy, quality, and trust.

The Role

We’re looking for an Applied Scientist to own how we measure, understand, and improve the accuracy of our AI agent. This role sits at the intersection of applied ML, evaluation science, and product. You’ll define what “good” looks like for our agent, build the systems to measure it, and lead the science work to improve it, including fine-tuning the LLM judges that power our evaluation pipeline.

You’ll partner with ML Engineering and AI Engineering to bring scientific rigor to the most important question at Wizard: is our agent getting better, and how do we know?

This is a foundational hire on our science team. Evaluation is the starting point, and the role is scoped to grow into broader applied science work as the surface area of the agent expands (recommendations, personalization, ranking, multimodal, conversational understanding).

What You’ll Do
  • Define and evolve accuracy metrics across the full shopping experience (retrieval, ranking, recommendations, outcomes)
  • Design and run experiments to measure improvements and regressions
  • Build and maintain evaluation datasets, benchmarks, and scoring frameworks
  • Improve the LLM judges that power our evaluation pipeline: prompting, calibration, and fine-tuning where it matters
  • Translate ambiguous product questions into clear, measurable hypotheses and analysis
  • Partner with ML Engineers to validate model changes and guide iteration
  • Identify failure modes and edge cases, and drive improvements through data
  • Make agent performance visible, trusted, and actionable across product and engineering
First 3 months
  • Go deep on the agent, the current eval pipeline, and the metrics we use today
  • Audit existing accuracy metrics and benchmarks; identify gaps, blind spots, and signals that aren’t trustworthy
  • Build relationships with ML, AI Engineering, and Product
  • Ship one quick win: a missing benchmark, an improved metric, or a fix to a misleading signal
  • Establish a baseline view of agent performance the team can rally around
Months 3 to 6
  • Own the evaluation framework: datasets, metrics, scoring, reporting, both offline and online
  • Drive measurable improvements to LLM judge quality (calibration, fine-tuning where appropriate)
  • Run experiments that influence at least one significant model or product change
  • Stand up automated evaluation the team trusts before and after every launch
  • Build dashboards and reporting that make agent performance legible to leadership
Beyond 6 months
  • Lead applied science work on the next frontier as the agent grows: multi-turn evaluation, multimodal, personalization, ranking quality, conversational understanding
  • Influence team-level strategy on what we measure, what we improve, and why
  • Mentor and help grow the science function as it expands
What Success Looks Like
  • Clear, trusted accuracy metrics are consistently used across product and engineering
  • A robust automated evaluation framework for both offline and live experiments
  • Model and product changes are consistently measured before and after launch
  • Demonstrable improvements in LLM judge quality and eval coverage
  • Science leadership that informs what we build, not just whether it works
Career Growth
  • Depth track: become the org’s authority on AI evaluation (eval strategy, judge models, agent benchmarking)
  • Breadth track: expand into other applied science problems (recommendations, personalization, ranking, multimodal, conversational understanding) as those areas come online
  • Leadership track: Senior / Staff Applied Scientist, with technical leadership across the science function
  • As the agent gets more capable, the science problems get richer
Ideal Background
  • 5+ years in Applied ML, AI Research, or Applied Science (PhD or equivalent depth strongly preferred)
  • Hands-on experience evaluating modern AI/ML systems: LLMs, agents, ranking, or recommendations
  • Direct experience with LLM-based systems: judge models, RAG, prompt engineering, fine-tuning, RLHF, or similar
  • Strong experimentation foundations: A/B testing, causal inference, statistical rigor
  • Proven ability to operate in ambiguity: defining problems, not just solving pre-defined ones
  • Clear, structured communication that influences across ML, engineering, and product
Compensation & Benefits

The expected base salary range for this role is $225,000 - $280,000 USD, and will vary based on skills, experience, role level, and geographic location. Final compensation will be determined by considering these factors alongside overall role scope and responsibilities.

In addition to base salary, Wizard offers:

  • Equity in the form of stock options
  • Medical, dental, and vision coverage
  • 401(k) plan
  • Flexible PTO and company holidays
  • Fully remote work within the United States
  • Periodic company offsites and team gatherings

Wizard is committed to fair, transparent, and competitive compensation practices.
