Cantina Labs

ML Research Engineer, TTS

Reposted 12 Hours Ago
Remote
Hiring Remotely in Greece
Senior level
Build and productionize large-scale TTS and adjacent speech models end-to-end: model architecture, training, alignment, evaluation, data curation, distributed GPU scaling, latency/cost profiling, tooling, and safety/mitigation for speech systems.
The summary above was generated by AI

About Cantina

Cantina is a new social platform founded by Sean Parker with the most advanced AI character creator. Our bots are lifelike, social creatures that can interact wherever people are online—across voice, video, and text. Create yourself, imagine someone new, or choose from thousands of characters to share infinitely scalable, personalized content and seamless group chat.

If you’re excited about how AI can shape creativity and social interaction, come help us build what’s next.

About the Role:

We’re looking for a Research / ML Engineer to join our Speech Team to build state-of-the-art speech systems end-to-end—from data specs through production inference. You’ll drive the model ↔ data ↔ eval flywheel for TTS and adjacent tasks (voice cloning, controllable TTS, voice conversion and more), partnering closely with research, data, and infra to ship fast, reliable, and cost-aware models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.

What You’ll Do:

  • Model Building: Architect, implement, pre-train, fine-tune, and post-train/align (e.g., GRPO/DPO) large-scale speech models.

  • Project Leadership: Independently lead small research projects while collaborating on larger team initiatives.

  • Experimental Design: Design, run, and analyze scientific experiments to advance our understanding of the models.

  • Tool Development: Develop and improve dev tooling to enhance team productivity.

  • Full-Stack Contribution: Contribute to the entire stack, from low-level optimizations to high-level model design.

  • Data Ownership: Define data requirements and collaborate on acquisition, curation, augmentation, labeling quality, and synthetic data strategies.

  • Rigorous Evaluation: Design automated objective/subjective evaluations—listening tests, SV/WER/ASR-based metrics, robustness & bias checks, and red-team studies.

  • Pipeline Delivery: Harden the training → evaluation → inference pipeline; profile latency, memory, and cost; and meet production SLAs with robust monitoring and rollback.

  • GPU Scaling: Partner with infrastructure to run distributed training/inference on cloud fleets and productionize models with reliability and observability.

  • Safety & Responsibility: Contribute to safety/consent guardrails and to misuse/abuse mitigation for responsible speech technology.

What You’ll Bring:

  • Exceptional research/development experience with large-scale audio models (>3B-parameter models and >500k hours of data).

  • Exceptional understanding of and hands-on experience with transformer architectures and/or diffusion models (including distillation and streaming) and/or audio language modelling.

  • Strong experience with multi-node and multi-GPU distributed model training.

  • Strong software engineering skills with a proven track record of building complex systems.

  • Strong PyTorch and performance-engineering skills (profiling, CUDA/Triton/C++ as needed) and experience writing reliable, production-quality code.

  • Shipped large-scale speech/audio models to production.

  • Background in working with large-scale ML data.

  • Ability to iterate on data and triangulate quality using subjective and objective signals.

  • Notable publications and/or open source contributions in speech/audio/ML.

  • Experience with voice-cloning, speech-control, voice-generation.

Preferred Experience:

  • Shipped large-scale speech/audio models (TTS/VC/ASR) to production.

  • Experience working on large-scale ML systems.

  • Experience with audio language modelling, transformer architectures.

  • Experience with voice-cloning, speech-control, voice-generation.

  • Background in processing large-scale ML data.

  • Publications or notable open-source contributions in speech/audio/ML.
