Position Purpose
YPO is seeking a Senior / Lead DevOps Engineer to design, build, and operate the cloud infrastructure and developer platform that will power its next generation of products. This is a hands-on technical leadership role spanning the full DevOps surface — cloud infrastructure, CI/CD pipelines, release engineering, observability, platform reliability, and developer experience — all in service of a rapidly scaling AI-first mobile platform.
You will be a close partner to the Director of Product, Lead Security Engineer, and mobile engineering leadership — connecting platform reliability to product velocity and security posture in equal measure. You will bring strong technical depth, a product-minded approach to internal tooling, and the communication skills to champion engineering excellence across the organisation.
Key Responsibilities
Cloud Infrastructure Design and Operations
Own the architecture and day-to-day operation of YPO's cloud infrastructure across its full lifecycle.
Architect, implement, and continuously evolve YPO's cloud infrastructure across AWS, Azure, and/or GCP — ensuring it is scalable, resilient, cost-efficient, and production-ready for a global AI-first platform.
Design and manage multi-region, highly available environments that meet YPO's performance and uptime requirements for a 35,000+ member global community.
Own cloud cost management and FinOps practices — implementing tagging strategies, reserved capacity planning, and anomaly detection to optimise infrastructure spend without sacrificing reliability.
Lead the evaluation and adoption of new cloud services, platforms, and tooling — making well-reasoned build-vs-buy decisions based on engineering impact and long-term maintainability.
Manage DNS, CDN, load balancing, and networking configurations across cloud environments, ensuring global performance and failover capabilities.
Infrastructure as Code and Automation
Codify everything. If it cannot be automated, it should be questioned.
Lead YPO's Infrastructure as Code practice using Terraform as the primary tool, ensuring all infrastructure is version-controlled, reviewed, tested, and deployed through automation — never manually.
Define and enforce IaC standards, module structures, and governance practices across the engineering organisation, ensuring infrastructure code is readable, reusable, and maintainable over time.
Automate environment provisioning, teardown, and configuration management for development, staging, and production environments — enabling engineers to spin up and destroy environments on demand.
Build and maintain automation pipelines for routine operational tasks including certificate rotation, secret rotation, compliance remediation, and infrastructure drift detection.
Write clean, well-tested automation scripts in Python, Bash, or equivalent — treating operational scripts with the same engineering rigour applied to product code.
CI/CD Pipeline Design and Release Engineering
Accelerate the path from commit to production without sacrificing quality or safety.
Design, build, and maintain end-to-end CI/CD pipelines for YPO's mobile (iOS and Android), backend API, AI platform, and data engineering workloads — reducing time-to-deploy and increasing deployment frequency.
Implement branch strategies, environment promotion workflows, and feature flagging patterns that allow teams to ship incrementally and safely to a global production audience.
Integrate automated quality gates — unit tests, integration tests, security scans (SAST/DAST/SCA), container scanning, and IaC linting — as non-negotiable steps in every pipeline.
Lead the adoption of progressive delivery techniques including blue-green deployments, canary releases, and traffic shifting to minimise deployment risk and enable rapid rollback.
Partner with the Lead Security Engineer to embed security and compliance checks into every pipeline stage, ensuring secure-by-default releases across all environments.
Own release documentation, change management workflows, and deployment runbooks — ensuring all production changes are auditable, traceable, and recoverable.
Container Orchestration and Platform Engineering
Build the platform that the platform runs on.
Design, operate, and continuously improve YPO's container orchestration infrastructure using Kubernetes (EKS, AKS, or GKE), ensuring reliable scheduling, resource efficiency, and operational simplicity.
Manage container image governance, including base image standards, image scanning pipelines, registry management, and deprecation policies for outdated or vulnerable images.
Implement and maintain service mesh, ingress controllers, network policies, and inter-service security patterns appropriate for YPO's AI platform and mobile API surfaces.
Evaluate and adopt platform engineering tools that improve developer self-service — internal developer platforms (IDPs), environment-as-a-service patterns, and golden path templates that let engineers provision what they need without DevOps as a bottleneck.
Lead the migration, decomposition, or consolidation of existing services as part of YPO's digital transformation roadmap — balancing technical debt reduction with delivery velocity.
Observability, Monitoring, and Site Reliability
If you cannot measure it, you cannot improve it. If you cannot see it, you cannot protect it.
Design and implement a comprehensive observability stack covering metrics, logs, distributed traces, and synthetic monitoring — giving engineering and product teams clear, real-time visibility into system health and member experience quality.
Define and enforce SLOs, SLIs, and error budgets across YPO's platform services, establishing a shared language between product, engineering, and operations for reliability conversations.
Build and maintain dashboards, alerting rules, and on-call runbooks that surface actionable signals, reducing alert fatigue, improving mean time to detect (MTTD), and shortening mean time to recover (MTTR).
Lead blameless post-mortem processes following significant incidents, driving systemic improvements and institutional learning rather than point fixes.
Own capacity planning and performance benchmarking for the AI-first mobile platform, ensuring infrastructure scales proactively ahead of member growth and feature launches.
DevSecOps and Compliance Automation
Security is not a gate at the end of the pipeline. It is built into every stage.
Partner with the Lead Security Engineer to embed security controls, policy-as-code enforcement, and compliance automation throughout the CI/CD pipeline and infrastructure provisioning lifecycle.
Implement and maintain secrets management solutions (HashiCorp Vault, AWS Secrets Manager, or equivalent) — ensuring no credentials, tokens, or sensitive configuration are ever stored in source code or plaintext.
Enforce cloud security baselines using policy-as-code frameworks (Open Policy Agent, AWS Config Rules, Azure Policy) to detect and auto-remediate configuration drift in real time.
Support SOC 2, ISO 27001, and other compliance programmes by providing infrastructure evidence, automating audit artefact collection, and maintaining clear audit trails for all infrastructure changes.
Manage network security controls including VPCs, security groups, private endpoints, and zero-trust connectivity patterns across cloud environments.
Developer Experience and Technical Leadership
The best platform is one that engineers love to build on.
Own the internal developer experience — streamlining local development environments, onboarding workflows, and self-service tooling so that engineers spend their time building product, not fighting infrastructure.
Define and document engineering standards for environment configuration, deployment patterns, and operational runbooks, ensuring institutional knowledge is captured and accessible.
Mentor and up-level junior engineers and platform contributors, building DevOps literacy across the wider engineering organisation and breaking down silos between platform and product teams.
Act as a cross-functional bridge between product, mobile engineering, AI/data engineering, and security — translating competing infrastructure priorities into a coherent, sequenced delivery plan.
Contribute to technology investment decisions with well-reasoned proposals, total-cost-of-ownership analysis, and clear trade-off documentation.
Skills and Qualifications
Required
5+ years of hands-on experience in DevOps, platform engineering, or site reliability engineering, with at least 2 years in a senior or lead capacity.
Deep, demonstrable expertise with at least one major cloud provider (AWS strongly preferred) and solid working knowledge of a second (Azure or GCP).
Infrastructure as Code proficiency: Terraform is required. Experience with CloudFormation, Pulumi, or CDK is a plus.
CI/CD experience: hands-on design and operation of pipelines using GitHub Actions, GitLab CI, CircleCI, Jenkins, or equivalent tools across multiple workload types.
Strong Kubernetes experience, including cluster management, Helm chart authoring, RBAC, network policies, and workload auto-scaling in a production cloud environment.
Proficiency in Python for automation and tooling; comfort with Bash/shell scripting for operational tasks.
Solid understanding of networking fundamentals — DNS, TCP/IP, TLS, load balancing, CDN, VPC design, and private connectivity patterns.
Experience implementing observability solutions using tools such as Datadog, Grafana, Prometheus, OpenTelemetry, CloudWatch, or equivalent platforms.
Practical knowledge of container security, secrets management, and cloud IAM patterns, with experience working alongside a security engineering function.
Strong communication skills — able to write clear documentation, present technical trade-offs to non-technical stakeholders, and lead engineering conversations with confidence.
Demonstrated ability to operate with autonomy in a fast-moving environment, balancing long-term platform investment with near-term delivery needs.
Preferred
Experience supporting native iOS and/or Android mobile release pipelines, including code signing, provisioning profile management, App Store / Play Store automation, and mobile-specific testing infrastructure.
Familiarity with AI/ML infrastructure, including model serving platforms, GPU workload scheduling, data pipeline orchestration (Airflow, Prefect, or equivalent), and vector database operations.
Experience with platform engineering tools such as Backstage, Port, or similar internal developer portals.
Exposure to FinOps tooling (Infracost, CloudHealth, Spot.io) and cloud cost optimisation at scale.
Experience with multi-region, active-active deployment architectures and global traffic management.
Prior experience in a global SaaS or membership platform serving diverse geographic markets.
Relevant Certifications (Valued)
One or more of the following is valued, though not a strict requirement:
AWS Certified DevOps Engineer — Professional
AWS Certified Solutions Architect — Professional
Microsoft Certified: DevOps Engineer Expert (AZ-400)
Google Professional Cloud DevOps Engineer
Certified Kubernetes Administrator (CKA)
Certified Kubernetes Application Developer (CKAD)
HashiCorp Certified: Terraform Associate
Additional Information
Travel: 10–15% (domestic & international)
Flexible hours to support global teams
EOE
YPO is an Equal Opportunity Employer. YPO takes pride in supporting a diverse workforce and demonstrates this through its policies and practices. YPO does not discriminate in recruiting, hiring, training, promotion, or other employment practices for reasons of race, color, religion, gender, national origin, age, sexual orientation, marital or veteran status, disability, or any other legally protected status.