Sr. Site Reliability Engineer

Practice by Numbers

Posted 2 Days Ago

Remote or Hybrid

Hiring Remotely in Redmond, WA, USA

120K-150K Annually

Senior level

Lead and own reliability for critical backend and distributed systems: design, launch, on-call, incident leadership, SLO/SLI/error budget definition, automation to remove toil, observability improvement, resilience testing, mentoring, and cross-team reliability initiatives for production healthcare workflows.

The summary above was generated by AI

This is an engineering-first Senior SRE role.

We’re looking for senior engineers who have:

Built and shipped significant backend systems and/or distributed platforms
Owned services end-to-end in production (design → launch → on-call → reliability improvements)
Led incident response and driven durable follow-ups
Improved reliability by writing software and changing system design—not by adding manual process

You’ll partner closely with product engineering to ensure reliability is designed in from day one, while also building the tooling and platforms that make operating services safer and easier for every engineer.

Engineers here own services end-to-end—from design to production reliability.

Important: This is not a system administrator role. We are explicitly hiring an engineering leader in reliability. An engineering degree is an absolute requirement (BS/MS in CS/CE/EE or a closely related engineering field).

What You’ll Do

Own reliability outcomes for critical services: availability, latency, incident rate, and recovery time.
Design and build reliable, scalable distributed systems that support mission-critical healthcare workflows.
Define and operationalize SLOs/SLIs and error budgets; drive adoption across teams and use them to prioritize work.
Lead incident response for high-severity issues; improve on-call effectiveness and reduce alert fatigue.
Run blameless postmortems and ensure follow-ups are implemented, measured, and stick.
Write software to eliminate operational toil: automation, self-service tooling, guardrails, and developer platforms.
Raise the bar on observability (metrics/logs/traces), alerting strategy, and operational readiness.
Improve resilience through capacity planning, load testing, performance tuning, and failure testing.
Mentor engineers (SRE and product engineers) on reliability practices, debugging, and production ownership.
Drive cross-team improvements like production readiness reviews, release safety (progressive delivery), and standard runbooks.

What We’re Looking For

Required

An engineering degree is mandatory: BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or a closely related engineering field.
6+ years of experience in software engineering, SRE, infrastructure/platform engineering, or a related field.
Strong programming skills in Go, Python, Java, or similar (production-quality code).
Proven experience building and operating production backend services or distributed systems.
Meaningful experience in on-call rotations, incident leadership, and post-incident improvement execution.
Strong debugging ability across complex systems: latency, saturation, cascading failures, dependency issues.
Experience with cloud infrastructure (AWS preferred, GCP/Azure acceptable).

Strong Signal

You’ve owned reliability for customer-facing services with clear, measurable improvements (e.g., higher availability, lower MTTR).
You’ve built internal platforms/tooling that made other engineers faster and reduced operational burden.
You’ve worked in an SRE culture with SLOs, error budgets, and blameless postmortems.
You’ve led multi-quarter reliability initiatives spanning multiple teams/services.

Technologies We Work With (Examples)

Cloud: AWS
Containers: Docker, Kubernetes
Infrastructure as Code: Terraform
Observability: Prometheus, Grafana, OpenTelemetry
Languages: Go, Python, TypeScript
CI/CD: GitHub Actions

(Experience with everything isn’t required—strong fundamentals and learning velocity matter most.)

What This Role Is Not

To be explicit, this role is not:

System administration / IT ops / helpdesk
Manual server patching as a primary responsibility
A “click-ops” cloud operator role

This is a senior engineering role focused on software-driven reliability and platform engineering.

Why Join PBN

Build and operate mission-critical healthcare infrastructure that supports real patient workflows.
High impact: reliability work directly improves customer trust and revenue-critical operations.
Small team with high ownership, autonomy, and ability to influence architecture.
Strong engineering culture focused on automation, simplicity, and measurable outcomes.

Compensation

The base pay range for this role is $120,000 – $150,000 per year.
