Maximum of 25 job preferences reached.
Top Remote Site Reliability Engineer Jobs in Seattle, WA
Other
As a Site Reliability Engineer, you will design cloud platforms, automate operations, maintain infrastructure, and support engineering teams in delivering reliable services.
Top Skills:
AnsibleAWSAzureBashCircleCICloudFormationDatadogDnsDockerGitlab CiGoGCPGrafanaHTTPHttpsJenkinsKubernetesKvmLinuxPerlPrometheusPythonRubyTcp/IpTerraformUnixVMware
Healthtech • Other • Software
As a Senior Database Site Reliability Engineer, you'll design, implement, and maintain PostgreSQL systems, ensure reliability, automate maintenance tasks, and participate in incident response.
Top Skills:
AnsibleBashDatadogGrafanaNew RelicPostgresPowershellPrometheusPythonTerraform
Software • Financial Services
Ensure platform reliability, performance, and availability by implementing observability, automating infrastructure, participating in on-call rotations and post-mortems, partnering with Product and Engineering, designing scalable architectures, mentoring teammates, and integrating Dynatrace with Azure DevOps and Jira while supporting compliance (SOC/FedRAMP).
Top Skills:
.NetAksAlpineAnsibleAppinsightsArm TemplatesAWSAzure DevopsBashBicepC#ChefCloudFormationDatadogDebianDynatraceEksGCPGitGitGksGrafanaHelmJIRAKubernetesLog AnalyticsAzureNew RelicOnestream SoftwareOpenshiftPowershellPowershell DscPrometheusPuppetPythonRest ApisSQLTerraformUbuntu
Fintech • Information Technology
As a Site Reliability Engineer at Alpaca, you will ensure system reliability and performance, troubleshoot issues, and collaborate with teams to design scalable features.
Top Skills:
GoGormLinuxPgxPostgresPrometheusSqlc
Gaming • Software
The Site Reliability Engineer will manage infrastructure stability and scalability, lead cloud migrations, and optimize performance across systems while mentoring team members.
Top Skills:
AnsibleAWSAzureBashChefCloudFormationDatadogDockerElk StackGCPGoGrafanaKubernetesPrometheusPuppetPythonTerraformUnix/Linux
Artificial Intelligence • Cloud • Information Technology • Software • Big Data Analytics
Founding Staff SRE for Volcano: define SLOs/error budgets, architect multi-region Kubernetes infrastructure, build GitOps/CI-CD with ArgoCD/Helm/Terraform, scale managed Postgres/Redis/object storage, implement observability with Datadog/Prometheus/Grafana, lead incident response and SRE culture, and mentor cross-functional teams.
Top Skills:
ArgocdCanary DeploymentsCi/CdCniDatadogGitopsGrafanaHelmIngressKubernetesObject StoragePostgresPrometheusRedisService MeshTerraformTerragrunt
Software
As a Site Reliability Engineer, you'll enhance system reliability, collaborate on production readiness, define SLIs/SLOs, and improve incident response.
Top Skills:
AWSDatadogGrafanaKubernetesOpentelemetryPrometheusTypescript
Software • Cryptocurrency
Manage and scale Kubernetes clusters, automate infrastructure, optimize performance, maintain blockchain nodes, and improve system reliability while collaborating with product teams.
Top Skills:
Aws (Ec2Aws EksDatadogDockerIam)KubernetesOpentelemetryPulumiRdsS3Terraform
Database
Embed with service teams to define SLIs/SLOs and error budgets, run Operational Readiness Reviews, improve incident-to-improvement pipelines, advise on resilience and architecture, reduce operational toil through automation, and shape org-wide on-call practices and operational maturity.
Top Skills:
AWSCdkGrafanaKubernetesOpentelemetryPostgresPulumiTerraformVictoriametrics
Energy • Manufacturing • Solar • Renewable Energy
Operate and harden production EKS Kubernetes clusters across multiple AWS regions. Build IaC (Terraform, Ansible), implement policy-as-code, ensure security and compliance, manage observability (Prometheus/Grafana), perform L3 support and incident RCA, run platform-level testing and DR, automate toil, and partner with application teams for sizing and cost optimization to achieve high availability for critical cloud infrastructure.
Top Skills:
AlbAnsibleArgocdAws Ec2Certificate ManagementDatadogDynatraceEksFluxGoGrafanaKubernetesMskPod PriorityPrometheusPythonRdsS3Service MeshSplunkTerraformVpc
Healthtech • Software
The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.
Top Skills:
Ai BotsDatadogJIRAJira Service ManagementMs TeamsOpsgeniePagerduty
Real Estate • Financial Services • PropTech
Support and optimize products migrated to AWS, implement cloud best practices, maintain operational coverage, enhance automation, observability, CI/CD/GitOps, and security. Collaborate with development and platform teams to scale, troubleshoot, and ensure reliable SaaS operations.
Top Skills:
AmisArgocdAWSAws Elastic BeanstalkAws Transfer FamilyAzure DevopsBashCloudwatchCurlDockerEc2EksFluxcdGitGitopsHTTPIstioKubernetesLinkerdLoad BalancerPowershellPythonRdsSQLTerraformWget
New
Cut your apply time in half.
Use ourAI Assistantto automatically fill your job applications.
Use For Free
eCommerce
Ensure reliability and availability of Tradeweb's global AWS platform through IaC automation, observability and SLO definition, incident triage and resolution, on-call duties, collaboration with development teams, and security-focused platform improvements.
Top Skills:
ArgocdAWSAws LambdaEksGitsecopsInfrastructure As Code (Iac)Kubernetes (K8S)KustomizeLgtmLinux/UnixPulumiPythonSmsSns
Reposted 25 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWSAzureCert-ManagerCorednsCrdsCriCsiGatekeeperGCPGoHelmKubernetesKustomizeOperatorsPythonTerraform
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills:
AWSClickhouseKubernetesLlm-Based Tools (Copilots)MySQLPostgresRedis
Information Technology • Security
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.
Top Skills:
ArgocdGithub ActionsGoGrafana TankaJsonnetKubernetesPython
Artificial Intelligence • Cloud • Information Technology • Software
As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.
Top Skills:
AnsibleCudaGoHelmKubernetesLinuxNcclNvidiaPythonRustSlurmTerraform
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills:
AnsibleBashGCPGkeGoKubernetesPulumiPython
19 Days AgoSaved
Easy Apply
Easy Apply
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills:
AbacAuth0AWSAzureC#Ci/CdContainer OrchestrationDuoEntraidGCPGenerative AiGitGoIacJavaMfaOktaPingPythonRbacRubySsoTerraform
19 Days AgoSaved
Easy Apply
Easy Apply
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills:
AnsibleAWSBashChefCi/CdDockerEc2GitGoKubernetesLinuxPuppetPythonRubySaltTerraform
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills:
AWSDockerGCPKubernetes
Artificial Intelligence • Machine Learning • Software • Analytics
The role involves end-to-end ownership of AWS infrastructure, managing Kubernetes platforms, and ensuring system reliability through observability and automation. Responsibilities include incident response and maintaining CI/CD systems.
Top Skills:
ArgocdAWSDatadogGitGoKubernetesPythonTerraform
Healthtech • Software • Analytics • Business Intelligence
Lead and own reliability for critical backend and distributed systems: design, launch, on-call, incident leadership, SLO/SLI/error budget definition, automation to remove toil, observability improvement, resilience testing, mentoring, and cross-team reliability initiatives for production healthcare workflows.
Top Skills:
AWSAzureDockerGCPGithub ActionsGoGrafanaJavaKubernetesOpentelemetryPrometheusPythonTerraformTypescript
Healthtech • Information Technology • Software
The Sr. Database Site Reliability Engineer manages the reliability and performance of Azure PostgreSQL platforms, applying SRE principles for automation and observability. Responsibilities include incident response, backup strategies, and ensuring compliance with security standards.
Top Skills:
ArgocdAzure PostgresqlCi/CdDatadogGitHelmKubernetesTerraform
Artificial Intelligence • Information Technology • Software • Automation
Own US PST coverage for releases and incidents as the first SRE; bridge infrastructure and code by working with Kubernetes, Terraform, and AWS and patching Elixir when needed; lead incident response and post-mortems; define SLOs and observability; author runbooks and support HIPAA-aligned compliance for a regulated medical-device platform.
Top Skills:
AWSElixirKubernetesTerraform
Let Your Resume Do The Work
Upload your resume to be matched with jobs you're a great fit for.
Success! We'll use this to further personalize your experience.
Top Seattle, WA Companies Hiring Remote Site Reliability Engineers
See AllPopular Job Searches
All Filters
Total selected ()
No Results
No Results

















.png)






