The Lead Site Reliability Engineer will ensure platform reliability and performance, guiding SRE principles, managing incidents, and fostering collaboration across teams while leveraging cloud technologies and automation.
At First Advantage (Nasdaq: FA), people are at the heart of everything we do, from our customers and partners to our greatest advantage: our team members. Operating with empathy and compassion, First Advantage fosters a global inclusive workforce devoted to the diverse voices that make up our talent and products. Our team members empower each other to be their authentic selves and treat all with respect, integrity, and fairness.
Say hello to a rewarding career, and come join a leading provider of mission-critical background screening solutions to some of the most recognized Fortune 100 and Global 500 brands.
First Advantage is a global leader in background screening, identity, and verification solutions. As we continue to scale our digital platforms and modern cloud-native infrastructure, we are seeking a highly skilled and forward-thinking Lead Site Reliability Engineer (SRE) to drive reliability, resilience, and operational excellence across our systems.
The Lead SRE will be responsible for guiding reliability strategy, overseeing complex incident response, improving observability, strengthening automation and CI/CD practices, and partnering closely with engineering teams to embed SRE principles throughout the organization. This role requires a deep understanding of modern cloud architecture—including both Azure and AWS—as well as expertise in Linux systems, monitoring technologies, and root‑cause analysis.
This is a senior hands-on engineering role, ideal for someone who enjoys solving difficult problems at scale and mentoring others while driving meaningful improvements to uptime, performance, and customer experience.
What You'll Do:
- Site Reliability & Platform Stability
- Lead reliability initiatives across multiple high-availability, large-scale SaaS systems, ensuring platform uptime, performance, and resilience.
- Build and maintain distributed systems, infrastructure components, and automation tooling to ensure consistent, reliable delivery of production services.
- Champion proactive reliability engineering, holistic system monitoring, and continuous operational improvements.
- Partner with architecture, engineering, and operations teams to define SLAs, SLOs, and SLIs.
- Cloud Engineering (Azure & AWS)
- Architect, build, and maintain cloud infrastructure using best practices.
- Guide cloud migrations, cost optimization, and resilience engineering across multi-cloud environments.
- Implement and enforce cloud security, compliance, and governance standards.
- DevOps, CI/CD, and Automation
- Create and maintain CI/CD pipelines using GitHub Actions, Azure DevOps, Jenkins, or equivalent.
- Automate deployments using IaC tools (Terraform, Bicep, CloudFormation).
- Reduce manual operational burden through automation and self-service tooling.
- Monitoring, Observability & Performance
- Implement observability stacks covering metrics, logs, traces, and synthetic checks.
- Standardize monitoring practices using industry tooling.
- Perform performance analysis, load testing, and optimization.
- Incident Response & Management
- Serve as Incident Commander for major production incidents.
- Define and improve incident management processes.
- Ensure clear communication during outages and lead technical bridges.
- Deliver high‑quality RCAs with actionable follow‑ups.
- Root‑Cause Analysis (RCA) & Continuous Improvement
- Drive deep, data‑driven RCAs and long-term reliability improvements.
- Identify and eliminate systemic issues and operational toil.
- Leadership, Collaboration & Mentorship
- Provide technical leadership across teams.
- Mentor engineers and promote SRE best practices.
- Foster strong cross‑functional partnerships.
What You'll Need to be Successful:
- 7+ years in SRE, DevOps, Platform Engineering, or Cloud Engineering.
- Strong expertise in Azure and AWS.
- Proficiency in CI/CD, automation, and release engineering.
- Deep monitoring, logging, and observability experience.
- Incident response leadership experience.
- Proven RCA experience.
- Strong Linux skills.
- Scripting skills (Python, Bash, PowerShell, Go).
- IaC experience.
- Strong systems and networking fundamentals.
- Additional Preferred Qualifications
- Experience with large-scale distributed systems.
- Message queues or event streaming knowledge.
- Familiarity with incident management frameworks.
- Multi-cloud enterprise experience.
- Kubernetes, ECS, AKS, or EKS exposure.
Why First Advantage is Your Next Big Career Move
First Advantage is going through a technology transformation! We are looking for experts who are excited to work with advanced technologies and provide best-in-class user experiences, drive the development and deployment of scalable solutions, and smoothly guide our agile teams and clients through meaningful changes as we continue to expand our impact.
What Are You Waiting For? Apply Today!
You have learned a little about us today – we want to learn about you! If you think this position and our company are a great fit for your areas of interest and expertise, tell us about yourself by applying now!
The salary range for this position is approximately $120,000 to $150,000 base annually. This range reflects our good-faith estimate of fair pay for what our ideal candidates are likely to expect, and we tailor our offers within the range based on the selected candidate’s experience, industry knowledge, technical and communication skills, and other factors that may prove relevant during the interview process.
Top Skills
AWS
Azure
Azure DevOps
Bash
Bicep
CloudFormation
GitHub Actions
Go
Jenkins
PowerShell
Python
Terraform


