Chess.com

Engineering Lead, Systems Operations

Posted 11 Days Ago
Remote
Hiring Remotely in USA
Senior level
Lead a team of operations engineers, define SysOps strategies, manage cloud migration, oversee monitoring systems, and ensure system reliability while promoting a continuous learning culture.
The summary above was generated by AI

About Us

Chess.com is one of the largest gaming sites in the world and the #1 platform for playing, learning, and enjoying chess.


We are a team of 600+ fully remote people in 60+ countries working hard to serve the global chess community. We are here to support 200M+ chess players worldwide with the best possible product, content, and tools to serve the community!


We are a tech company. A gaming company. A content company. And we do it all with passion and commitment to the game. Above all we prize our mission-driven, flat, life-celebrating, no-corporate culture, and we look forward to meeting you and learning more about what you can bring to the team.

About You

You are passionate about building and managing infrastructure. It brings you joy to learn new technologies and use them to help reach challenging product goals. You have solid experience diving deep into Linux internals, as well as the future-oriented skills of managing Cloud/Kubernetes ecosystems. You are humble, have a sense of humor, and are eager to be part of a like-minded team. You have worked in, or dreamed of working in, the gaming industry and are ready to turn your talents towards chess!


What You’ll Do

  • Lead and mentor a team of 5-8 system operations engineers, providing technical guidance, career development, and performance management while demonstrating adaptive leadership styles and fostering a teachable culture of continuous learning
  • Define and execute the multi-year SysOps strategy with clear prioritization of critical initiatives, including multi-regional infrastructure architecture capable of handling millions of concurrent sessions across global data centers
  • Own the hybrid cloud migration roadmap, partnering with leadership to integrate bare-metal datacenter resources with cloud services for optimal performance and cost efficiency, delivering value through time-to-market optimization
  • Establish on-call rotation policies and incident response procedures with a strong focus on work-life balance, ensuring rapid resolution of critical system issues while maintaining team health and high-availability SLAs
  • Drive the implementation of monitoring, observability, and alerting systems that reach the right people at the right time, proactively identifying and resolving performance bottlenecks before they impact users and preventing organizational surprises
  • Partner with engineering leadership to implement infrastructure-as-code practices and establish deployment pipelines that support continuous integration and delivery, emphasizing quality with high first-time-right rates and low rework
  • Oversee capacity planning, load testing, and resource allocation strategies across distributed computing environments, demonstrating excellent time management and execution velocity while managing infrastructure budget and cost optimization
  • Champion security protocols and risk assessment procedures for infrastructure components and data protection with unwavering integrity, ensuring compliance with industry standards and earning trust across the organization
  • Collaborate with product and engineering leaders to design scalable solutions for high-traffic applications, valuing others' time by simplifying cross-team workflows and ensuring clear presentation of technical concepts to varied audiences
  • Lead automation initiatives that deliver measurable value to both internal and external customers, reducing manual operational overhead and improving system reliability through scripting and configuration management
  • Build authentic relationships with cross-functional teams and stakeholders, ensuring transparent communication of system health and aligning SysOps priorities with business objectives through excellent listening and presentation skills
  • Recruit, retain, and develop top engineering talent by understanding individual motivations and aligning team goals with personal drivers, fostering an inclusive culture where growth mindset principles guide decision-making and risk-taking
  • Demonstrate focus on commitments by managing distractions effectively, maintaining a strong track record of successful execution, and accumulating wins that build credibility and trust across the organization

Required Qualifications

  • 5+ years of experience in system operations, DevOps, or infrastructure engineering roles with demonstrated excellence in execution and velocity
  • 2+ years of experience managing technical teams, including hiring, performance management, and career development with proven ability to identify and adapt leadership styles
  • Strong proficiency with UNIX/Linux operating systems and command-line administration
  • Deep experience with cloud platforms (GCP, AWS, or Azure) and infrastructure-as-code tools (Terraform, CloudFormation, or similar)
  • Hands-on experience with configuration management systems (Ansible, Chef, Puppet, or similar)
  • Solid understanding of networking fundamentals, protocols (TCP/IP, HTTP/HTTPS, DNS), and network troubleshooting
  • Experience with containerization and orchestration technologies (Docker, Kubernetes, or similar)
  • Proficiency with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK stack, or similar)
  • Experience with relational and NoSQL databases, including performance optimization and scaling strategies
  • Excellent communication skills with proven ability to reach the right stakeholders, present complex technical concepts clearly, and listen effectively to understand diverse perspectives
  • Strong prioritization and time management skills, with ability to distinguish critical work from nice-to-have initiatives
  • Demonstrated integrity in decision-making, earning respect and trust from peers, direct reports, and senior leadership
  • Proven track record of building and scaling reliable systems and high-performing teams with high-quality outcomes and low maintenance costs
  • Growth mindset with the ability to share ideas and risks positively, avoid fixed-mindset behaviors, and remain teachable in all situations
  • Ability to understand what motivates individuals and teams, aligning work with intrinsic drivers to maximize engagement

Preferred Skills

  • Experience managing bare-metal server infrastructure and datacenter operations at scale
  • Strong background in server-side automation and scripting languages (Python, Go, Bash, or similar)
  • Experience designing high-availability architectures and disaster recovery strategies with focus on delivering customer value
  • Experience with game server infrastructure or real-time application hosting at scale
  • Knowledge of database administration and optimization for high-concurrency applications
  • Experience building and optimizing CI/CD pipelines and deployment automation that balance velocity with quality
  • Proven success with capacity planning, performance testing, and infrastructure cost optimization
  • Experience managing remote, distributed teams across multiple time zones while valuing team members' time and work-life balance
  • Track record of fostering inclusive team cultures, developing engineering talent, and mentoring others on leadership approaches
  • Demonstrated commitment to continuous learning with awareness of when to teach and when to learn from others
  • History of making technical decisions without compromising personal or organizational values
  • Ability to simplify complex infrastructure challenges and make them easier for other teams to understand and engage with
  • Continuous learning mindset with interest in emerging infrastructure technologies and willingness to share knowledge across the organization
  • Strong collaboration and communication skills working in a fully distributed team
  • Sense of ownership and responsibility

About the Opportunity

  • This is a full-time position
  • We are 100% remote (always have been, always will be!)



You can learn more about us here:

  • https://www.chess.com/article/view/how-chess-com-virtual-team-works-together
  • https://www.chess.com/about

We look forward to meeting you!


Top Skills

Ansible
AWS
Azure
Bash
Chef
Cloud
CloudFormation
Docker
ELK
GCP
Go
Grafana
Kubernetes
Linux
Prometheus
Puppet
Python
Terraform
