Director of Site Reliability Engineering (Bellevue, WA)

Sorry, this job was removed at 11:46 a.m. (PST) on Thursday, January 31, 2019

About the Role:
The SRE Director takes direct ownership of the ongoing availability of DreamBox systems, from customer-facing interactive production systems to the back-end systems that power development and delivery. To be effective, you’ll need to be committed to thorough preparation, documentation, and testing, and just as ready to provide precise, consistent execution. An ideal candidate for this job is a perfectionist and a completeness fanatic: someone never comfortable with “good enough.” To be an effective director of the team, you’ll need a clear and consistent vision of what Site Reliability means, and of the steps necessary to establish and maintain that vision. Beyond that, you’ll need to map out how your vision will evolve and grow to anticipate the changing needs of the company.

 

What You'll be Doing:

  • Provide direct management to the Site Reliability Engineering (SRE) Team.
  • Perform all staffing functions, direct and indirect, including hiring, mentoring, training, performance appraisals and retention.
  • Research and recommend usage of current software development technologies and methodologies, bringing best practices to the entire SRE team.
  • Actively cultivate a high-performance learning environment to grow the development capabilities of the team.
  • Participate in departmental planning and budgeting functions.
  • Actively participate in business analysis and functional specifications.
  • Manage outsourced projects as needed.
  • Communicate cross-functionally with all project stakeholders to ensure alignment.
  • Work and collaborate with a team of engineers and other staff with diverse skill sets.
  • Provide leadership and vision in the architecture and implementation of the next generation of our service platform.
  • Automate manual processes, enabling other engineering teams to self-service their application delivery needs.
  • Operate at all levels (leadership, management, and individual contribution), inspiring confidence and enthusiasm and happily rolling up your sleeves to jump in when needed.
  • Research, evaluate and work with cutting-edge technologies that are defining the future of the cloud.
  • Help us shape a DevOps culture and drive deeper and broader adoption of DevOps principles.
  • Identify correct patterns and participants for on-call support of critical systems.
  • Coach and provide mentorship to the SRE managers and leads.
  • Be an active member of an architecture group focused on the big picture of how our software is built, deployed, and operated.
  • Ensure that DreamBox production systems meet or exceed all SLAs.
  • Ensure that DreamBox production systems can maintain SLA availability measures for the foreseeable future, allowing for planned or even likely change and growth.
  • Work with other teams to ensure that non-production systems also meet or exceed SLAs. This will entail technical work (like implementing redundancy or improving test automation) as well as non-technical work like training, listening, and coaching.
  • Help system owners and multiple levels of management understand the costs of availability decisions, in terms of dollars but also in flexibility, transparency, and management overhead.
About You:
  • 10+ years in Operations, DevOps, SRE or similar discipline in a high-availability environment.
  • 5+ years in leadership roles in Operations, DevOps, SRE or similar discipline.
  • Outstanding interpersonal and communication skills.
  • Excellent leadership skills.
  • Robust problem-solving skills.
  • Able to clearly articulate the relationship between platform, operations, security, and development aspects of a complex system.
  • Basic understanding of container orchestration, and containers in general. Able to discuss strengths and weaknesses of various container technologies.
  • Familiarity with AWS Cloud configuration, particularly monitoring and security components.
  • Expert-level knowledge of Linux administration.
  • Thorough understanding of network security concerns, including TCP/IP addressing and routing, authentication, and authorization.

Location

Located in the growing tech scene of downtown Bellevue, employees get the best of both worlds, with access to metropolitan amenities and Pacific Northwest scenery.
