Director of Site Reliability Engineering

Bellevue
About the Role:
The SRE Director takes direct ownership of the ongoing availability of DreamBox systems, from production, customer-facing interactive systems to the back-end systems that power development and delivery. To be effective, you’ll need to be committed to thorough preparation, documentation, and testing, and just as ready to provide precise, consistent execution. An ideal candidate for this job is a perfectionist and a completeness fanatic: someone never comfortable with “good enough.” To be an effective director of the team, you’ll need a clear and consistent vision of what Site Reliability means, and of the steps necessary to establish and maintain that vision. Beyond that, you’ll need to map out how your vision will evolve and grow to anticipate the changing needs of the company.
What You'll be Doing:
  • Provide direct management to the Site Reliability Engineering (SRE) Team.
  • Perform all staffing functions, direct and indirect, including hiring, mentoring, training, performance appraisals and retention.
  • Research and recommend usage of current software development technologies and methodologies, bringing best practices to the entire SRE team.
  • Actively cultivate a high-performance learning environment to grow the development capabilities of the team.
  • Participate in departmental planning and budgeting functions.
  • Actively participate in business analysis and functional specifications.
  • Manage outsourced projects as needed.
  • Communicate cross-functionally with all project stakeholders to ensure alignment.
  • Work and collaborate with a team of engineers and other staff with diverse skill sets.
  • Provide leadership and vision in the architecture and implementation of the next generation of our service platform.
  • Automate manual processes, enabling other engineering teams to self-service their application delivery needs.
  • Operate at all levels (leadership, management, and individual contribution), inspiring confidence and enthusiasm, and happily jumping in and rolling up your sleeves when needed.
  • Research, evaluate and work with cutting-edge technologies that are defining the future of the cloud.
  • Help us shape a DevOps culture and drive deeper and broader adoption of DevOps principles.
  • Identify correct patterns and participants for on-call support of critical systems.
  • Coach and provide mentorship to the SRE managers and leads.
  • Be an active member of an architecture group focused on the big picture of how our software is built, deployed, and operated.
  • Ensure that DreamBox production systems meet or exceed all SLAs.
  • Ensure that DreamBox production systems can maintain SLA availability measures for the foreseeable future, allowing for planned or even likely change and growth.
  • Work with other teams to ensure that non-production systems also meet or exceed SLAs. This will entail technical work (like implementing redundancy or improving test automation) as well as non-technical work like training, listening, and coaching.
  • Help system owners and multiple levels of management understand the costs of availability decisions, in terms of dollars but also of flexibility, transparency, and management overhead.
About You:
  • 10+ years in Operations, DevOps, SRE or similar discipline in a high-availability environment.
  • 5+ years in leadership roles in Operations, DevOps, SRE or similar discipline.
  • Outstanding interpersonal and communication skills.
  • Excellent leadership skills.
  • Robust problem-solving skills.
  • Able to clearly articulate the relationship between platform, operations, security, and development aspects of a complex system.
  • Basic understanding of container orchestration, and containers in general. Able to discuss strengths and weaknesses of various container technologies.
  • Familiarity with AWS Cloud configuration, particularly monitoring and security components.
  • Expert-level knowledge of Linux administration.
  • Thorough understanding of network security concerns, including TCP/IP addressing and routing, authentication, and authorization.

Technology we use

  • Engineering
    • Languages: Java, JavaScript, Python, R, Ruby, SQL
    • Libraries: jQuery UI, React
    • Frameworks: AngularJS, Backbone.js, Node.js, Ruby on Rails, Spring
    • Databases: MySQL


Located in the growing tech scene of downtown Bellevue, our office gives employees the best of both worlds: access to metropolitan amenities and Pacific Northwest scenery.

DreamBox Learning Perks + Benefits

  • Health Insurance & Wellness Benefits
    • Flexible Spending Account (FSA)
    • Disability Insurance
    • Dental Benefits
    • Vision Benefits
    • Health Insurance Benefits
    • Life Insurance
    • Wellness Programs
  • Retirement & Stock Options Benefits
    • Company Equity
  • Child Care & Parental Leave Benefits
    • Flexible Work Schedule
  • Vacation & Time Off Benefits
    • Generous PTO
    • Paid Volunteer Time
    • Paid Holidays
    • Paid Sick Days
  • Perks & Discounts
    • Casual Dress
    • Commuter Benefits
    • Company Outings
    • Game Room
    • Happy Hours
    • Pet Friendly
    • Recreational Clubs
  • Professional Development Benefits
    • Job Training & Conferences
    • Diversity Program

Additional Perks + Benefits

Last but not least, we have a noble mission: changing the way the world learns. That strong sense of purpose leads to a different way of waking up in the morning.
