Sr. Site Reliability Engineer at Subsplash
Based in Seattle, Subsplash is an exciting, award-winning team of 190+ mission-driven people who are committed to our core values of humility, innovation, and excellence. Founded in 2005, we pioneered the market with the first-ever church mobile app. Since then, we’ve been working together to build The Ultimate Engagement Platform™ for churches, Christian ministries, non-profits, and businesses around the world. We find excitement in serving our 14,000+ clients, creating impactful products, and delighting the 40 million real people who use our platform every day. Subsplash has won awards for best mobile experience, been voted one of Washington's 100 Best Workplaces by the Puget Sound Business Journal, created some of the most downloaded apps of all time, and built enterprise software for world-class brands like XBOX, Microsoft, Samsung, Expedia, and Cisco. Yet, at the end of the day, what we love is making a lasting impact and a difference in our world.
Working at Subsplash is more than just a job; we are a team of people who are courageous, inventive, and passionate about doing meaningful work every day. Don’t take our word for it: head to Glassdoor and see for yourself!
About the Team
The Engineering Team is responsible for building and running all the products that Subsplash offers. We are a superstar team of software engineering, QA, and reliability professionals creating polished experiences for our clients and end-users. The Engineering Team is responsible for the entire user experience, including Mobile Apps (end-user facing), Subsplash Giving, the Subsplash Dashboard CMS (client-facing), Web App, TV App, backend data feeds, analytics, SnapPages (website builder), and more. We serve thousands of clients, millions of end-users, and billions of individual app impressions.
About the Role
As a Senior Site Reliability Engineer (Sr. SRE), you will report to the Manager of Site Reliability Engineering. In this role, you will be responsible for maintaining, scaling, monitoring, and enhancing the automation of our systems infrastructure for both production and development operations. You will contribute to the creative process by building elegant solutions to complex problems. You will work well with other members of the team in a fast-paced environment to help deliver working software early and often. You will be someone your peers look to for guidance and mentorship, and you will champion industry best practices across the team.
Top 3 Key Outcomes in Year 1
- Demonstrate command and ownership over all of Subsplash’s infrastructure management.
- Work with the rest of the team to unify the approaches to logging, monitoring, automation, incident response, performance optimization, and security.
- Implement a container orchestration framework in the cloud from the ground up (such as Kubernetes).
Responsibilities
- Develop tools and systems to automate the deployment, management, scaling, monitoring, alerting, incident response, and issue resolution of our operational systems
- Evangelize, implement, and champion best practices for security across the entire stack
- Deliver operational uptime of 99.9% or higher across all services
- Be recognized as an influencer on the Site Reliability and Software Engineering teams by implementing robust environments, responding to and resolving incidents and issues efficiently, delegating effectively, and supporting the professional development of those around you
- Perform general cloud server and database management functions
- Diagnose and resolve database and web server issues, and prepare written root cause analyses (RCAs) for them
- Configure auto-scaling and provisioning of server resources
- Maintain clear internal documentation covering both the big picture and the specifics of our operational systems and processes
- During scheduled on-call periods, respond to, diagnose, and resolve issues in a timely fashion
- Complete assigned features, changes, and bug fixes
- Invest in other team members by performing reviews, facilitating technical design discussions, and championing industry best practices
- Manage and improve database replication configurations
- Tune the performance of databases and web servers
- Implement an orchestration framework for containerization of our microservices
Requirements
- 7-10+ years of full-time site reliability work
- Embraces the Agile process (knowledge of Continuous Integration, Continuous Delivery, Continuous Deployment, feature-driven and test-driven development processes, Kanban, Lean, and SOLID a plus)
- Experience protecting against security risks such as XSS, SQL injection, improper use of SSL, session hijacking, etc.
- Good understanding of TCP/IP networking and core technologies (DNS, LDAP, NFS, FTP, etc.)
- Experience managing large-volume, high-throughput database systems
- Experience with managing databases in large-scale distributed Linux environments
- Experience working inside a Level 1 PCI cardholder data environment (CDE)
- Experience mentoring other site reliability and/or software engineering professionals
- Bachelor of Science in Computer Science or equivalent work experience
- Desire to serve faith-based organizations a plus
Knowledge and Skills
- Linux system administration, monitoring, hardware failure diagnosis/root cause analysis, firmware updates, software package and repository management, with strong system troubleshooting skills
- Ability to read and write maintainable code in one of the following languages: Go, C/C++, PHP, Java
- Extremely comfortable with server-side web technologies: Unix/Linux (Ubuntu), Apache/Nginx, MySQL, Go, and PHP (MongoDB and the Lithium framework a plus)
- Scripting experience in Go, bash, Perl, and/or Python
- Working knowledge of infrastructure automation tools such as Terraform, CloudFormation, Ansible, Chef or Puppet
- Experience building, maintaining, and supporting server environments with hosting providers (e.g., Amazon Web Services, Google Cloud Platform, Rackspace, Microsoft Azure)
- Competent with source control management (e.g., Git)
- Ability to collaborate well with a team of developers, designers, project managers, and testers
- Able to prioritize multiple projects, tasks, and bug fixes, with strong communication skills
- Ability to diagram and document solutions using Confluence/Google Drive
- Knowledge of system performance methodologies, along with hands-on experience in empirical monitoring
- Skilled at operating at all levels of the system stack, from hardware through the kernel and up to the operating system and network
- Knowledge of data security best practices related to PCI DSS, CISP, and HIPAA
- An added plus: knowledge of jQuery, HTML5, CSS3, AJAX, JSON, and cross-browser compatibility
Benefits
Generous Paid Time Off, Medical Coverage, Dental Coverage, Vision Coverage, 401k, Free Smoothies and Snacks, Optional Work-from-Home Thursdays, and a Public Transportation Subsidy.
This position is classified as Full-time/Exempt and therefore not eligible for overtime pay.
Note: Employment with Subsplash is contingent upon satisfactory proof of the employee’s right to work in the U.S., as required by law, and upon completion of a basic background check.
Employment with Subsplash is considered “at will,” meaning that either the company or the employee may terminate the employment relationship at any time without cause or notice.