Sr. Site Reliability Engineer

Subsplash

Remote

Sorry, this job was removed at 11:12 a.m. (PST) on Wednesday, January 20, 2021

Senior Site Reliability Engineer

About Subsplash

Based in Seattle, Subsplash is an exciting, award-winning team of 190+ mission-driven people who are committed to our core values of humility, innovation, and excellence. Founded in 2005, we pioneered the market with the first-ever church mobile app. Since then, we’ve been working together to build The Ultimate Engagement Platform™ for churches, Christian ministries, non-profits, and businesses around the world. We find excitement in serving our 14,000+ clients, creating impactful products, and delighting the 40 million real people who use our platform every day. Subsplash has won awards for best mobile experience, been voted among the top 100 of Washington's Best Workplaces by the Puget Sound Business Journal, created some of the most downloaded apps of all time, and built enterprise software for world-class brands like XBOX, Microsoft, Samsung, Expedia, and Cisco. Yet, at the end of the day, what we love most is making a lasting impact and a difference in our world.

Working at Subsplash is more than just a job; we are a team of people who are courageous, inventive, and passionate about doing meaningful work every day. Don’t take our word for it—head to Glassdoor and see for yourself!

About the Team

The Engineering Team is responsible for building and running all the products that Subsplash offers. We are a super-star team of software engineering, QA, and reliability professionals creating polished experiences for our clients and end-users. The team owns the entire user experience, including Mobile Apps (end-user facing), Subsplash Giving, the Subsplash Dashboard CMS (client-facing), Web App, TV App, backend data feeds, analytics, SnapPages (website builder), and more. We serve thousands of clients, millions of end-users, and billions of individual app impressions.

About the Role

As a Senior Site Reliability Engineer (Sr. SRE), you will report to the Manager of Site Reliability Engineering. In this role, you will be responsible for maintaining, scaling, monitoring and enhancing the automation of our systems infrastructure for both production and development operations. You will contribute to the creative process by building elegant solutions to complex problems. You will work well with other members of the team, in a fast-paced environment, to help deliver working software early and often. You will be someone that your peers look up to for guidance and mentorship and you will champion industry best practices across the team.

Top 3 Key Outcomes in Year 1

Demonstrate command and ownership over all of Subsplash’s infrastructure management.
Work with the rest of the team to unify the approaches to logging, monitoring, automation, incident response, performance optimization, and security.
Implement a container orchestration framework in the cloud from the ground up (e.g., Kubernetes).

Your Priorities

Develop tools and systems to automate the deployment, management, scaling, monitoring, alerting, incident response and issue resolution of our operational systems
Evangelize, implement and champion best practices for security across the entire stack
Deliver an operational uptime of 99.9% or above across all services
Be seen as an influencer on the Site Reliability/Software Engineering teams, implementing robust environments, efficiently responding to and solving incidents and issues, delegating effectively and assisting in the professional development of those around you
General cloud server/database management functions
Diagnose, resolve, and prepare written analysis for database and web server issues (RCAs)
Auto-scaling/provisioning of server resources
Maintain good internal documentation of the big picture and the specifics regarding our operational systems and processes
During scheduled on call periods, respond, diagnose, and resolve issues in a timely fashion
Accomplish assigned features, changes and bug fixes
Invest in other team members by performing reviews, facilitating technical design discussions, and championing industry best practices
Manage and improve database replication configurations
Performance tuning of databases and web servers
Implement an orchestration framework for containerization of our micro-services

Qualifications

7-10+ years of full-time site reliability work
Embraces the Agile process (knowledge of Continuous Integration, Continuous Delivery, Continuous Deployment, feature-driven, and test-driven development processes, Kanban, LEAN, and SOLID a plus)
Experience protecting against security risks such as XSS, SQL injection, and session hijacking, and with the appropriate use of SSL
Good understanding of TCP/IP networking and core technologies (DNS, LDAP, NFS, FTP, etc.)
Experience with managing large volume and high throughput database systems
Experience with managing databases in large-scale distributed Linux environments
Experience working inside a Level 1 PCI CDE
Experience mentoring other site reliability and/or software engineering professionals
Bachelor of Science in Computer Science or equivalent work experience
Desire to serve faith-based organizations a plus

Knowledge And Skills

Linux system administration, monitoring, hardware failure diagnosis/root cause analysis, firmware updates, software package and repository management, with strong system troubleshooting skills
Ability to read and write maintainable code in one of the following languages: Go, C/C++, PHP, Java
Extremely comfortable with server-side web technologies: Unix/Linux (Ubuntu), Apache/Nginx, MySQL, MongoDB a plus, Go, PHP (Lithium framework a plus)
Scripting experience in Go, bash, Perl, and/or Python
Working knowledge of infrastructure automation tools such as Terraform, CloudFormation, Ansible, Chef or Puppet
Experience with building, maintaining, and supporting server environments with dedicated hosting providers (e.g., Amazon AWS, Google Cloud Platform, Rackspace, Azure)
Competent with source control management (e.g., Git)
Ability to collaborate well with a team of developers, designers, project managers, and testers
Able to prioritize multiple projects, tasks, and bug fixes with good communication skills
Ability to diagram and document solutions using Confluence/Google Drive
Knowledge of system performance methodologies along with hands-on empirical monitoring
Skilled at operating at all levels of the system stack – from hardware through kernel and up to the operating system and network
Knowledge of data security best practices related to PCI DSS, CISP, and HIPAA
An added plus: knowledge of jQuery, HTML5, CSS3, AJAX, JSON, and cross browser compatibility

Benefits

Generous Paid Time Off, Medical Coverage, Dental Coverage, Vision Coverage, 401k, Free Smoothies and Snacks, Optional Work-from-home Thursdays, and a Public Transportation Subsidy.

This position is classified as Full-time/Exempt and therefore not eligible for overtime pay.

Note: Employment with Subsplash is contingent upon satisfactory proof of the employee’s right to work in the U.S., as required by law, and upon completion of a basic background check.

Employment with Subsplash is considered “at will,” meaning that either the company or the employee may terminate the employment relationship at any time without cause or notice.
