Site Reliability Engineer
The Developer Tooling team at Outreach is responsible for the foundation on which all the other software built by Outreach engineering teams runs. That means we need to be empathetic to the needs of our co-workers as they do their jobs. It also means that we must stay focused on how our systems are performing against our SLOs and SLIs. We have spent the last year transitioning much of our production infrastructure to run on top of Kubernetes. We are looking for someone to help us mature that new platform and finish transitioning the long tail of legacy systems to it. We also need someone to help us reshape other portions of our underlying production infrastructure as we continue to grow and scale rapidly.
Outreach has grown enormously in each of the last several years, and we don't see any signs of that stopping soon. That means we need someone to help us identify the constraints in our system and prioritize which ones we address. We are looking for someone who is analytically minded. In addition, the right person isn't necessarily interested in building new and exciting infrastructure technology. Instead, they are focused on using and augmenting existing tools to serve our needs.
About the Team
The Platform Team is composed of folks with disparate skills and backgrounds. Our unifying attribute is our desire to work together to find creative, scalable solutions to the problems we run into. We are currently a fairly senior-heavy team, so we are open to finding the right person regardless of their experience level. Beyond the basic demands of managing our production infrastructure, the Platform Team is also responsible for supporting CI/CD, monitoring and alerting systems, and compliance initiatives. We have a diverse set of obligations to the rest of the Outreach organization, and that is reflected in the different types of work we get to do.
Your Daily Adventures Will Include
Our Site Reliability Engineers spend most days iterating on our planned projects. However, we are occasionally disrupted by exigent circumstances (read: alerts). The aim is to ensure that we spend more time than not working on software that makes our platform more performant and scalable, and that makes it easier for the other software engineers to do their jobs. We are also occasionally called on to assist other teams. When confronted with disruptive events, we strive to codify what we’ve learned and feed that information back into how we plan and prioritize our work.
Have you ever configured a Linux server to run a service? Did you enjoy it? If you answered yes to both questions, then it's likely that you have some skills relevant to this position. We would also very much like someone who believes strongly in automating away problems, is invested in continuing to learn and grow both as a human and in their career, and has strong verbal and written communication skills.
Tech stack: In addition to Kubernetes, we use jsonnet, Chef, Concourse, Elasticsearch, Terraform, Ruby, and Go. It's awesome if you have experience with any of those things, but we are happy to help you learn. Other experience that is potentially relevant, but not required: building highly available services, an understanding of distributed systems and their commonly associated problems, MySQL administration, cloud computing fundamentals (preferably in AWS), REST, cloud-based networking, Unix fundamentals, and performance profiling (especially in Ruby). We encourage you to apply, even if you think the position sounds a bit outside your wheelhouse. We want to find folks who are interested in learning and developing on the job, even if you can't walk in the door and do amazing things.
Ideally, we are looking for SREs with 3+ years of dedicated experience.