Site Reliability Engineer
We’re looking for a Site Reliability Engineer to join Snap Inc! As a member of the Infrastructure Team, you will help design and operate the next generation of Snap’s multi-cloud architecture. Working from our Seattle, WA office, you’ll collaborate across teams to engineer ways to improve Snapchat’s reliability and scalability. You’ll build operational tools and deliver automation used by SRE as well as the rest of Snap engineering. You’ll participate in operations and join engineering teams’ on-call rotations, helping to debug, improve, and optimize critical backend services. Beyond improving Snap’s services, this is also an opportunity to shape the culture and strategy around service operations and reliability at Snap, including incident response, post-mortems, trend analysis, and availability standards.
What you’ll do:
- Design, operate, and improve our most critical services
- Work across teams to understand system requirements, evaluate trade-offs, and deliver the solutions needed to build reliable services
- Identify scaling bottlenecks and help Snap services scale to meet user demand
- Help make our team better by contributing to design and launch reviews for new services
- Advocate for and apply best practices when it comes to availability, scalability, operational excellence, and efficiency
Minimum qualifications:
- BS/BA in a technical field such as Computer Science or equivalent experience
- 3+ years of software development experience
- Experience with backend services or distributed systems
Preferred qualifications:
- Experience or proficiency in at least one of Java, Go, or C++
- Experience operating large-scale distributed systems, microservice architectures, or multi-tenant systems
- Hands-on experience using AWS or Google Cloud services
- Experience with NoSQL storage solutions and Memcache/Redis
- Experience with Kubernetes, Envoy, and related software
- Passion for problem-solving, strong technical communication skills, and a desire to collaborate with others
- Interest in operational excellence, availability, and automating away manual tasks