Site Reliability Engineer, Cloud Services
About the Company
Qumulo is the leading file data platform, providing unrivaled freedom, control, and real-time visibility for file data at petabyte scale. Fortune 500 companies, major film studios, and the largest research facilities in the world trust Qumulo to help them innovate with their mission-critical digital files. The Qumulo experience makes file data management simple with continuous new features and a single, easy-to-use solution in a customer’s data center or in the public cloud.
Qumulo has a collaborative culture with a strong focus on delivering value to our customers. We are looking for people with a diverse set of experiences to help us fulfill our vision. You can get to know a few of our engineers at https://qumulo.com/eng/.
About the Position
As a Qumulo Member of Technical Staff on our SRE team, you will build, maintain, and operate our managed service, which allows end users to provision performant and easy-to-use scale-out storage at the click of a button. You will also develop, maintain, and operate our telemetry service, which collects data from our worldwide fleet of clusters.
As a member of our SRE team, you will research and implement groundbreaking tools and technologies to support our services. You will collaborate with many roles on our engineering and customer success teams to help us deliver an outstanding file data platform to our customers.
Responsibilities
As a member of our SRE team, you will work collaboratively with our development teams to build a managed service that is easy to operate and has very high uptime. You will also optimize and operate our telemetry service, which receives data from our worldwide fleet of clusters. You will operate and continuously improve our services’ reliability, scalability, performance, and security. By evaluating new tools and technologies, you will help implement those that improve our service. We are very collaborative, have meticulous execution standards, and strive for continuous improvement.
Qualifications
- Previous experience running 24x7 production operations for customer-facing services, including on-call duties.
- Experience in systems automation and infrastructure as code, utilizing modern tooling for workflow automation and CI/CD.
- Ability to monitor systems using industry-standard tools.
- Experience implementing container/container-fleet-orchestration technologies such as Kubernetes, ECS, or Docker.
- Ability to use scripting languages such as Python, Ruby, or Bash effectively.
- Confirmed experience in Linux system administration and troubleshooting.
- Proficiency in a cloud service such as AWS, Azure, or GCP.
Key Benefits
- Excellent healthcare coverage
- Parental leave
- 401(k) investment plan
- Unlimited paid time off, with employees strongly encouraged to take at least 3 weeks per year
Other Details
Qumulo is an Equal Opportunity Employer. Qualified applicants will receive consideration for employment without regard to race, color, gender, religion, sex, sexual orientation, age, disability, military status, national origin, or any other characteristic protected under federal, state, or applicable local law.
Please note that employment at Qumulo is contingent upon completion of a satisfactory background check.
For more information, please see our Applicant and Employee Privacy Notice at the link below:
https://qumulo.com/applicant-employee-privacy-notice