Sr. Site Reliability Engineer
At Smartsheet, we are building the next-generation workspace collaboration platform. Our Technical Operations team is committed to operational excellence and delivering a world-class customer experience. We're in an exciting high-growth stage, and now is the best time to join our team. Learn more about us with this short video overview of Smartsheet: Smartsheet Overview Video.
We are currently looking for a Senior Site Reliability Engineer to join our Site Reliability Engineering team. In this position, you will directly impact the reliability and performance of our critical production application systems, supporting 24/7 delivery to over 70,000 customers worldwide. We're looking for a motivated individual to manage deployment of configuration management solutions, resolve production escalations, and iterate on improving both production and pre-production environments.
This position will report to the SRE Manager and is located at our Bellevue, WA headquarters.
- Participate in a follow-the-sun rotation providing 24/7 production support.
- Troubleshoot, investigate, and fix production issues in cloud and hosted environments, including system and application performance issues.
- Automate all the things: software deployments, infrastructure configuration management, security patching, failover topologies.
- Manage customer support and development escalations, working directly with Sustaining Engineering.
- Manage work and track issues through ticketing systems, following through to resolution.
- Ensure production changes are documented, fully tested in non-production environments, and adhere to change control and audit requirements.
- Participate in and support multiple teams across incident management, post-incident review (PIR), deployment, and change processes.
- Investigate security and compliance concerns, in accordance with company policies.
- 5+ years of work experience with production Linux systems administration.
- 5+ years of experience with at least one scripting language (e.g., Bash, Python, Ruby, Go).
- 5+ years of experience developing, migrating, and supporting production systems in AWS, GCP, or Azure.
- Highly motivated critical thinker with proven ability to troubleshoot and solve problems in a production support environment.
- Ability to successfully manage competing priorities in critical incident situations.
- Proficient with basic internet protocols (e.g., HTTP, DNS, TCP/IP).
- Proficient with configuration management, source control, and containerization tools.
- Excellent verbal and written communication skills.
- Ability to work in the U.S. on an ongoing basis.
- Bachelor’s degree in Computer Science or related discipline required.
About Smartsheet: In 2005, Smartsheet was founded on the idea that teams and millions of people worldwide deserve a better way to deliver their very best work. Today, the company delivers a leading cloud-based platform for work execution, empowering organizations to plan, capture, track, automate, and report on work at scale, resulting in more efficient processes and better business outcomes. Smartsheet went public on the New York Stock Exchange in April 2018 and currently enables collaboration, better decision-making, and accelerated innovation for over 76,000 domain-based customers in 190 countries, including 96 of the Fortune 100.
Smartsheet is an Equal Opportunity Employer. Individuals seeking employment at Smartsheet are considered without regard to race, ethnicity, color, age, sex, religion, national origin, ancestry, pregnancy, sexual orientation, gender, gender identity, gender expression, genetic information, physical or mental disability, registered domestic partner status, caregiver status, marital status, veteran or military status, citizenship status, or any other legally protected category.