System Engineer - Core Platform
Qualtrics, the leader in customer experience and creator of the Experience Management (XM) category, is changing the way organizations manage and improve the four core experiences of business: customer, employee, product, and brand. Over 12,000 organizations around the world are using Qualtrics to listen, understand, and take action on experience data (X-data™): the beliefs, emotions, and intentions that tell you why things are happening, and what to do about it.
The Qualtrics XM Platform™ is a system of action that helps businesses attract customers who stay longer and buy more, engage employees who build a positive culture, develop breakthrough products people love, and build a brand people are passionate about. Join us as we help change the way people experience the world! Advance your career at a company that is dedicated to your ideas and growth, fills you with purpose, and provides a fun, inclusive work environment.
About the Core Platform Team
The Core Platform organization is responsible for building and supporting critical systems and services that are used by all Qualtrics product line teams. Examples range from a centralized messaging platform with client libraries, to our A/B testing service, to our asynchronous job ecosystem, to our logging, metrics, and alerting infrastructure, to our storage systems.
This role is focused on the Database Engineering and Engineering Visibility subteams of Core Platform. The Database Engineering team supports our product teams with data storage solution choices, data modeling, performance, operations, and best practices. We support both MySQL and MongoDB, and are adding Redis support in 2020. Engineers on the team are not in traditional “DBA” roles--the perfect engineer for our team has a passion for storage in all forms, a zeal for efficient and robust tooling (build or buy), and the ability to communicate and work cross-team to accomplish engineering-wide initiatives. Our Engineering Visibility team is staffed in Europe and runs the logging, metrics, and monitoring solutions for Qualtrics--this role will help support that team and serve as its US-based point of contact.
Job Responsibilities:
- You will build systems to measure reliability of services and actively discover trends needing attention, including capacity planning
- You will fine tune services to reduce latency, conduct operational readiness reviews and automate continuous delivery of software changes
- You will enhance team runbooks and wikis to make everyone better
- Automation of vulnerability patching
- Building and maintaining tooling
- Improving data access policies and practices
- Application client driver upgrades, for example, with MySQL
- Investigating / recommending HA solutions for MySQL and Redis
- Data storage upgrade procedures--developing runbooks and teaching others
- Participation in on-call rotation including incident response and support
- Disaster Recovery strategies and tooling
- Understanding and support of the alerting, metrics, and logging systems
- FedRAMP datacenter support
Qualifications:
- Bachelor's degree, preferably in CS, or in a hard science or Information Systems
- 2+ years of software development or operations experience
- Experience with high-availability systems
- Excellent leadership, verbal, and written communication skills
- Demonstrated skill and passion for operational excellence
- US citizen or Green Card holder, per FedRAMP requirements
Preferred Qualifications:
- Experience with AWS technologies, or other “devops” technologies including Docker, Jenkins, Puppet, Vault, and orchestration frameworks such as K8s or Nomad
- Experience with shell scripts and/or other scripting languages like Python
- Experience with Unix/Linux platforms
- Experience with NoSQL technologies such as MongoDB, Cassandra, Redis, etc.
- Experience with MySQL and/or RDBMS data modeling and performance tuning
- Proficiency solving problems and identifying the root cause of issues
- Experience running and maintaining highly available distributed systems
- Capability to retain composure and communicate effectively during operational incidents
- Ability to understand large systems, drilling down to code level
- Ability to communicate effectively to different levels of technical and non-technical audiences