System Engineer - Core Platform
The Qualtrics XM Platform™ is a system of action that helps businesses to attract customers who stay longer and buy more, to engage and empower employees to do the best work of their lives, to develop breakthrough products people love, and to build a brand people can’t imagine living without.
Joining Qualtrics means becoming part of a team bold enough to chase breakthrough experiences - like building a technology that will be a force for good. A team committed to diversity, equity, and inclusion because of a conviction that every voice holds value, with a vision for representation that matches the world around us and inclusion that far exceeds it. You could belong to a team whose values center on transparency, being all in, having customer obsession, acting as one team, and operating with scrappiness. All so you can do the best work of your career.
We believe every interaction is an opportunity. Are we yours?
About the Core Platform Team
The Core Platform organization is responsible for building and supporting critical systems and services that are used by all of Qualtrics’ product line teams. Examples range from a centralized messaging platform with client libraries, to our A/B testing service, to our asynchronous job ecosystem, to our logging, metrics, and alerting infrastructure, to our storage systems.
This role is focused on the Database Engineering and Engineering Visibility subteams of Core Platform. The Database Engineering team supports our product teams with data storage solution choices, data modeling, performance, operations, and best practices. We support both MySQL and MongoDB, and are adding Redis support in 2020. Engineers on the team are not in traditional “DBA” roles: the perfect engineer for our team has a passion for storage in all forms, a zeal for efficient and robust tooling (build or buy), and the ability to communicate and work cross-team to accomplish engineering-wide initiatives. Our Engineering Visibility team is staffed in Europe and runs the logging, metrics, and monitoring solutions for Qualtrics; this role will help support that team and serve as its US-based point of contact.
Job Responsibilities:
- You will build systems to measure reliability of services and actively discover trends needing attention, including capacity planning
- You will fine tune services to reduce latency, conduct operational readiness reviews and automate continuous delivery of software changes
- You will enhance team runbooks and wikis to make everyone better
- Automation of vulnerability patching
- Building and maintaining tooling
- Improving data access policies and practices
- Application client driver upgrades (for example, MySQL drivers)
- Investigating and recommending high-availability (HA) solutions for MySQL and Redis
- Data storage upgrade procedures: developing runbooks and teaching others
- Participation in on-call rotation including incident response and support
- Disaster Recovery strategies and tooling
- Understanding and supporting the alerting, metrics, and logging systems
- FedRAMP datacenter support
Qualifications:
- Bachelor's degree in Computer Science (preferred), a hard science, or Information Systems
- 2+ years of software development or operations experience
- Experience with high-availability systems
- Excellent leadership, verbal, and written communication skills
- Demonstrated skill and passion for operational excellence
- US citizenship or Green Card holder status (required for FedRAMP)
Preferred Qualifications:
- Experience with AWS technologies or other “devops” technologies, including Docker, Jenkins, Puppet, Vault, and orchestration frameworks such as Kubernetes (K8s) or Nomad
- Experience with shell scripts and/or other scripting languages like Python
- Experience with Unix/Linux platforms
- Experience with NoSQL technologies such as MongoDB, Cassandra, Redis, etc.
- Experience with MySQL and/or RDBMS data modeling and performance tuning
- Proficiency solving problems and identifying the root cause of issues
- Experience running and maintaining highly available distributed systems
- Ability to stay composed and communicate effectively during operational incidents
- Ability to understand large systems, drilling down to code level
- Ability to communicate effectively with technical and non-technical audiences at all levels