Site Reliability Engineer at Algorithmia
- Empower large enterprises to run AI and ML at scale, leveraging the best in modern distributed systems and automation technology
- Join a truly remote-friendly company - work anywhere in the US or Canada including your sofa, the beach, or our Seattle waterfront office
- Experience rapid growth in the first AI startup to be funded by Google
Algorithmia automates, optimizes, and accelerates every step of the journey to deploying AI and ML at scale. We allow anyone to run models on massively parallel infrastructure in minutes instead of months, in our cloud or your datacenter, all completely managed for maximum performance at minimum cost. Already trusted by over 100k developers and major enterprise customers, Algorithmia makes scalable Machine Learning fast, simple, and cost-effective for everyone.
Undergoing enormous customer growth, we’re rapidly scaling our Infrastructure team to meet demand. We’re looking for talented SREs to join a passionate, distributed team and have massive impact by enabling Algorithmia engineering to scale its product development and delivery capabilities. This pivotal role intersects a wide breadth of technologies in the DevSecOps space, offering an unparalleled opportunity to learn, grow, and impact the most important financial institutions, intelligence agencies, and private companies in the country.
As an SRE on the Infrastructure team at Algorithmia, you will:
- Design, build, and maintain the infrastructure, services, and processes that enable Algorithmia to deliver higher quality software at a higher velocity
- Work cross-team to ensure Algorithmia services are always designed and built with reliability, durability, availability, and sustainability in mind
- Automate your role out of existence - then do something even more amazing
- Handle the highest tier of engineering support for AI and ML leaders
- Have a real career plan, with mentorship and fast-track opportunities for promotion, technical leadership, people management, or wherever your interests may be
And we might make the perfect match if you:
- Want to work with modern cloud technologies and complex distributed systems
- Are fluent in managing and driving best practices for containerized distributed systems (Docker and Kubernetes)
- Have experience operating multiple data solutions that power production systems (MySQL, Redis, etcd, Elasticsearch, RabbitMQ)
- Feel comfortable working across a wide breadth of technologies including languages (Java, Scala, Go, Python, Bash, etc.), configuration management tools (Ansible, Terraform), monitoring tools (Prometheus, Grafana), and cloud providers (AWS, Azure, GCP, VMware, OpenStack, etc.)
- Geek out on topics like chaos engineering, and think proactively about reliability, high availability, and disaster recovery
- Are passionate about automation, and believe nothing should ever be done manually twice, especially when it comes to provisioning infrastructure
- Enjoy working behind the scenes to deliver solutions that enable engineering organizations to scale by empowering engineers
- Earn bonus points for a love of data science, any kind of AI or ML experience, interesting public code, or the implementation of something cool on our AI marketplace (hint: free trial!)
As an SRE at Algorithmia you’ll join a passionate team that’s changing the way everyone uses AI and ML. You’ll solve real problems, make an impact, and work in a flexible environment that encourages you to follow your own interests as well. You’ll be welcomed into an intelligent, quirky, and diverse group and gain access to fantastic perks beyond just salary, equity, and insurance benefits - all from the comfort of your own sofa (or our dog-friendly office).
Algorithmia is an equal opportunity employer and we value diversity at our core. We will never discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status and encourage everyone to apply.