Senior Site Reliability Manager

Website: Zapier

Connect apps and automate workflows

Overview

We are seeking a seasoned SRE to join our team! Working within an engineering team, you’ll improve application reliability by using a software engineering approach to operations. You’ll develop internal tools and systems for all engineering teams to use. When things go wrong, you’ll use site reliability principles and a robust approach to observability not only to fix the immediate problem but also to address the issues that contributed to it.

This position works closely with Release Engineering and other engineering teams in our Systems Zone to develop and maintain the tools and systems that support all of Zapier engineering. This role calls upon a broad range of experience and technologies. You’ll get to interact with every engineering team in the organization. Maintaining excellent relationships and communicating effectively with those teams regularly is key to success.

Zapier is rapidly scaling and growing, and you will work directly on the applications that support over 5 million customers. When bad things happen, you’ll have the support of your team to solve contributing causes, to learn from failures, and to build a robust and resilient system for our customers.

Building new features and services is a big part of this role. We are continually developing and implementing new ways to support our teams, understand our customers’ needs, and become experts in site reliability.

Requirements

We’re looking for an experienced engineer who is eager to use software development approaches to operations. You should have a breadth of experience in software development and operations, and be actively practicing site reliability principles. We’re continually improving our approaches to SRE, so there are plenty of learning opportunities. We don’t expect you to know it all.

Ideally, you’ll have several years of experience practicing infrastructure as code, including tools like Ansible and Terraform and platforms like Kubernetes. Well-honed experience with the fundamentals of software development goes a long way here. Python, Go: we do it all. Generalists thrive in this role.

Writing is our primary means of communication, from pull requests and team chat to knowledge sharing and communicating changes. Excellent writing skills are crucial to success here at Zapier. We are 100% remote and commonly work asynchronously. We even wrote a book on it.

You should feel comfortable defaulting to action. Most decisions are changeable. It’s better to deliver something real today than something that might be better later. Sharing context, goals, objectives, and in-progress work in public helps us all achieve a common goal.

Responsibilities

  • Develop new methods for retaining task history
  • Migrate applications and services from EC2 to Kubernetes
  • Write custom Kubernetes controllers to improve resilience
  • Create deployment pipelines in ArgoCD
  • Develop autoscaling strategies to handle bursts in workloads
  • Implement Open Policy Agent (OPA) to enforce policies across our Kubernetes clusters
  • Deploy ProxySQL for pooling connections against MySQL databases

Benefits

  • Competitive salary (we don’t use remote as an excuse to pay less)
  • Great healthcare + dental + vision coverage*
  • Retirement plan with 4% company match*
  • Profit-sharing
  • 2 annual company retreats to awesome places
  • 14 weeks paid leave for new parents of biological or adopted children
  • Pick your own equipment. We’ll set you up with whatever Apple laptop + monitor combo you want plus any software you need.
  • Unlimited vacation policy. Plus we require you to take at least 2 weeks off each year. We see most employees take 4-5 weeks off per year. This isn’t a vague policy where unlimited vacation means no vacation.
  • Work with awesome companies around the world. We partner with great software companies all over the world, and you’ll constantly get to interact with people from these companies.

To apply for this job, please visit zapier.com.