Site Reliability Engineer Working With AWS GCP Azure
At Snowplow, we are on a mission to empower people to use data to differentiate. We provide technology that enables customers not only to control their data, but also to do amazing things with that control.

As part of that effort, we're changing the way that people do digital analytics: moving companies away from relying on one-size-fits-all vendors, such as Google Analytics and Adobe, to dictate what should be done with their data, and enabling them to collect and own their data themselves.

The opportunity

Our Managed Service offering has grown significantly over the last year, and we now orchestrate and monitor the Snowplow event pipeline across more than 100 customer-owned AWS accounts, with individual accounts processing many billions of events per month.

We are looking for our second Site Reliability Engineer to help us grow to managing 1,000 and then 10,000 AWS, GCP and Azure accounts. You’ll work closely with our Tech Ops Lead on all aspects of our proprietary deployment, orchestration and monitoring stack.

The team and mission:

Technical Operations at Snowplow is responsible for two distinct domains:

* Snowplow’s internal infrastructure, which powers Snowplow Insights, CI/CD, the Snowplow website, and our support tooling, all running on AWS
* Our customers’ Snowplow-related infrastructure, running in their own AWS accounts

Within both domains, Tech Ops at Snowplow is striving to increase service reliability, fulfil customer requests in a timely fashion, and automate recurring tasks. Task automation is essential as our customer base grows because, unlike in most software businesses, our “infrastructure estate” scales linearly with our customer numbers.

Our roadmap includes:

* Deploying, orchestrating and monitoring Snowplow on GCP, Azure and on-premise, not just AWS
* “One click” infrastructure deployment and maintenance
* Building self-healing and self-upgrading infrastructure, which learns how to optimize itself for cost, performance and reliability

This is an enormously ambitious undertaking but also, we hope, a hugely exciting infrastructure automation challenge!

Technologies:

Today, our in-house stack uses pragmatic technologies including Docker, Ansible, Consul, CloudFormation, Bash and Golang to manage our internal and customer infrastructure.

For our next level of automation, we are now exploring tools such as Terraform, Kubernetes and Vault.

Responsibilities:

* Developing software to automate, monitor and maintain client-deployed and Snowplow-internal infrastructure and services
* Providing deep technical support to internal and client teams
* Performing planned upgrades and modifications to customer infrastructure
* Handling high-severity internal or customer incidents, ensuring we meet all SLAs

On the software engineering side, you will be responsible for the implementation, deployment and stability of your systems and services. You will own software end to end, with a high expectation of ownership over anything that is deployed.

On the operational side, you will join our on-call process for incident resolution and take a share of the regular client infrastructure work, with a strong mandate to continue automating.

What we are looking for:

This role will be a great fit for somebody who:

* Has deep knowledge of Linux, networking, containers and similar, and is able to troubleshoot complex problems on individual servers and distributed systems
* Has worked with at least one of: Amazon Web Services, GCP or Azure
* Has been part of an on-call rotation
* Has interacted directly with customers to solve their specific technical issues
* Is comfortable scripting in one or more of: Bash, Python, Ruby or Perl
* Is comfortable programming in one or more of: Java, Scala, Golang or Python

This role would be a great fit for a software engineer or systems administrator who wants to transition into a full SRE role.

Security:

The integrity of our customers' systems and data underpins everything we do at Snowplow. As part of their probation, candidates will be put through a full background security check.

Out-of-hours work:

An important part of this role relates to out-of-hours work, particularly around:

* Performing planned upgrades and modifications to customer infrastructure outside of their working hours
* Being on-call to handle high-severity internal or customer incidents, ensuring we meet all SLAs

The on-call process for the Tech Ops team is still evolving; we will discuss these requirements with short-listed candidates.

What you’ll get in return:

* Competitive package based on experience
* 25 days holiday a year plus bank holidays
* The freedom to work wherever suits you best
* Two fantastic company away-weeks a year
* Working alongside a strong and talented team

Office-specific:

* Convenient central Shoreditch location
* Continuous supply of Pact coffee
* Regular mystery events
* MacBook
# How do you apply?

This job post is older than 30 days and the position is probably filled. Try applying to jobs posted recently instead.