Site Reliability Engineer Job Description

What is a Site Reliability Engineer?

Site reliability engineers (SREs) apply software engineering principles to infrastructure, operations, and systems administration problems, serving as a bridge between a company’s development and operations teams. They handle operations functions and on-call duties, and they develop the systems and software that improve site reliability and performance. They also build self-service automation tools for the teams that rely on their services, such as automated test-result provisioning and statistical visualizations.

SREs strive to create services that reduce the operational workload for all parties, allowing developers to focus on feature development. They collaborate with product developers to ensure designed solutions meet non-functional requirements, including security and maintainability, and they work with release engineers to make software delivery pipelines as efficient as possible. SREs typically need a bachelor’s degree in computer science or a related field.

Site Reliability Engineer Job Description Template

Job Overview

Responsibilities for Site Reliability Engineer

  • Design, develop, install, and maintain software solutions.
  • Work with engineering teams to refine deployment and release processes.
  • Collaborate with the engineering team on projects as the expert on reliability, performance, and efficiency.
  • Manage on-call rotations across continents, using a follow-the-sun model.
  • Deliver quality managed services in a consistent, timely manner.
  • Assist product engineers in the development and deployment of backend applications.
  • Be prepared to explain your work, decisions, and ideas to your colleagues.
  • Participate in 24x7 operational support and on-call rotation shifts.
  • Ensure that all system designs and procedures are documented and up to date.
  • Provide training and education to the engineering organization as a whole on infrastructure and internal tooling.
  • Provide the required level of audit and control to security personnel.
  • Monitor and stress test systems to collect metrics for tuning and capacity planning.
  • Work to automate detection and resolution of recurring issues.
  • Build the whole stack, from load balancers to databases, and migrate and launch sites with every application release.
  • Ensure the safety, predictability, repeatability, and auditability of all build and deploy processes.
  • Provide technical leadership for the Rightpoint Digital Operations Support Infrastructure team.
  • Develop, coach, and mentor individuals and teams, and ensure high performance in a fast-paced environment.
  • Build tools and automation that eliminate repetitive tasks and prevent incidents.

Qualifications for Site Reliability Engineer

  • Bachelor's or graduate degree in computer engineering, computer science, engineering, or information systems management, or equivalent experience.
  • Experience with cloud platforms, Linux and UNIX systems, and Java, Python, C, and Ruby.
  • Experience with Agile methodologies, SaaS, NoSQL databases, cloud architecture, and JavaScript.
  • Comfortable with scripting and debugging.
  • Natural collaboration skills and an eye for continuous improvement.
  • Skilled in scalability planning and root cause analysis.
  • Dedicated to continuous integration and orchestration.

Employers: How to Write Great Job Descriptions

  • Be sure to mention requisite years of experience and educational requirements
  • Tell job seekers what's unique about your company and job
  • Ideal length is a few paragraphs or about 200 words
  • Make sure to use appropriate paragraph breaks and bullet points so it's easy on the eyes