Site Reliability Engineering

Site Reliability Engineering (SRE) "is a discipline that incorporates aspects of software engineering and applies them to IT operations problems. The main goals are to create ultra-scalable and highly reliable software systems. According to Ben Treynor, founder of Google's Site Reliability Team, SRE is "what happens when a software engineer is tasked with what used to be called operations." "Fundamentally, it's what happens when you ask a software engineer to design an operations function. So SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise, and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for human labor." - Ben Treynor